Article about this journal's infrastructure #3
4 changed files with 98 additions and 0 deletions
1
.gitattributes
vendored
Normal file
@ -0,0 +1 @@
assets/** filter=lfs diff=lfs merge=lfs -text
BIN
assets/giga_chad.jpg
Normal file
Binary file not shown.
Size: 9.7 KiB
4
assets/the_pipeline.svg
Normal file
File diff suppressed because one or more lines are too long
Size: 201 KiB
93
src/how_to_run_a_journal.dj
Normal file
@ -0,0 +1,93 @@
# How to Run a Journal

Hi! I'm Isaac Mills, the guy managing the infrastructure behind Compute! In this article, I'd like to talk about just that: the infra behind this journal, how it all works, and why it is the way it is.

## Plain text

Plain text is kind of insane. It's capable of being anything, and can also be transmuted into anything. Its infinite extensibility makes it a powerful tool that every developer should have in their arsenal. For this journal, we use a lot of plain text. In fact, the article you're reading right now is written in plain text, _not with some web UI_. A while back, I found a markup language called [djot](https://djot.net). It was created by the same person who created CommonMark (a flavor of Markdown), and it's designed to be easier to parse and more featureful. Below is some example djot:

```djot
# Heading

paragraph

*bold* _italic_ _*bold italic*_ {-strikethrough-} {+underline+}

- list
- list

1. list
2. list
```

The benefit of using djot is that it compiles directly to HTML, so the journalists who have joined Compute don't need to learn HTML to write articles. They also don't need to learn a clunky, slow website editor like Wix or Squarespace.

This is another superpower of plain text. If we used Wix for our website, our journalists would need to learn how to use the Wix UI, and how to write articles _for_ that UI. If we needed to change our tooling at any time, they would need to re-learn everything for _that_ tool. Not only that, but we would need to port the entire journal (_every_ article) to the new tooling. This is not so with plain text. If things change in the pipeline, or even if you're just joining the journal, there's no need to re-learn how to write text. At worst, you just need to convert the plain text to another plain text format (djot to HTML, for example). All our journalists need to know is how to write their articles in djot and submit them to the team via the pipeline.

## The Pipeline

![A flowchart of the pipeline](assets/the_pipeline.svg)

Pictured above is the full pipeline that articles go through in order to reach you readers at home. It goes like this:

1. Articles are written by our journalists in a plain text format (djot in our case).
2. Once an article is done, the journalist who wrote it opens a pull request on our git repository with the new article and its associated assets.
3. The team reviews the article and can make edits to it.
4. Once the article has been edited, the pull request gets merged into the main branch of our git repository, which is where the articles you see live.
5. From there, the article goes through CI and gets deployed (we'll get into that in more detail later).

Basically, this is just the workflow you would use for code, but adapted for journalism. In other words, no learning curve for our journalists! And if they do need to learn it, then this is information that they *should* know _anyway_. The workflow you see above has been in the making since git was created in 2005, with the sole purpose of efficiently moving code from development into production. The more efficiently we can accomplish that, and the more bad code we can filter out of production, the better. If this workflow has worked for nearly 2 decades for a pathologically huge project like the Linux kernel (which git was tailor-made to handle), it can certainly handle a journal.

## CI

Consider the following: if I'm accepting untrusted code from the public into my open source project, and I need that code to be production ready, how can I ensure that the code I accept _is_ actually production ready? The answer is _continuous integration_, or _CI_. The idea is that every piece of code submitted to an open source project undergoes automated testing, linting, and checking to ensure that nothing will break when the code is merged into the production code base. A project like [`egui`](https://lib.rs/crates/egui), for example, runs 19 checks in its CI pipeline.

Their pipeline checks that the library, with your new code, compiles on every platform it's compatible with, with every feature enabled. It also makes sure that your code is well-formatted, contains no license conflicts, uses no libraries banned by the project, and depends on nothing with a known security advisory. The _only_ way this many checks can be done on every git commit is through CI (GitHub Actions in egui's case).

If you're making an open source project, and it becomes big enough to pull in contributions from a lot of developers, CI can serve not only as a means to filter bugs out of pull requests, but also as a way to communicate to open source developers _what a project wants_ out of their code. Instead of having to read a big `CONTRIBUTING.md` file to get an idea of that, developers can know that their code is good quality if it just passes CI.

Fortunately, the level of CI I've described above is not required for journalism. Our CI simply compiles our journalists' unreviewed articles and serves them on an un-indexed web page (not visible on production) so that they and the team can preview their work before merging it. Our CI is also responsible for indexing and publishing finished articles onto our production website. We could get the CI to do an automated grammar check, but there are too many technical terms that the checker would need to know in order for it to pass consistently.

## Deployment

Deployment is the most complex part of our pipeline: getting the written, production-ready articles onto the website _you_ are reading this on. As I said earlier in the article, we use CI to compile and index finished articles. The CI tool we use is called [Woodpecker CI](https://woodpecker-ci.org/), a self-hosted, Docker-based CI tool. Self-hosted basically means that we can run the CI tool on the same server we use to serve our website, making deployment as easy as moving the generated files into the directory that our web server (NGINX in our case) is serving. What's important to know is that when a pull request is opened on this journal's git repository, and when a pull request is merged into production, Woodpecker CI will run a custom program that I wrote in Rust to...

- Compile djot articles to HTML
- Minify and compress compiled HTML
- Index articles with our search engine
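To make the "moving the generated files into the directory that our web server is serving" part a bit more concrete, here is a minimal sketch of how a djot source path could be mapped to the HTML file it ends up as. This is _not_ the actual program: the function name `output_path`, the flat naming scheme, and the web root used in the example are all assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Map a djot source file to the HTML file the web server should serve.
/// Returns `None` for anything that is not a `.dj` article.
/// NOTE: `web_root` and the flat "<stem>.html" layout are hypothetical.
pub fn output_path(web_root: &Path, article: &Path) -> Option<PathBuf> {
    // Only djot sources get compiled; images and other assets are skipped.
    if article.extension()? != "dj" {
        return None;
    }
    // "src/how_to_run_a_journal.dj" -> "how_to_run_a_journal"
    let mut file_name = article.file_stem()?.to_os_string();
    // Append rather than replace the extension, so stems containing dots survive.
    file_name.push(".html");
    Some(web_root.join(file_name))
}
```

With a web root of `/var/www/journal` (again, a made-up path), `src/how_to_run_a_journal.dj` would land at `/var/www/journal/how_to_run_a_journal.html`, and anything under `assets/` would be left for a separate copy step.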

Because our CI tool is running this code, we can know which articles need compilation, who wrote those articles, and whether any articles need to be deleted. Via environment variables, our CI tool passes our code the git branch we're running on, the git commit that came before the latest one, and the latest git commit. Our code can then use this information to...

- Run a diff between the two commits, which is how we know which files need to be compiled, and which files have been deleted
- Run a blame on the new articles, which is how we figure out who wrote them
- Check whether we have changed the main branch, and index new articles if so
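The diff and branch steps above can be sketched roughly like this. Treat it as an illustration under assumptions, not the real tool: `Changes`, `parse_name_status`, `changed_articles`, and `should_index` are hypothetical names, and the sketch leans on `git diff --name-status --no-renames`, which prints one `<status><TAB><path>` line per changed file and reports a rename as a delete plus an add.

```rust
use std::path::PathBuf;
use std::process::Command;

/// Work derived from the diff between two commits.
#[derive(Debug, Default, PartialEq)]
pub struct Changes {
    /// Articles that were added or modified and need (re)compiling.
    pub compile: Vec<PathBuf>,
    /// Articles that were removed and whose output should be deleted.
    pub delete: Vec<PathBuf>,
}

/// Parse the output of `git diff --name-status --no-renames`.
pub fn parse_name_status(output: &str) -> Changes {
    let mut changes = Changes::default();
    for line in output.lines() {
        // Each line looks like "A\tsrc/new.dj"; skip anything malformed.
        let Some((status, path)) = line.split_once('\t') else { continue };
        // Only djot articles matter here; assets are ignored in this sketch.
        if !path.ends_with(".dj") {
            continue;
        }
        match status {
            "A" | "M" => changes.compile.push(PathBuf::from(path)),
            "D" => changes.delete.push(PathBuf::from(path)),
            _ => {} // other statuses (e.g. type changes) are not handled
        }
    }
    changes
}

/// Run the diff between the previous and the latest commit.
pub fn changed_articles(previous: &str, latest: &str) -> std::io::Result<Changes> {
    let output = Command::new("git")
        .args(["diff", "--name-status", "--no-renames", previous, latest])
        .output()?;
    Ok(parse_name_status(&String::from_utf8_lossy(&output.stdout)))
}

/// Articles are only indexed when the build runs on the main branch.
pub fn should_index(branch: &str) -> bool {
    branch == "main"
}
```

The two commit hashes and the branch name would come from the CI tool's environment. I'm assuming here that these are Woodpecker's `CI_PREV_COMMIT_SHA`, `CI_COMMIT_SHA`, and `CI_COMMIT_BRANCH`, read with `std::env::var`; check the Woodpecker docs before relying on those names.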

In our case, what CI allows us to do is keep as much of our pipeline automated as possible. Our journalists should only need to focus on writing good articles, not wrestling with tooling. Coming back to the benefits of plain text, git is an extremely powerful tool for working with plain text.

- It allows us to separate the WIP and the finished articles
- It allows us to keep an accurate and automated reference of who wrote and edited each article
- It allows a copy of the entire journal to be stored in many different places as backups
- It allows us to easily sync new articles and changes to any git-compatible software forge of our choice (we use [Forgejo](https://forgejo.org/))

By and large, with the power of git, plain text can fill more use cases than you could possibly imagine.
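As an example of that "who wrote and edited each article" point, here is a small sketch of turning blame output into a list of authors. It assumes the input is the output of `git blame --line-porcelain <file>`, which repeats an `author <name>` header for every line of the file; the function name `authors_by_lines` and the idea of ranking authors by line count are my own illustration, not necessarily what the real program does.

```rust
use std::collections::BTreeMap;

/// Count how many lines of a file each author is blamed for,
/// given the output of `git blame --line-porcelain <file>`.
/// Returns (author, line count) pairs, most lines first.
pub fn authors_by_lines(porcelain: &str) -> Vec<(String, usize)> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for line in porcelain.lines() {
        // Header lines look like "author Jane Doe". The file's own content
        // lines are prefixed with a tab, so they can never match, and
        // "author-mail"/"author-time" lack the space after "author".
        if let Some(name) = line.strip_prefix("author ") {
            *counts.entry(name).or_insert(0) += 1;
        }
    }
    let mut authors: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(name, lines)| (name.to_string(), lines))
        .collect();
    // Stable sort: authors with equal counts stay in alphabetical order.
    authors.sort_by(|a, b| b.1.cmp(&a.1));
    authors
}
```

The first entry is then a reasonable guess at the primary author, with everyone after it being editors or co-authors.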

## Our webpages (and staying based)

![An image of the giga-chad](assets/giga_chad.jpg)

The modern web sucks. Most webpages are not only bloated with ads, cookie banners, autoplaying BS, popups and the like; they are also inundated with copious amounts of JavaScript. We only use JavaScript in 2 places:

1. On our homepage, to power the search bar and display articles
2. Our web design tool, Webflow, bundles a small amount of JavaScript in every page (more on that later)

Other than that, the actual article pages, such as this one, depend on nothing but the JavaScript that Webflow bundles in. And our homepage is built and optimized so it can be served statically with its _one_ dependency. Basically, I wanted to make our website as [suckless](https://suckless.org/philosophy/) (as lightweight, and as free from bloat) as possible. I say _I_ wanted to because our founder wanted to use Wix originally, yuck.

Instead of _that_, I used [Webflow](https://webflow.com/) to design our webpages. For a one-time fee of $24, you can have access to the Webflow editor for 1 month, and then export your web pages to HTML/CSS/JS. Webflow is very different from your average Wix/Squarespace. Those editors are designed for non-programmers who don't know and don't want to know HTML or CSS. Webflow is an editor for _developers_ who know what they're doing. It gives you the full power of HTML and CSS in a responsive, visual editor, making it an incredibly flexible tool capable of generating very based and performant webpages, unlike Wix and Squarespace, which generate bloated and obfuscated garbage.

## Search engine

I was expecting to have to put most of the effort of getting our website ready into getting a search bar working. Instead, I found [Meilisearch](https://www.meilisearch.com/), and I fell in love immediately. Not only is it incredibly easy to deploy _and_ use, but it's also much more than just a search engine. You can basically use it as a way to index and display pages on your website. And by using [instant-meilisearch](https://github.com/meilisearch/meilisearch-js-plugins/tree/main/packages/instant-meilisearch) (that one dependency that our homepage uses), we can make an article browser and search experience _incredibly_ easily. It's not just easy on the front end; the [meilisearch-sdk](https://lib.rs/crates/meilisearch-sdk) also makes indexing articles incredibly easy. Meilisearch, despite being a complex full-text search engine, is incredibly simple to use. Just initialize a search index, and send documents to it in a structured JSON format. On the front end, you can get that JSON back with a simple HTTP request containing the search query and field constraints (sorting, only sending snippets of big fields, etc.). All in all, Meilisearch's unexpected simplicity surprised me in a big way, and it made my job so _sooo_ much easier!
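To give a feel for what that "simple HTTP request" carries, here is a sketch that builds the JSON body for a search. The parameters `q`, `sort`, `attributesToCrop`, and `cropLength` are real Meilisearch search parameters, sent with a `POST` to an index's `/search` route. The field names `date` and `body`, the crop length, and the helper names are assumptions, and the JSON is assembled by hand only to keep the example dependency-free (in practice you'd let the SDK or a JSON library do this).

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a search request body: newest articles first, and only a
/// short snippet of the (hypothetical) `body` field in the response.
pub fn search_request_body(query: &str) -> String {
    format!(
        r#"{{"q":"{}","sort":["date:desc"],"attributesToCrop":["body"],"cropLength":30}}"#,
        json_escape(query)
    )
}
```

Note that sorting by `date` only works if that field has been marked as sortable in the index settings.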

## In conclusion

Computers have an inconceivable amount of potential, but they're only as smart as their programmer. When you're building something with a computer, it's often much better to do more with less than less with more. Don't use 17 different JavaScript frameworks with your hypervisor GPU WEB2.0 interface-driven scripting framework to drive your map/reduce-aware proxy-oriented software API. Just start simple, build simple, and if you need complexity, create it with the simple. Oftentimes in computer software, plain text is the simplest place to start. From there, you can add complexity by processing the plain text in some way (hell, I made a whole-ass [PowerPoint clone](https://github.com/StratusFearMe21/grezi-next) centered around plain text). If you need more than plain text, try the terminal. And if you need more than that, try [wxWidgets](https://www.wxwidgets.org/). The point I'm trying to make here is that bloat is your enemy, and it's often better for you, your team, and your users to just KISS (Keep it simple, stupid!).