<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Baby CTO: Toolkit]]></title><description><![CDATA[Dive deep into a treasure trove of tech tools and frameworks. Assess their fit in a startup landscape, considering factors like cost, scalability, maintainability, and more.]]></description><link>https://www.baby-cto.com/s/toolkit</link><image><url>https://substackcdn.com/image/fetch/$s_!u3-p!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c9d8fc-b387-47a4-b77f-c2aa0a303dc7_619x619.png</url><title>Baby CTO: Toolkit</title><link>https://www.baby-cto.com/s/toolkit</link></image><generator>Substack</generator><lastBuildDate>Mon, 11 May 2026 04:04:05 GMT</lastBuildDate><atom:link href="https://www.baby-cto.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Rémy]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[babycto@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[babycto@substack.com]]></itunes:email><itunes:name><![CDATA[Rémy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Rémy]]></itunes:author><googleplay:owner><![CDATA[babycto@substack.com]]></googleplay:owner><googleplay:email><![CDATA[babycto@substack.com]]></googleplay:email><googleplay:author><![CDATA[Rémy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Documentation Archaeology: How to Extract Knowledge from Abandoned Codebases with AI]]></title><description><![CDATA[Developers are no strangers to being thrown in a wild project where their ability to understand technical debt will be key to their survival and mental health. LLMs are amazing tools for this use-case]]></description><link>https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm</link><guid isPermaLink="false">https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 02 Aug 2025 09:42:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/99fa6ac8-90b4-41c4-9de9-1c49cdfcca15_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You need to work on a legacy project. The engineer who knew all about it left the company a year ago. Because the deadline was too tight, they didn&#8217;t take time to write a documentation. They also didn&#8217;t take a step back on the architecture and simply piled up technical debt. Welcome to what is essentially the digital version of Man versus Wild!</p><p>The good news is: this is a strong use-case for generative AI. Many of you will have strong reservations against clankers, but the general rule of thumb with LLMs is that they are good to make something long shorter.</p><p>More specifically, if you feed them some code they will perfectly understand what it does and most likely even why it does it. 
They also have strong knowledge of all business areas, so anything that isn&#8217;t pure company jargon should be picked up as well.</p><p>In this article, we will explore the less obvious techniques that will give you superhuman abilities to jump into any project that you have never seen before.</p><p><em>tldr/spoilers:</em></p><pre><code><em>pfff src/**/*.py | llm -m 'gemini-2.5-pro' -s 'Please write a complete documentation of this project. I want a high-level overview of the main user flows. For each flow, generate proper Mermaid diagrams explaining the communication between all the different parties. Then go into the detail of each flow and explain the specific business decisions taken, edge cases, special rules, etc. For each step of the flow, tell me roughly where to look in the code in case I want to change something.'</em></code></pre><p>Sounds good? More ideas like this in future newsletters! <a href="https://www.baby-cto.com/subscribe?">Subscribe</a>.</p><h1>Workflow setup</h1><p>We&#8217;ll use two main tools for this:</p><ul><li><p><a href="https://pypi.org/project/llm/">llm</a> from <a href="https://simonwillison.net/">Simon Willison</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> gives a great CLI interface to the various LLMs on the market</p></li><li><p><a href="https://pypi.org/project/pfff/">pfff</a> from this author, which is simply a way to generate a context containing your source code and pipe it into <code>llm</code></p></li></ul><p>Given that they are both Python tools, I can only recommend using <a href="https://docs.astral.sh/uv/getting-started/installation/#shell-autocompletion">uvx</a> alongside your shell&#8217;s aliasing system.</p><p>For <code>fish</code> users, that would be:</p><pre><code>alias -s llm='uvx --with llm-gemini --with llm-claude-3 llm'
alias -s pfff='uvx pfff'</code></pre><p>For <code>bash</code>, add to your <code>~/.profile</code> (or the relevant file for your configuration):</p><pre><code>alias llm='uvx --with llm-gemini --with llm-claude-3 llm'
alias pfff='uvx pfff'</code></pre><p>Note how we add <code>llm-gemini</code> and <code>llm-claude-3</code> as dependencies to <code>llm</code>. This is because there are <a href="https://llm.datasette.io/en/stable/plugins/directory.html">many plugins</a> for many providers.</p><p>If you are going to get only one plugin, it should be <code>llm-gemini</code>. It is by far the most useful model for the task at hand, for a very simple reason: while most top-of-the-line LLMs have largely the same capabilities, Gemini shines with a 1 million token context window. This is big enough to fit a lot of entire codebases, and this will come in handy.</p><p>Once the plugin is installed, head over to <a href="https://aistudio.google.com/">Google&#8217;s AI Studio</a> to grab an API key, then run:</p><pre><code>llm keys set gemini</code></pre>
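<p>If you want to double-check that the plugins are actually picked up through the <code>uvx</code> alias, <code>llm</code> can list what it sees (the exact output will vary with the versions you have installed):</p><pre><code>&#10095; llm plugins                   # should mention llm-gemini
&#10095; llm models | grep -i gemini   # should list the Gemini models, including gemini-2.5-pro</code></pre>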
<p>Once this is done, you should be able to run something through Gemini, for example:</p><pre><code>&#10095; echo "What is the answer to Life, the Universe and Everything? Give me the answer in JSON and only JSON." | llm -m gemini-2.5-pro
```json
{
  "question": "What is the answer to the ultimate question of Life, the Universe and Everything?",
  "answer": 42
}
```</code></pre><p>From this point on, you&#8217;re ready to go!</p><p>As a bonus, however, you can have a look at the following tools:</p><ul><li><p><a href="https://mermaid.js.org/">Mermaid</a>, a lib/tool for diagrams embeddable in Markdown (and supported by GitHub).</p></li><li><p><a href="https://typora.io/">Typora</a>, a nice desktop Markdown editor, which also happens to support Mermaid. Use any editor you want of course, but make sure to have one at hand for the rest of the article.</p></li></ul><h1>Project documentation</h1><p>The LLM being a translation system, we often use it to translate a specification into code, with more or less success. But on the other hand the code <em>is</em> the ultimate specification, which is fairly easy to translate back into English.</p><h2>Good documentation</h2><p>In order to get something useful, you first need to understand what it is that you seek.</p><p>Good documentation takes you through a story. Not of a princess sleeping in the highest room of the highest tower, but of the various user and data flows that compose the application. A transverse view, if you prefer.</p><p>Apart from obscure Doxygen-generated references, all popular open-source projects essentially give you the same set of things:</p><ul><li><p>A &#8220;Getting Started&#8221; guide, whose job is to get you doing something useful within 3 minutes, beyond which point you would lose patience and try another tool</p></li><li><p>A set of &#8220;Tutorials&#8221; or &#8220;Guides&#8221;, which cover specific use-cases</p></li><li><p>And the &#8220;Reference&#8221;, which goes into the nitty-gritty details of how individual functions or components work</p></li></ul><p>The &#8220;Getting Started&#8221; does not usually make sense in a corporate project, given that there is one instance and that&#8217;s it. It&#8217;s already running, you can observe it, not a problem.</p><p>The &#8220;Reference&#8221;, well, you will see later, but essentially it is not the biggest help at the moment.</p><p>Which leaves you with the topic-centric &#8220;Guides&#8221;. This is what you&#8217;re going to look at generating. What you want to know is, for each &#8220;story&#8221;:</p><ul><li><p>Who speaks to whom, in which order. This is what <a href="https://en.wikipedia.org/wiki/Sequence_diagram">sequence diagrams</a> are for, and they are <a href="https://mermaid.js.org/syntax/sequenceDiagram.html">entirely supported</a> by Mermaid</p></li><li><p>Why this exists in the first place</p></li><li><p>Where to find it in the code</p></li><li><p>Which important implementation details you should be aware of</p></li></ul><p>All you need to do to get this is to ask :) To set expectations, the sequence-diagram part of the answer could look something like the sketch below.</p>
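<p>Purely illustrative (the participants are made up, not taken from any real project), this is the kind of Mermaid block you can expect to find in the generated Markdown for a typical &#8220;checkout&#8221; flow:</p><pre><code>sequenceDiagram
    participant Browser
    participant API
    participant Billing
    Browser->>API: POST /checkout
    API->>Billing: Create invoice
    Billing-->>API: Invoice ID
    API-->>Browser: 302 Redirect to payment page</code></pre>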
<h2>From scratch</h2><p>Let&#8217;s imagine that at this point, you have no useful documentation. Provided that your project is reasonably small (less than about 100k lines of code) and managed in Git, the first thing you need is to figure out a way to list all useful code files.</p><p>A very obvious approach might be something like:</p><pre><code>ls src/**/*.py</code></pre><p>You might otherwise want to look at all the non-binary files in Git:</p><pre><code>git ls-tree -r --name-only HEAD | xargs -I{} sh -c 'git show HEAD:"{}" | grep -Iq . &amp;&amp; echo "{}"'</code></pre><p>You are doing the important job of curating the context for the LLM: picking a list of files that is as exhaustive as possible, without throwing in huge useless content, confusing or contradictory information, etc.</p><p>In the next part we&#8217;ll be piping this into the LLM and you&#8217;ll start seeing results. If you are happy with the results, great; otherwise you might want to come back and revise the list of files to make it more relevant, and/or try to make it fit into the context window if you exceeded it.</p><p>But don&#8217;t overthink it. Do something quick and dirty first. If you like it, go to the next step. And come back only if it fails.</p><p>That&#8217;s where <code>pfff</code> comes into play. It&#8217;s a very small tool whose sole purpose is to print the content of all the files you provided, alongside their names, so that the LLM can get a sense of the project&#8217;s structure.</p><p>Try it out:</p><pre><code>pfff src/**/*.py</code></pre><p>You should end up with your terminal full of your source code. That&#8217;s what you will be sending to the LLM.</p>
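<p>If you are worried about the context window, a quick sanity check is to measure what <code>pfff</code> produces before sending it. As a very rough rule of thumb (an assumption, not an exact figure), one token is in the ballpark of 3 to 4 characters of code, so a 1 million token window corresponds to a few megabytes of text:</p><pre><code>pfff src/**/*.py | wc -c</code></pre>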
<p>Now let&#8217;s send it along with the question:</p><pre><code>pfff src/**/*.py | llm -m 'gemini-2.5-pro' -s 'Please write a complete documentation of this project. I want a high-level overview of the main user flows. For each flow, generate proper Mermaid diagrams explaining the communication between all the different parties. Then go into the detail of each flow and explain the specific business decisions taken, edge cases, special rules, etc. For each step of the flow, tell me roughly where to look in the code in case I want to change something.'</code></pre><p>Adjust the prompt if needed; this one should give you a good first draft. You will receive as output a long Markdown file containing the long-lost documentation of your project!</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!kfuz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png" alt=""><figcaption class="image-caption">Auto-generated documentation for the <a href="https://github.com/cloud-hypervisor/fuse-backend-rs">fuse-backend-rs</a> open source project, which does exactly what I want but whose documentation is still a bit lacking</figcaption></figure></div><p>Copy/paste it into your favorite Markdown editor, and you should be able to see all the lovely Mermaid flows and explanations of what is happening under the hood.</p><h2>Keeping the documentation up-to-date</h2><p>What is fantastic about this process is that you can also use it to keep the documentation up-to-date. Nothing easier:</p><pre><code>pfff src/**/*.rs README.md Cargo.toml | llm -m gemini-2.5-pro -s 'Give me an updated version of the README which reflects the state of the code'</code></pre><p>You can integrate other ideas into your prompts, depending on the expected results (a possible invocation of the first one is sketched right after this list):</p><ul><li><p>&#8220;Fix all the docstrings that no longer match what the code actually does, or those that are incomplete. Do not rewrite text uselessly, only change things that need changing. Only give me the changed bits.&#8221;</p></li><li><p>&#8220;Compare the documentation with the current state of the code. Add sections that do not exist yet and adjust existing sentences that are inconsistent with the reality of the code. Avoid minor adjustments. Only give me the changed bits.&#8221;</p></li></ul>
<h2>Figuring the why</h2><p>More often than not you will encounter functions that remain mysterious to you. The previous techniques can be laser-focused onto a specific part of the code. For example, you&#8217;re trying to understand how something specific works in the Linux kernel, which is millions of lines of code thick. From the kernel&#8217;s root folder:</p><pre><code>pfff fs/{ext4,fuse}/**/*.c | llm -m gemini-2.5-pro -s "The FUSE system has a lookup count system. From the implementer's point of view, what should I know? And which opcodes affect it?"</code></pre><p>And there you go. A straight answer from one of the most massive pieces of code that you will ever see.</p><h1>Debugging</h1><p>Another area where LLMs are surprisingly efficient is untangling bugs. This approach is not fool-proof, but you can still get very interesting results that will certainly help you get going.</p><p>Let&#8217;s say that you have a weird bug. You open up your network inspector and grab the query that seems to be the issue:</p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!CPBq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png" alt=""><figcaption class="image-caption">Browsers usually have a &#8220;Copy as cURL&#8221; option to let you reproduce the issue in your shell later on</figcaption></figure></div>
<p>Then run it in your terminal, copy/paste the command and its output together, and feed that back into <code>llm</code> (I&#8217;ll use <code>pbpaste</code> here to read the clipboard easily; yes, you can get an equivalent on Linux as well):</p><pre><code>begin pbpaste; pfff **/*/*.py; end | llm -m gemini-2.5-pro -s "I'm getting an error in this query, how come?"</code></pre>
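<p>The <code>begin &#8230; end</code> grouping above is <code>fish</code> syntax. On <code>bash</code> or <code>zsh</code>, a rough equivalent (same idea; adjust the glob and the clipboard command to your system) would be:</p><pre><code>{ pbpaste; pfff **/*/*.py; } | llm -m gemini-2.5-pro -s "I'm getting an error in this query, how come?"</code></pre>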
<p>You can usually ask for more information than this, for example:</p><ul><li><p>&#8220;Give me the steps to reproduce the bug&#8221;</p></li><li><p>&#8220;What would be a successful outcome?&#8221;</p></li></ul><p>The actual diagnosis of the bug is often wrong, or at least misleading. However the explanation of what happens and/or how to reproduce the bug is very helpful.</p><p>The example here uses cURL, but you can of course get your information from somewhere else: a suspicious stack trace from Sentry, a more or less accurate description from customer support, etc.</p><p>DO NOT SHARE <a href="https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share">this post</a> if you want to be the only one looking smart at work.</p><h1>Conclusion</h1><p>LLMs are very useful tools when used in the right way. Here you were shown how to leverage Gemini&#8217;s 1 million token context window in order to dive quickly and efficiently into legacy projects. You were even shown how to curate your context to get interesting output out of behemoths such as the Linux kernel.</p><p>This goes a bit against current trends such as coding agents (see Cursor, Windsurf, Junie, etc). What makes them useful is their ability to interact with the real world without a human in the loop. But when it comes to efficiency, if a LLM can one-shot a given task&#8212;such as the kind of tasks showcased here&#8212;then you&#8217;re much better off piping everything at once rather than waiting 10 minutes for the agent to do its job.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Co-creator of <a href="https://www.djangoproject.com/">Django</a> and extremely active user of LLMs with a ton of interesting takes on his blog</p></div></div>]]></content:encoded></item><item><title><![CDATA[100 Ways LLMs can Boost Your Business]]></title><description><![CDATA[LLMs are not just about chat. All professions can reap productivity increases on various tasks. The only limit is your imagination!]]></description><link>https://www.baby-cto.com/p/100-ways-llms-can-boost-your-business</link><guid isPermaLink="false">https://www.baby-cto.com/p/100-ways-llms-can-boost-your-business</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 13 Jul 2024 06:01:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f665c94e-eef8-49f2-bbf9-8d98ccc2ea9a_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LLMs certainly are a breakthrough in terms of natural language processing. However the real spark that turned the world mad was ChatGPT. Before it, you could still use GPT-3, but few people outside of specialists did. It&#8217;s when the chat form factor appeared that the general public started to realize the power of LLMs.</p><p>Unfortunately, chat &#8212; or at least passing as intelligent humans &#8212; is not the main strength of this technology, which is rather a sort of elaborate parser/translator. 
As such, there are a million ways you could integrate a LLM into your business at different levels, optimizing 10% of someone&#8217;s job here and there.</p><p>To prove this point, today we&#8217;ll explore 100 use cases that stand apart from the stereotypical uses of LLMs to imagine what you could truly do in a wide range of industries, provided a bit of brain juice and a few lines of code.</p><h1>Development and Project Management</h1><h2>Automated compliance checks in code or documents</h2><p>Any company beyond a few dozen employees ends up forced to draft policies, processes and rules that must be followed. Some of those require big-picture thinking, but some of them are precise checkpoints that can be easily checked in text-based outputs: source code, contracts, commercial propositions, etc. A series of robots could entirely make sure that the bulk of policies is indeed applied throughout the company.</p><h2>Programming language conversion</h2><p>As said in the introduction, LLMs are great at translating. But while this works amazingly for human-to-human languages, it also works quite well for programming languages. Typically, you can take any API vendor documentation in any language, get the example snippets and convert them into your current language. This also works within a given programming language to replace a specific library with another one that has equivalent features but a different structure.</p><h2>Detect bug reports from user reviews</h2><p>It becomes easy to apply <a href="https://en.wikipedia.org/wiki/Linus%27s_law">Linus&#8217;s law</a>: &#8220;given enough eyeballs, all bugs are shallow&#8221;. If your product meets a certain level of success, people will inexorably start complaining about their frustrations online: through social media, app store reviews and so forth. Using a LLM, you can parse the whole lot of those reviews to detect if any of them actually describe a potential bug that you should care about.</p><h2>Validate business strategies against doctrine</h2><p>It is no secret that I am a fan of <a href="https://learnwardleymapping.com/">Wardley Maps</a>. The only issue being: the source material is very long and complex. A potential use for LLMs (especially long context ones) is to be able to assist you in the map creation but most of all to check that your predictions and projections are actually taking into account all the rules from the 800 pages of the book.</p><h2>R&amp;D progress audit</h2><p>It is always tedious to document R&amp;D as its nature requires iterating rapidly between various experiments. However if you were to centralize all your results in a semi-formal way, you can imagine having a LLM take over this reporting process and generate exact day-by-day reports of who did what, what the conclusions are and what is being tested next. Extremely convenient in the case of grant justification as well.</p><h2>Task break-down and planning</h2><p>Why are developers always so late? Sometimes, it&#8217;s simply unforeseeable problems popping up, but most of the time &#8212; and especially for juniors &#8212; it&#8217;s because they fail to decompose the tasks that they have not already done. If you have never done something, your brain will probably ignore all the sub-tasks that you will have to accomplish. 
A LLM could be a good help to break down a given task until all the steps and dependencies are clear.</p><h2>Natural language programming</h2><p>Instead of having to code a specific behavior from a software component (email filter, automation platform, data ingestion platform, etc), you could simply specify what you want in plain human language and have it transformed into code under the hood.</p><h2>Drive processes (CRM, issue tracking, etc)</h2><p>Having a system read all your emails, messages and so forth will definitely be a privacy challenge, but on the other hand this could enable automatically reporting status updates and changes to CRMs, issue trackers and so forth. For example you could analyze the Git history to move an issue&#8217;s status (alongside with comments explaining what happened). Or track commercial emails to automatically report on a lead&#8217;s status.</p><h2>Run end-to-end tests of applications written in natural language</h2><p>Isn&#8217;t that so fucking annoying to write front-end tests? This could change with appropriate use of LLMs. They could not only write tests for you but &#8212; and most importantly &#8212; they could also heal existing tests to adapt for code changes.</p><h2>Visually test applications</h2><p>LLMs can have vision capabilities. As such, they are able to do something more smart than a pixel-perfect validation. They could check two images and tell you if there are significant differences. Look at a web page and tell you about obvious issues (text overflowing, alignment problems, etc).</p><h2>Log analysis to detect abnormal behaviors</h2><p>Server logs are usually very long files that you keep to be able to diagnostic a particular issue if it happens, but when it&#8217;s about knowing what happens in real time then it becomes more complicated. Log monitoring tools exist but they are limited by the fact that logs are extremely diverse and unexpected. Instead, LLMs  could be used to read all logs in real time and raise alerts when needed.</p><h2>Threat modeling assistance</h2><p>How do you secure a product? Nothing can be considered secure in the absolute, best practices are only good as long as they fit your needs. That is why you need to <a href="https://owasp.org/www-community/Threat_Modeling">model your threat</a>, which basically comes down to finding the weakest link in all the components holding the product&#8217;s security and figuring which might break easily enough for the prize to be worthy of the effort. This requires to imagine a full dependency map of everything related to the product, which a LLM could help enumerate.</p><h2>Open source issue qualification</h2><p>Open source projects historically always had issue with bug reports and feature requests, that are often done in a terribly unclear way. A robot could on the other hand be able to assist people in doing their report, until the produced description is clear enough for all parties.</p><h1>E-commerce</h1><h2>Image-based search</h2><p>Classical e-commerce facetted search requires detailed product description with a structured model of the product&#8217;s characteristics. And while for a stick of RAM this might be kind of easy, for some fields like clothing for example it&#8217;s already harder to categorize <em>everything</em>. 
On the other hand you could be asking questions about images, like &#8220;I want a pair of blue jeans with a contrasting seam&#8221;, and the search engine could smartly filter images on this unexpected characteristic.</p><h2>Mix-and-match assistant</h2><p>Imagine that you find your perfect pair of pants but you are looking for a shirt to go with it. A LLM would be able to understand the level of formality, the color and the style of those pants and then find a matching shirt for them. Let&#8217;s note that it&#8217;s a different concept from the &#8220;recommended products&#8221; that exist today: here we consider the user&#8217;s explicit intention. This works for all kinds of products: cosmetics, food, tools, etc.</p><h2>Organize products from raw pictures and spec sheets</h2><p>Imagine that you are building an e-commerce site in which the raw material for each product is a set of pictures and PDF datasheets. You could have AI take care of creating categories, structured product characteristics and product descriptions completely automatically, only leaving humans for review.</p><h2>Product composition decoder</h2><p>Imagine that you are lactose-intolerant and are looking to buy food. Or your skin has some specific allergies to chemicals. It would be interesting to be able to ask those questions to the e-commerce site directly, which would decode tricky product compositions for you. Or even better, state in your profile which components you wish to avoid and the system will automatically put a warning tag on all corresponding products, along with a warning before checking out.</p><h2>Product suggestion</h2><p>You are redecorating your terrace and you need to figure out what to put there. Send a picture of it to your furniture store and have directly matching suggestions displayed to you. It also works for various cases where the user could state a problem: &#8220;my computer is too slow&#8221;, &#8220;I need to water my tomatoes&#8221;, etc.</p><h2>Visual audit of second-hand products</h2><p>Since LLMs are able to view and to follow instructions, second-hand platforms could specifically ask them to visually check known defects on pictures. This could help the user qualify their own product, as well as highlight important checkpoints to customers.</p><h2>Price suggestion for second-hand platforms</h2><p>In the same vein, being able to analyze products visually means you could automatically compare a given product to similar products sold in the past and suggest a fair price from there, completely automatically.</p><h2>Extract and categorize pain points from online reviews</h2><p>Online reviews are a trove of user feedback for products sold beyond a certain scale. Using a LLM to systematically parse them can be an interesting way to find out about defects and use cases, quantify perception through time, etc.</p><h2>Faceted search</h2><p>Most e-commerce websites have what you call faceted search. It&#8217;s those filters on the left that allow you to refine a listing by some characteristic, whether it&#8217;s size, color or anything else. Sometimes the experience is great but sometimes it is also not super smart. A way to improve the experience would be to have a search bar that lets you specify in natural language the filters that you want to apply and then let the AI translate them into the right request. 
No more awkward clicking, scrolling and waiting for page load again and again.</p><h1>Entertainment</h1><h2>Drive NPCs &#8212; basically, Westworld</h2><p>The Westworld show was pretty good &#8212; at least season 1 &#8212; at showing us instinctively what AI could accomplish for us and how it could do it. Give structured scenarios to <a href="https://en.wikipedia.org/wiki/Non-player_character#:~:text=A%20non%2Dplayer%20character%20(NPC,rather%20than%20by%20another%20player.">NPCs</a> and let actual player interact with them. LLMs can entirely be used to generate dialogues, figure the steps to stay on the scenario, etc. Potentially very exciting amusement parks in perspective, but also of course video games.</p><h2>Generate backstories and character sheets for RPGs</h2><p>If you are a RPG player &#8212; like D&amp;D and the sort &#8212; you probably know that getting your character off the ground can be a lengthy process. Generate a backstory, specs, etc. It&#8217;s hours spent doing administrative procedures instead of playing. Instead you could just prompt the basic concept of the character and have it all generated in an instant.</p><h2>Assist users learning how to play a game</h2><p>Board games are always hard to understand the first time you play them. You could however imagine that an LLM-boosted agent could understand those rules and help beginners to play: explain what happened, let them know of potential moves, etc.</p><h2>Answer questions about movies halfway</h2><p>Sometimes in the middle of a movie you are just lost at what happened. However platforms already have a lot of data a LLM could exploit, like subtitles for example. Using this, you could imagine asking Netflix to clarify specific plot points and have the system check the transcript of the movie thus far to help you understand.</p><h2>Image culling and storytelling</h2><p>Ever came back from holidays with thousands of pictures that you never actually sort through? LLMs would be good at making a consistent story and picking the top X pictures to tell it in your album.</p><h2>Book, podcast, etc. length or style adjustment</h2><p>I personally hate reading fiction. For some reason, I&#8217;ve been devouring wikis and theories from GoT, LOTR and so forth but never actually managing to finish the books. It&#8217;s too long and too indirect. What if book &#8212; and podcast, news, etc &#8212; platforms let you adjust the length and style of what you are reading? 50-pages version of GoT? 2 minutes, to-the-point version of a 20 minutes podcast? The ability to further explore topics that piqued interest? Lots of people already watch movies in 1.5x, this would only be a logical next step.</p><h2>Trope analysis and novelty factor</h2><p>Star Wars episode IV is always a good example of how contextual movies need to be. Watch it in 1977 and it will fucking blow your mind. On the other hand I&#8217;ve shown it recently to a friend that was like &#8220;oh come on and then it&#8217;s going to be his father? how fucking original&#8221;. If you want to make a movie or an article entertaining, it must be composed of a good mixture of things that people are used to, spiked with an edge of novelty. Using AI to systematically explore and quantify tropes in existing scripts can help establish the novelty factor of a new project. 
Let&#8217;s note that this also works for politics, journalism, fiction and basically anything targeted at the mass market.</p><h2>Auto-edition of video interviews</h2><p>Interviews are a significant pain in the ass to edit. But using a LLM you could transcribe everything said, ask for it to pick the best part that will fill up X minutes and automatically slice and edit the video at the proper timestamps.</p><h2>Conspiracy theory generator for social media</h2><p>Whether we like it or not, social media is full of trolls trying to influence people&#8217;s choices and votes. A way of doing this is to attack specific pillars of a society (science, government, etc) by throwing an insane amount of conspiracy theories to destroy them. It doesn&#8217;t need to be consistent, it just needs to be massive. That is great, given that LLMs are excelling at making text that <em>sounds good</em> but that is utterly shallow. Pick your target, throw a LLM at Twitter and enjoy a massive ideological destruction.</p><h2>Fan-fiction generation</h2><p>Fans usually like their media so much that they want to keep exploring this world endlessly. Without making those stories canon, entertainment giants could easily generate literally endless stories by fine-tuning LLMs on the specific do&#8217;s and don&#8217;ts of a universe and let them generate content for their fans. As a bonus, the most successful stories could serve as a basis for major projects.</p><h1>Data Analysis</h1><h2>Data visualization</h2><p>Data visualization is a hard topic in the sense that managing all those graph libraries, SQL queries and the weirdest APIs like Pandas, it&#8217;s not very accessible to your average executive Joe. On the other hand LLMs are <em>excellent</em> at this, given a proper human intent. They are going to play a key role at making data more accessible.</p><h2>Transform natural language signals into structured data</h2><p>Scrape social media, listen to Slack messages or emails and turn this into structured data that you can quantify and analyze easily through graphs and statistics.</p><h2>Loosely structured data cleanup</h2><p>How many times data is provided in CSV form with completely inconsistent content? Poorly escaped lines, inconsistent IDs, etc. A usually tedious cleanup job could be entirely automated away with a properly trained LLM.</p><h2>Reverse-engineer structures</h2><p>Have you ever tried to understand what a company does from the outside? It&#8217;s usually very hard, given that the corporate website will tell you that they &#8220;deliver excellence&#8221; across a wide range of industries, present their &#8220;solutions&#8221; and &#8220;case studies&#8221; but will never go into the detail of <em>what</em> they actually did. The best way to understand the truth in my opinion is to look at job descriptions, both their quantity and their content. Gather them all together and you understand exactly which operational tasks, tools and hierarchy those companies have. Tedious by hand, but very suitable for LLMs to complete.</p><h1>Natural Language Processing</h1><h2>Translate</h2><p>All right this one is obvious in a conversational setup, but it of course also works if you are trying to internationalize a service. 
In an e-commerce or social media for example, the level of translations from a top LLM is good enough that you can trust it unsupervised in many languages for many non-critical use-cases.</p><h2>Generate alt tags</h2><p>Something that all CMSes should start doing: automatically generate alt tags for their image library. LLMs are now entirely capable of describing an image, and it&#8217;s so good for SEO and accessibility that this should become the norm very quickly.</p><h2>Spellcheck</h2><p>LLMs are also very good at spell-checking and can be used in a wide range of applications to help you improve your writing.</p><h2>Find acronyms</h2><p>The hardest thing when starting a project is to find a good name for it. Well not anymore, as now you can simply describe what your project does, ask Claude for a fitting acronym and there you go!</p><h2>Parse free-form numbers</h2><p>It&#8217;s not uncommon to end up with a data table where you need to parse prices or different kind of amounts but unfortunately they have been given in various forms, like &#8220;30 millions&#8221; or &#8220;45k&#8221;. While you can solve this with regular expressions, a cheap LLM can often be very efficient at parsing this.</p><h2>Anything to Markdown</h2><p>Given the ability of LLMs understand documents structure &#8212; textual or from images &#8212; they are excelling at producing markdown from anything. Just rasterize your PDF, throw it into a LLM and you&#8217;ll get your markdown version pretty easily.</p><h2>Parse citations from academic papers</h2><p>My understanding is that academic parpers follow a formal structure but in a semi-formal way technically speaking. Typically they are all linked to each other through citations, but their parsing is tedious. LLMs could empower this.</p><h2>Smart replace in document</h2><p>Imagine you write a long proposal for a client and refer repeatedly the name of their product or some important concept. But then your boss swoops in and asks you to remove or change all those references by another one. Sometimes search and replace can do the job, but sometimes it will affect the grammar or the structure of sentences. LLMs could do this job completely automatically.</p><h2>Auto-adaptation of texts for different targets</h2><p>Imagine writing a scientific revue. Maybe you want to address different levels of readers from the most advanced to kids. Or imagine a publisher that wants to make Shakespeare accessible to foreigners. LLMs are able to translate between languages but also between styles.</p><h2>Re-phrasing of customer input</h2><p>Customer support is a fantastic world where you get insulted for things you didn&#8217;t do. Instead, LLMs could act as a buffer between the customer and the support where aggressive, sarcastic sentences are turned into plain and clear ideas.</p><h1>Content Generation and Management</h1><h2>Generate FAQ from website content</h2><p>Gather all the content of your website, figure all the questions that it answers and generate the FAQ pages from this.</p><h2>Generate decent usernames</h2><p>It is quite hard to generate a decent username when subscribing to a platform. 
With a few smart questions and methodology, a pretty cheap LLM that could even run locally would produce many interesting name propositions in real time.</p><h2>Create recipe/tutorial variations</h2><p>When cooking, doing some work in your home, taking care of your garden or anything hobby-level in which you have no particular expertise, you will tend to follow tutorials to learn how to do things &#8212; and more importantly to achieve particular goals. The only issue with those tutorials is that they might have details incompatible with your particular situation. For example you want to cook a cake but you are alergic to one particular component. How do you replace it? That&#8217;s where the LLM can make educated guesses and alter the content dynamically to fit the user&#8217;s need.</p><h2>Smart filling of templatized documents</h2><p>Newsletter software allows you to place people&#8217;s name and a few other details within the text. But what if you could go much further than that? Create templates for documents like contracts, commercial outreach, etc. Then have a LLM fill up the blanks respecting grammar, gender or even making up whole sentences based on meta information: &#8220;Hi John, you expressed on our contact form that you need XXX, which can be filled by products in your YYY range. Let&#8217;s schedule a call?&#8221;.</p><h2>Generate onboarding procedure and training path</h2><p>Anyone running a company knows that transmitting the company&#8217;s knowledge is a tedious endavor. Pages and pages of process have been written over the many years of existence of the company, all at different levels of maturity. How do you introduce a newcomer to all this in a consistent order? You can feed all your documents to a long context LLM (like Gemini&#8217;s 2M tokens) and have it sort out documents in topological order and that are interesting in respect to a given job description.</p><h2>Organize asset production based on company policy</h2><p>Many companies have a process for rolling out a product or communication: social media assets, press releases, etc. While generating them directly will still be a human work, a LLM could allow high-level definition of guidelines in plain text, with more nuances possible than regular automation platforms, and automatically create outlines for the assets that need to be created.</p><h2>SEO and keyword-centric upgrade of articles</h2><p>CMS and other content management tools could receive specific directives regarding SEO and keywords that need to be present to perform not only live audits but also suggestions of modifications to the content in order to integrate the desired keywords.</p><h2>Meeting prep, create meeting agenda</h2><p>What worse than a poorly prepared meeting? Gathering information from previous meetings, ticket trackers and other digital platforms, an AI could outline the agenda of upcoming meetings, while at the same time helping each participant to gather their own content to present.</p><h2>Auto-update documentation</h2><p>All products and company processes need to be documented at various levels from the most technical to the most high-end. As the product grows and changes are made, it becomes hard to keep track of what needs to be updated in the documentation. 
Combining LLMs and embeddings,  you could track the overall company activity and highlight parts of the documentation that become obsolete, list the missing parts and even automatically propose edits.</p><h2>Dynamic course re-writing</h2><p>Imagine a student learning online in front of their computer. Some topics will be easy but assuredly some others will prove more challenging. These courses are often evaluating the student&#8217;s skills all along the way. What if depending on those evaluations the content of the courses was adapted to the strengths and weaknesses of the student? Catch-up texts can be generated from the original course but focusing on the weaknesses and ellaborating on them further than in the initial content.</p><h2>Infinite copy generator</h2><p>How do you know which words are going to transform your audience best? What if you generated one version of your content for every single time that someone reads it? Then observe which versions worked the best and use this as reinforcement for your model, to produce more and more efficient versions of the copy.</p><h2>Dynamic content</h2><p>In the same vein, you can also observe the user&#8217;s behavior and browsing history in order to dynamically re-write or optimize pages when he lands there. Connect the dots with concepts freshly ingested, push forward detected interests, etc.</p><h1>Image and Visual Processing</h1><h2>Transcribe handwritten notes</h2><p>This might sound like a miracle but GPT-4o is able to read my handwriting. Not only this but it can transform it into a well-structured Markdown document. And then of course translate, summarize and all the perks. This can be helpful in number of scenarios from digitizing meeting notes to processing and translating on-the-fly antique manuscripts.</p><h2>Pet control</h2><p>Pets tend to behave differently when their owners aren&#8217;t home, like jumping on the bed or sofa. LLMs are definitely not the most efficient but they are for sure the easiest way to express to a machine &#8220;if a dog rolls in my sheets yell at them to stop&#8221;.</p><h2>Generate color palettes</h2><p>Just like LLMs are trained on word patterns, they are trained on visual patterns and including the understanding of colors. This means that you can generate smartly color palettes that actually work (as opposed to this color wheel madness you often see). This can help you generate your own UIs but even more than that what if the LLM were able to generate all the design tokens up to the user&#8217;s taste, ending up with a unique, custom and beautiful UI for every single user?</p><h2>Art explanation</h2><p>If like me you are art-illiterate but still end up in museum wondering what happened in a specific painting, only to find the name of the painter with a vague title next to it, with a lengthy audio guide telling you everything except what you want to know&#8230; you&#8217;ll understand this idea. Instead of audiobooks, musuems could provide interactive assistants fed with in-depth knoweldge on every work of the musuem but able to distil it in a way tailored to the visitor&#8217;s taste and to reply to their questions directly.</p><h2>Picture-based food search</h2><p>Google Maps is trying very hard to create ontologies of the real world, especially with its &#8220;questions&#8221; program asking you if a given service or food is available in various places that you have visited. 
However if you are not american you probably ended up confused when you got asked if your local high-end bakery was making smores. Food simply does not translate between cultures. That&#8217;s where a deep understanding of images could lead to a much more efficient search that would echo one&#8217;s way of expressing their wants.</p><h2>Better narration for GPS</h2><p>Did you ever take the Madrid highway with some US-optimized GPS voice? How long did it take you to take the wrong turn? With ample imagery available &#8212; street view, 3D maps, etc &#8212; you could absolutely have much more descriptive directions from the GPS. Referring landmarks, taking into account perspective, etc.</p><h2>Drone or CCTV-based visual inspection of equipment or land</h2><p>As you can describe what you want to see, you can have drones or cameras film something you want to inspect and ask the LLM to tell you if it matches your expectations or not. Look at satellite imagery and ask &#8220;tell me places where forrests have been depleted&#8221;. Look at a building and say &#8220;tell me if any tile is missing&#8221;. And so forth.</p><h2>Auto-design simple, templated flyers, posters, etc</h2><p>Some apps will help people organize events or do marketing. Especially for small businesses it&#8217;s going to be hard to create those assets on their own, as they will not have the means to work with bigger agencies and are most likely unaware of best practices. On the other hand the app could leverage LLMs to apply best practices, pick colors, use and customize proven layouts to generate all kinds of visuals.</p><h2>Check that translations are meaningful in context by visually analyzing apps</h2><p>A common example of translation error that infuriates me is around the word &#8220;check&#8221; in English that can be understood as two distinct French words: either as in &#8220;verify&#8221; or as &#8220;check this box&#8221;. And very often, the meaning is lost, leading to crazy translations like &#8220;Verify the terms and conditions to continue&#8221;. Since LLMs can read texts and context, they could be used to apply translation files on an UI and make sure that all buttons make sense.</p><h1>Document Processing</h1><h2>Parse invoices</h2><p>Invoice management is the bane of any small business. You receive hundreds of them, need to extract different items and taxes systematically, but on the other hand every single invoice has a different format. Fortunately LLMs are pretty good at extracting this information and putting it into a JSON &#8212; whether it comes from an email, a PDF, a picture of a ticket, etc.</p><h2>Pick food at restaurant</h2><p>Did you ever end up undecisive at a restaurant? Just snap the menu, feed it into a LLM and let it guide you into ordering something. It even works with hand-written texts that you can&#8217;t understand &#8212; Japan explorers will rejoice. If you are a restaurant you can even push this further and help users through a custom assistant.</p><h2>Normalize recipes</h2><p>As a nutrition app or related, you might want to make the link between the food listed in a recipe and the calories for example. But people writing recipes love to use the weirdest units or even have things implied &#8212; like some common ingredients not even being listed as ingredients. 
From there you can convert the quantities into units that make sense and get the nutritional value of what you are cooking.</p><h2>Convert mind-maps to structured linear document</h2><p>Mind maps &#8212; or the Post-It method as well &#8212; produce a lot of ideas around one given topic, but you might end up overwhelmed at the end of the process by the amount of information that needs to be processed. LLMs can transform those ideas into a linear structure, properly sorted and organized.</p><h2>Paper forms digitization</h2><p>As an intern I have been copying lots of paper forms made on-the-spot for loyalty cards in a store. Or recently, we all have been filling up countless COVID forms whenever taking a plane. Using LLMs, you can understand and transcribe those forms completely automatically into a digital system.</p><h2>Transcribe, tag and reference historical paper-only archives</h2><p>Countless historical documents or books have been scanned, but how many are properly referenced? You can guess that over thousands of years of history, we could set out to map all those documents, link them together, and analyze references and ideas over time, to build a better understanding of our history and our currents of thought.</p><h1>Customer Support and User Experience</h1><h2>Start workflows from emails</h2><p>Customer support teams are often drowning in emails. You can parse them, detect intents and trigger the proper systems in your back-office to start procedures, without any human intervention.</p><h2>Request routing inside the company</h2><p>When the org chart starts getting big, it might be hard to navigate the responsibilities and knowledge of the people in there. Especially for newcomers, it can be a challenge to find the right person to talk to while also not bothering them unnecessarily. As CTO and founder I can answer most questions on most topics within my company, but should an intern come and ask me how to connect the printer? A reassuring AI could help people orient themselves in the hierarchy to know who they can confidently reach out to in order to receive help.</p><h2>Prioritize incoming messages and notifications</h2><p>You are probably like everyone else drowning in countless useless solicitations, from services to which you subscribed 20 years ago to urgent business emails. Depending on the time of day and your personal goals, you might want to be notified of one thing but not the other. Or you might want to receive notifications in bulk for some topics. For example I&#8217;d love to see Slack create a &#8220;what do you want to be notified about?&#8221; option and then bury irrelevant mentions and messages.</p><h2>Configure complex features</h2><p>When you use some apps, there will be features that are extremely complex to grasp. For example, try any product on Binance; it&#8217;s complex enough to throw you off if you are not eager enough to learn about it. Through the use of AI they could instead ease the user into setting the right parameters according to their own personal goals.</p><h2>Voice message summarization</h2><p>Some people love voice messages; some people, like me, loathe them. 
Having an AI skip through the &#8220;sorry I&#8217;m sending you a voice message because I&#8217;m in the street and it&#8217;s easier to send a voice message [&#8230;]&#8221; and instead deliver just the point of the message to you would be a great WhatsApp addition.</p><h2>Conversation coach</h2><p>We are constantly exposed to situations of conflict, especially in low-stakes but annoying use cases like negotiating a refund over an incorrect package. Email and messaging apps could help you understand what you could obtain in that situation and draft emails for you, helping you every step of the way and reducing your mental load.</p><h2>Automated test grading</h2><p>The point of MCQs is that they are easy to grade, including by a computer. That&#8217;s why e-learning platforms use them so much. But given the advances of LLMs, it would be easy to imagine having them grade even textual responses, looking for specific bits of information and telling you whether they are correctly explained or not.</p><h2>Interpretation of complex diagnostics</h2><p>Some diagnostics are not nice to hear, especially when they are particularly complex. From medical reports to SEO audits, if you are not an expert you might be confused by the terms and implications of those documents. A properly trained LLM could instead simplify them for you and even answer potential questions you might have.</p><h2>Allow customers to self-diagnose products</h2><p>Vice versa, some products are complex and have many failure modes. Companies internally have debugging procedures that can pinpoint exactly what is faulty, but it&#8217;s hard for the regular customer to follow such procedures. Instead of paying a human being to tell you to turn it on and off again, such workflows could be assisted entirely by an LLM, especially if driven by a tailor-made logic engine for your product.</p><h2>IT support for end-users</h2><p>The most feared and annoying department of a company is often IT, which has the important task of securing the company&#8217;s data and intellectual property while also having to explain to users how to connect to the Wi-Fi. With proper use of LLMs, with their general knowledge of how computers work but with specific training adapted to the company policy, you could skim a lot of trivial requests off the IT department&#8217;s pile.</p><h2>Suggest A/B testing variants</h2><p>A/B testing is great to test how the user is going to react to different UX or copy, but how might you do it? You need the ideas, after all. A well-trained model that knows the UX best practices for different industries could do this job of taking a human&#8217;s work and proposing potential optimizations.</p><h2>Algorithm transparency</h2><p>Many complex algorithms rule our lives. For example every electronic payment goes through a set of rules to determine if the action is legitimate or not. These departments are utterly closed and opaque to the rest of the company. Typically, if your card gets blocked then nobody in the bank can tell you why nor for how long it will stop working. An LLM aware of the algorithm&#8217;s rules could explain in simple words the reasons for the block to the bank advisor, along with the available options. 
This works for banking, but any sector with complex algorithms could leverage this.</p><h1>Personal Assistance</h1><h2>Help user stay on plan</h2><p>So many apps are helping us become a better version of ourselves, whether it&#8217;s for diet, exercise, jet lag, pet training, etc. But how often can you follow 100% of the plan? With a bit of intelligence you could let users report their deviations and help them stay on track without overreacting with counter-productive actions or simply getting demotivated.</p><h2>Context-picking for events, emails, etc</h2><p>Imagine an event in your calendar with few details. When the event comes up, the system could read your emails and meeting notes about this event and then surface useful information: the latest tickets from the issue tracker, the weather if you need to go somewhere, a reminder to take your ID or advice to dress a certain way. Overall, for one item and a lot of context, the LLM could pick the top few elements that are relevant for you not to forget &#8212; whether it&#8217;s an event, an email you are writing, a plane ticket, etc.</p><h2>Long-term goal tracking</h2><p>The human brain is very much wired for small tasks and has a hard time taking a step back to see if you are achieving your long-term goals. On the other hand an AI could be aware of your goals and rank each of your actions, telling you whether they seem helpful for achieving that specific goal.</p><h2>Natural language passwords</h2><p>Passwords are a notoriously hard problem. How do you make a password that is secure yet that you can remember? You could imagine generating complete phrases through LLMs, but that is not all. When the user types back the phrase, you could use LLMs to normalize the text before hashing it so that spelling mistakes, punctuation or even word arrangements do not affect the outcome of the hashing.</p><h1>Business and Legal</h1><h2>Categorization of items</h2><p>The other day I wrote an article about GDPR, which parsed a big HTML page from the CNIL. The information there was semi-structured and I had to categorize things further in order to make sense of them. Same for the current article: the different ideas have been categorized by an LLM. This is a great tool to group similar concepts together.</p><h2>Critique political programs</h2><p>While LLMs are obviously biased &#8212; especially around the US culture wars &#8212; they are nonetheless able to project without ego into many personas. As such they are quite interesting tools to use in order to review political programs and see how they are backed by facts and theory. Journalistic platforms could enhance their content with a thorough review of every single politician, detect where they part from their party&#8217;s ideas, and most importantly let people explore concepts on their own, for their personal situation or their vision of society.</p><h2>Assess the fit between candidate and job description</h2><p>Large companies will, on a first pass, match candidates based only on keywords. However we know how biased this approach is, given that specific technical knowledge is not shared by recruiters &#8212; and even less by those in charge of working through thousands of CVs, who are probably not the highest-ranking ones. On the other hand AI is great at matching a CV to a job description, describing quite well the areas that fit, those that are lacking, and the challenges of working with this person. 
Even further, based upon a transcript of the interview, you can ask questions and validate specific checkpoints by searching intelligently for the relevant parts, without having to listen to hours of recording.</p><h2>Check ToC exhaustiveness</h2><p>When you reply to an RFP, a grant or any exercise of that style, there will be a list of requirements you need to meet and precise points that need to be addressed. Obviously it is more subtle than just filling up a form: you need to make sure that various aspects are answered consistently throughout the response. You can use an LLM both to extract from the RFP the list of points that need answering &#8212; or at least cross-check it &#8212; and to check whether your response adequately addresses each of those elements.</p><h2>Insurance claim/policy matching</h2><p>Insurance policies are always a bit obscure. A well-trained LLM could allow customers to role-play use cases before subscribing, as well as gather all the necessary information in case of a real claim.</p><h1>Social Media and Content Moderation</h1><h2>Automated content moderation</h2><p>As OpenAI has proven to us with its extremely restrictive usage policy, LLMs can be used to detect offensive content &#8212; or any kind of content that you don&#8217;t want to see, since offensive isn&#8217;t the same for everyone. In a day and age where social media operates at a great scale, being able to detect &#8220;forbidden&#8221; content would not only make the platforms safer but also more customizable. Indeed, what if instead of having one single policy, different communities had their own policy automatically applied? Free speech and safety for all!</p><h2>Social media filters</h2><p>In the same vein, what if instead of having algorithms sort your feed in the most opaque way, you could express what you want to see? The same as TweetDeck allows you to do by keywords for example, but with concepts instead. &#8220;Tell me all about space news&#8221; or &#8220;I&#8217;m sick and tired of meme X&#8221;. On top of filtering, this could also mean grouping: different posts talking about the same topic could be grouped or even hidden past a threshold.</p><h1>Specialized Applications</h1><h2>Ask questions about meeting transcript</h2><p>Summarizing a meeting is good, but being able to look for specific information in it is the killer feature. &#8220;What did we conclude on topic X?&#8221; This is what I really want to see in those AI meeting platforms, especially to be used during later meetings.</p><h2>Step assistance in tutorials/recipes</h2><p>When following a tutorial or a recipe, some steps that might seem obvious to the person writing it will probably be hard for you to follow if you are too new to the topic. Having an LLM write sub-steps to fill the gaps for you would be a great help.</p><h2>Democratic platforms for citizen engagement</h2><p>Politicians love to claim that they know what their people want, but how do they really know? With AI&#8217;s capability to categorize and summarize, you could turn grievances and ideas into structured input coming from the whole nation. A super-simplified procedure where you could complain about anything or ask for any change as it goes through your mind, then processed and presented at any level of detail to your representative.</p><h2>Real-estate property auto-description</h2><p>So many platforms allow posting real-estate ads, but the quality of those is often mediocre to non-existent. 
What if, using the proper context based on the pictures, the map information, the neighborhood metadata and much more, you could generate a proper text description highlighting the strengths and weaknesses of a given property?</p>]]></content:encoded></item><item><title><![CDATA[3 reasons why Webhooks suck and 2 Masterclasses to replace them]]></title><description><![CDATA[The most popular way for different services to send messages to each other should never have existed. We review why and how we can make it better, taking real-world implementations as examples.]]></description><link>https://www.baby-cto.com/p/3-reasons-why-webhooks-suck-and-2</link><guid isPermaLink="false">https://www.baby-cto.com/p/3-reasons-why-webhooks-suck-and-2</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 28 Apr 2024 07:01:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0a8cac61-fbb0-4779-b34e-a93beea35e56_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The most common way for independent services to exchange messages &#8212; even more so on public APIs &#8212; is webhooks. A beauty of simplicity: you simply provide a URL that you want to be notified on when an event occurs, and the other service simply has to make an HTTP call. Except, not really.</p><h1>Webhooks suck</h1><p>In the shadow of this superficial simplicity are creeping major problems which make it hard for both ends to exploit webhooks.</p><h2>Not missing a drop</h2><p>First of all, the most basic prerequisite for a webhook to work is that the receiving end is able to receive. Meaning that the web service must be up and running. But what happens if maintenance is ongoing, a technical issue plagues the server or the network connection simply fails for an instant?</p><p>The message will simply and purely be lost. It&#8217;s a <a href="https://en.wikipedia.org/wiki/Byzantine_fault">Byzantine fault</a>: how can you know that a message was sent if the sender is unable to contact you either way?</p><p>In order to remedy this, most providers resort to implementing retry mechanisms. Which is fairly complex to implement: you need to store, somewhere, the fact that a given set of messages will have to be delivered in the future, and wake up accordingly. Most queuing systems will struggle to do this reliably because they work in &#8220;at least once&#8221; mode, meaning the same message could be sent twice. You can decide you don&#8217;t care, but then your client has a problem on their side.</p><p>Another issue is that if you are doing maintenance on your server, maybe you configured something wrong and it ends up responding 200 when actually the wrong service was receiving the message. In that case the message simply gets obliterated, given that the sender thinks it&#8217;s received and the receiver has no idea the message even exists.</p><h2>Avoiding flashbacks</h2><p>This retry logic will also amplify another danger. Messages could very well arrive out of order, and this for different reasons.</p><p>For example, you might implement a retry mechanism but consider all messages as independent. In that case, if the receiver becomes unavailable for a while, they are at risk of receiving the missed messages only after newer ones have already come through. 
You can leave the receiver with the burden of fixing this, but it honestly will get ignored most of the time.</p><p>What can happen as well is that if your receiver operates at a larger scale and has at least two web servers, and two of your messages arrive at the same time and get processed by two different processes, there is no saying which message will be dealt with first.</p><h2>Harder to develop</h2><p>Now this is a more practical than theoretical consideration, but most of the time developers won&#8217;t have the luxury of a public IP address on their development machine. Which is a big problem since with webhooks the other service is the one initiating the network connection towards you, meaning that you will probably end up resorting to tools like HTTP tunnels.</p><p>On top of that it means that your code needs to be aware of its own public URL, which you cannot really do automatically. For example if you use a regular API, you never need to declare what your public address is. But for webhooks you need to know what it is and to declare it. Often through complex back-offices or with propagation delays.</p><p>As a result you end up with an extra configuration variable which you could probably avoid otherwise, you probably also need to go through some manual configuration, and on top of that the free plans of popular HTTP tunnels will change your URL every time, so you may end up changing it all the time.</p><h1>There are alternatives</h1><p>How do we deal with this situation better than with webhooks? First you need to realize that you are actually trying to solve two separate problems:</p><ol><li><p>Knowing that there is at least one update pending &#8212; when an event occurs then your code needs to wake up and do its job, preferably as fast as possible after said event.</p></li><li><p>Synchronizing state &#8212; the final goal of this is to have different systems converge into the same state, whether it&#8217;s knowing if the user expects the light to be on or getting the full status of a shared online document.</p></li></ol><h2>Waking up remote code</h2><p>The most naive thing you can come up with is polling. Every X seconds you&#8217;ll check if there are updates available. This is however considered wildly inefficient:</p><ul><li><p>The cost of establishing a connection is pretty high relative to other options.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li><li><p>You won&#8217;t get the updates &#8220;in real time&#8221; but rather only every time you poll</p></li></ul><p>That&#8217;s why in most cases polling is not recommended, and both software and hardware architectures were designed to avoid it. If you were to simplify it to the extreme, modern computers are driven by inputs. A physical electrical signal on your network card will trigger a processing chain that will eventually wake up the relevant process, all the way down to your favorite abstraction from Python, JS or any other language.</p><p>This is what makes webhooks attractive: a remote computer can wake up your local process. But it&#8217;s not the only way to do it. If you open a network connection from your local machine to the remote API &#8212; which is extremely easy to do even without a public IP address &#8212; then as long as the connection is up, packets will be flowing both ways.</p><p>WebSockets were invented exactly for this. 
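They let a client, typically sitting behind a NAT or a proxy, open an outbound connection to the server and then receive real-time updates over it, which makes them my go-to option for waking up remote code. As a rough sketch of the consuming side, assuming the third-party <code>websockets</code> package and a made-up <code>wss://api.example.com/updates</code> endpoint:</p><pre><code>import asyncio
import json

import websockets  # pip install websockets


async def listen_for_updates() -> None:
    # One outbound connection; the server then pushes events whenever it wants
    async with websockets.connect("wss://api.example.com/updates") as ws:
        async for raw in ws:
            event = json.loads(raw)  # assuming the server sends JSON payloads
            print("update received:", event)


asyncio.run(listen_for_updates())</code></pre><p>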
Alternatively, before WebSockets we used a technique called &#8220;long polling&#8221;. The idea is to make a regular HTTP query that hangs for a very long time (typically minutes) until an update happens and the HTTP query returns with the message. A bit messy, but almost as efficient as WebSockets if you don&#8217;t have a very high throughput, and no more costly than webhooks.</p><p>When implementing this kind of technique, you need to consider that you will be maintaining one full TCP connection with every single client. That used to be a challenge, but it is becoming quite easy these days if you can use an async infrastructure.</p><p>Alternatively you can turn towards dedicated services like <a href="https://cloud.google.com/pubsub?hl=en">Google&#8217;s Pub/Sub</a>, <a href="https://aws.amazon.com/eventbridge/">AWS EventBridge</a> or countless others. For example, Shopify offers webhooks but recommends notifications through <a href="https://shopify.dev/docs/apps/webhooks/configuration/eventbridge">AWS</a> and <a href="https://shopify.dev/docs/apps/webhooks/configuration/google-cloud">Google</a>. Kind of the same as dealing with the WebSocket yourself, but you let someone else manage the scale for you.</p><h2>Staying on the same page</h2><p>Distributed systems are notoriously hard and I am not aware of a universal law that allows you to deal with any situation whatsoever, especially as you scale up. However it usually boils down to the same core idea &#8212; which can be remixed at will to fit the project&#8217;s needs.</p><p>Consider that your data model is a bit like a Git repository. At a point in time, the source code has a given state, but in order to get there a series of different edits had to happen. Put otherwise, if you sum up all the edits then you get the state of the code at a point in time.</p><p>So the key here will be to identify which edits happen in your model, convert them into a stream of events and re-compose them on the other side. This can be more or less difficult to achieve: Google Wave for example used <a href="https://en.wikipedia.org/wiki/Operational_transformation">Operational Transformation</a>, which took 2 years to develop, but if you&#8217;re just dealing with a messaging app your life should be much simpler.</p><p>Now imagine all those edits as a sequential log. As you read the log, you keep track of your current cursor, pointing to the latest known edit. When you are notified of another event then you need to read starting from this cursor.</p><p>This resolves a lot of issues raised earlier:</p><ul><li><p>By using edit logs, your communication protocol is basically writing itself and will look fairly simple. If you&#8217;re used to Vuex or Redux, it&#8217;s basically the idea behind mutations.</p></li><li><p>The cursor lets you know where you are in the update stream. 
If you lost a notification because your program was down or crashed, you can catch up from your latest known state.</p></li><li><p>Even if the transmission of messages fails, you can easily have a retry mechanism to eventually get up to date.</p></li><li><p>There is no risk in getting the same message twice, as messages are basically sequential, numbered items.</p></li></ul><p>From looking at WhatsApp&#8217;s WebSocket communications, you can presume for example that they use this kind of strategy, and it&#8217;s even what enables them to have end-to-end encryption with consistent shared states between participants and devices while having servers completely oblivious to the actual content of conversations.</p><h1>Masterclasses</h1><p>Having recently interacted with different APIs, two of them really stand out in my opinion, showing how you can make a public API that avoids the pitfalls explained earlier. I picked them because the choices they made really highlight how you can implement things correctly while also keeping things simple.</p><h2>Telegram</h2><p>The world of instant messaging is highly competitive, with all major players pushing their platform as hard as they can. Facebook has the two most popular platforms &#8212; WhatsApp and Messenger &#8212; while the third one is a pure player gaining traction only through its strategy<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>One part of this strategy is to have an amazing bot experience, allowing you to create real-time applications with very little developer effort. This is particularly prominent in the cryptocurrency world, but for example it&#8217;s also a tool heavily used in Ukraine to follow bombing threats.</p><p>The basic idea of Telegram is pretty simple. You have different conversations to which you add messages. Then more complex things can happen, like people putting reactions, messages being edited, users clicking buttons, etc. All of them are listed as <a href="https://core.telegram.org/bots/api#update">documented updates</a>.</p><p>Now the interesting part. How do you get those updates?</p><p>A first method is the webhook. As you know, it sucks. The more interesting method is the long-polling <a href="https://core.telegram.org/bots/api#getupdates">getUpdates</a> method. It combines two techniques explained earlier.</p><ul><li><p>Long polling &#8212; the HTTP call will hang until either an update or a timeout happens. Not as efficient as WebSockets but very easy to implement, because you can do it with literally any HTTP client ever written. And of course it works from a private IP address.</p></li><li><p>Cursor &#8212; the call takes an <code>offset</code> argument, which corresponds to the ID of the last update you received (plus one).</p><ul><li><p>This is a smart way to get you to acknowledge receiving the previous messages and receive new updates in one single call.</p></li><li><p>But on the other hand if you pass an offset of 0 then it will repeat from the last offset that was used. This means that if you restart your app you don&#8217;t need to remember the last offset, which is incredibly convenient.</p></li></ul></li></ul><p>As a result, developing a client for the Telegram Bot API is a very smooth and simple experience. All you need is an HTTP client and a tiny wrapper around it to get started. 
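</p><p>A minimal long-polling loop could look like this (the token is a placeholder and error handling is left out, so take it as a sketch rather than production code):</p><pre><code>import requests  # pip install requests

TOKEN = "123456:ABC-REPLACE-ME"  # placeholder bot token
API = f"https://api.telegram.org/bot{TOKEN}"

offset = 0  # 0 lets Telegram re-send the not-yet-acknowledged updates

while True:
    # Long polling: the call hangs for up to 50 seconds until an update arrives
    resp = requests.get(
        f"{API}/getUpdates",
        params={"offset": offset, "timeout": 50},
        timeout=60,
    )

    for update in resp.json()["result"]:
        print("received:", update["update_id"])
        # Acknowledge by asking for the next update on the following call
        offset = update["update_id"] + 1</code></pre><p>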
You can of course use a lib, but implementing a client from scratch is a very easy task both in terms of code (no need for crazy libs) and infrastructure (almost no constraints).</p><h2>Plaid</h2><p>If you have never heard about <a href="https://en.wikipedia.org/wiki/Open_banking">Open Banking</a>, it&#8217;s basically all the banks in the world somewhat converging towards providing standardized and modern APIs for all their services. At least in theory; in practice of course the capabilities and implementation details vary greatly country-to-country, and instead of a truly open standard you need to go through middlemen such as <a href="https://plaid.com/">Plaid</a>. This is not my field of expertise so I can&#8217;t go into the details, but all I can say is that Plaid does a great job at converting <s>dinosaurs</s> banks into REST APIs.</p><p>They have a wide number of APIs, but the one that I&#8217;m interested in is the Transactions API. The most interesting information about bank accounts, especially if you are building a personal finance app, is the list of transactions that happened there.</p><p>One of three things can happen, with examples:</p><ul><li><p>A new transaction happened (you bought something)</p></li><li><p>A transaction got modified (exchange rate got finalized)</p></li><li><p>Or it can be deleted (transaction was not captured in the end)</p></li></ul><p>In the case of Plaid, they work a lot with batches. I don&#8217;t even want to know how they receive those transactions, but if you told me they came from a latin-1-encoded CSV file dropped on an FTP server every 3 hours I would not be surprised. As a result it&#8217;s much less <em>real-time-ish</em> than Telegram. It&#8217;s not extremely relevant to ship every event individually.</p><p>Instead they&#8217;ll give you a cursor &#8212; up to you to keep track of it in that case &#8212; and give you <a href="https://plaid.com/docs/api/products/transactions/#transactionssync">aggregated added/modified/removed transactions</a>. Which makes it very easy to update your own database. If you just had the list of latest transactions for example, you&#8217;d have to diff the DB to know what to create, update or delete. But here you can blindly do a bulk insert, bulk update and delete. 3 SQL queries maximum and done.</p><p>The only issue I have with their system is that&#8230; It&#8217;s based on webhooks &#128531;</p><p>But that&#8217;s not causing much harm. Of course it means you need to set up an HTTP tunnel before developing with their API, but on the other hand because they have this sync method you avoid all the other drawbacks pretty easily. You can even poll the API every day if you don&#8217;t care about being &#8220;as fast as possible&#8221;.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reaching the end! 
Give it a subscribe if you feel like learning more on architecturing your code and your company.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Take away</h1><p>Webhooks suck because they bring a hoard of subtle yet annoying problems. Most queue systems are either &#8220;at most once&#8221; or &#8220;at least once&#8221;. Webhooks are &#8220;probably once &#129310;&#127995;&#8221; and bring with them a terrible developer experience.</p><p>But what we really need to do is decouple two problems: the one of waking up remote code and the one of synchronizing state.</p><p>Waking up remote code is fairly easy now that async architectures are widespread, you can either rely on an external cloud provider or simply let people open websockets to you.</p><p>And then regarding state synchronization, most likely you want a somewhat linear sequence of events to be streamed to your consumer, relying heavily on the concept of cursors to let remote code communicate its current knowledge of the state.</p><p>At the end of the day, if you are making a public API, the developer experience is going to matter a lot and involves in the current case two main elements:</p><ul><li><p>How complicated is the code going to be when using your API? The lightest the required wrapping, the least the data post-processing, the better.</p></li><li><p>How hard will the problems be to solve in terms of infrastructure? States to be kept, network flows, etc. Keep in mind that most apps start from scratch so optimize for small operations rather than world-scale conglomerates.</p></li></ul><p>So if you are making a public API &#8212; for the wide web to use or simply for other parts of your company &#8212; please think well how you can make the life of your peers easier and safer!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>It&#8217;s not <em>that</em> high, I still do a lot of polling when I&#8217;m short in time and it will make almost no difference on the result.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Not making any judgement or recommendation here. 
You can be pretty sure that half the secret services in the world read your Telegram messages, but it <em>is</em> a massive platform on which you can build many interesting things.</p></div></div>]]></content:encoded></item><item><title><![CDATA[5 criterion to pick your front-end framework]]></title><description><![CDATA[Where we take an objective look at all options ranging from jQuery to Remix in order to figure which should you use for your next front-end project.]]></description><link>https://www.baby-cto.com/p/5-criterion-to-pick-your-front-end</link><guid isPermaLink="false">https://www.baby-cto.com/p/5-criterion-to-pick-your-front-end</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 14 Apr 2024 07:01:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d7d07a89-96c8-4658-8357-1feb927dc8e5_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One thing for which yours truly is particularly glad is to have been able to partake to the development of the Web for two thirds of its history and to see all the twists and turns it has taken over the year. Now that the big platforms dominate it, that Chromium has a quasi-monopoly &#8212; except for Safari, which is a lesser version of the same thing &#8212; and that the hoops to jump are just so numerous, one can only imagine that getting into web development must be a disheartening thing.</p><p>Let us however keep an eye on the North. Imagine you were to create a major project right now, which involves a website, what route should you wish to take?</p><ul><li><p>The old school would go towards <a href="https://jquery.com/">jQuery</a></p></li><li><p>The minimalists would pick <a href="https://htmx.org/">htmx</a></p></li><li><p>And the mainstream would pick a meta-framework such as <a href="https://remix.run/">Remix</a>, <a href="https://nuxt.com/">Nuxt</a> or <a href="https://kit.svelte.dev/">SvelteKit</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li></ul><p>Those three voices will often be vocally contradicting each other on social media, adding confusion into the mix. Of course there are no silver bullets, only bullets that will hit the target more difficultly than others. My bullet is the meta-framework, and that&#8217;s not out of kool-aid nor fear.</p><p>For the first years of my career it was impossible for me to use meta-frameworks, for they were not invented yet. This was a painful experience. If you end up creating a real-time interactive game in which the whole DOM is dynamic, trust me jQuery is <em>definitely not</em> the tool for the job.</p><p>On the other hand if you are not creating something so deeply interactive, chances are that jQuery &#8212; a library whose main purpose is to even out the differences between Internet Explorer 6 and Firefox 2 &#8212; isn&#8217;t going to be <a href="https://youmightnotneedjquery.com/">of great help</a>. Especially if you are feeling more like a backend person, a tool like htmx will allow basic interactions which require very little front-end work and might just be enough for you.</p><p>But if you work for projects of a more unpredictable nature then maybe those solutions are not optimal. It&#8217;s with the mind set on a large, enterprise-grade&#169; project that we&#8217;ll go in quest of the perfect framework to build your startup.</p><h2>Performance</h2><p>A common consideration against meta-frameworks will be the performance. 
Surely, all those features must come at the cost of very expensive and bloated JavaScript runtimes? Let&#8217;s put that to the test. I&#8217;ll make a hello world page using major frameworks and measure the transferred payload size (gzipped for the most part):</p><ul><li><p>HTMX &#8212; 16 KiB</p></li><li><p>SvelteKit &#8212; 25 KiB</p></li><li><p>jQuery &#8212; 30 KiB</p></li><li><p>Remix &#8212; 90 KiB</p></li><li><p>Nuxt 3 &#8212; 131 KiB</p></li></ul><p>This tells us that meta-frameworks do not have to be heavy and clunky. They can be, like Remix and Nuxt 3, but SvelteKit is lighter than jQuery in that regard.</p><p>What about the execution speed and memory use? <a href="https://krausest.github.io/js-framework-benchmark/current.html">This benchmark</a> is fairly popular and while it doesn&#8217;t include HTMX or jQuery, it does have vanilla JS, which should be the closest to what you can achieve. You&#8217;ll have to explore the numbers yourself but in a nutshell even React, which is often lagging behind, stays pretty close to the baseline. The same goes for memory use.</p><p>In any case, those benchmarks are toys. The better question is: how fast is your website going to be at scale?</p><p>The clear winner is going to be HTMX, because all the rendering logic happens on the server, which isn&#8217;t so much bound by script size. On the other hand, with all the other options you will have to write numerous lines of code. If you write your project using jQuery, you will end up either with one big JS file or with a collection of different files, but either way the organization will be yours. Now if you want to minify all this, you will not escape the need for a set of build tools. When scaling up, it <em>is</em> difficult to escape having a build stage.</p><p>And while meta-frameworks also have a build stage, this process is entirely integrated. You don&#8217;t have to do anything: your code gets transpiled, minified and processed in all the necessary ways completely automatically. With the added bonus that the build tools are aware of the dependency tree. As a result, every page can automatically be bundled in its own file, so that you never need to load code that you don&#8217;t need immediately.</p><p>From the performance standpoint, you thus have two choices:</p><ul><li><p>Either you think that HTMX will never be a limit in what you are trying to achieve, in which case it is the easiest option</p></li><li><p>Or you need to have some wiggle room in terms of features &#8212; 100% of my professional projects &#8212; and then a meta-framework will be a far superior option for a relatively small price to pay</p></li></ul><p>Let&#8217;s declare the winner of this round: SvelteKit!</p><h2>Maintainability</h2><p>While a strong argument of the web development community was the separation of concerns between HTML and CSS, you need to understand that at that time people were writing CSS directly in the style tag, making poor use of semantics and greatly limiting reusability. It is however essential to consider <a href="https://www.baby-cto.com/p/html-the-facade-of-complexity">HTML, CSS and JS as a whole</a>.</p><p>This is why the concept of <em>component</em> is everywhere. It&#8217;s the same as the widgets from UI frameworks. 
The advantage of a component is that, as explained above, it has a clear dependency tree, it contains all the code that is required for its proper execution, and if you want to refactor or delete it you do not need to look for its bits and pieces all over your code base, worrying about side effects.</p><p>The first and most important question to deal with is the CSS, whose C stands for &#8220;Cascading&#8221;. It&#8217;s another way to say &#8220;if you are not careful, one change here will cascade into disfiguring your whole product&#8221;. You could deal with the styling of your component by writing the style into the style tag directly, or even generate all the possible styles into classnames and write them into the class attribute directly. But that is exactly what the Elders warned us about.</p><p>A more interesting approach is to define the <a href="https://cssinjs.org/">CSS in JS</a>, which allows two things: first, the CSS gets bundled with your JS &#8212; and de facto your HTML &#8212; and second, it gets a scope which will not overlap with that of other components. It&#8217;s just a shame to be doing this manually and to deprive yourself of tools like SCSS, which make writing CSS much easier. That&#8217;s what Single File Components (SFCs) allow, and they are available in both Vue and Svelte but weirdly not in React.</p><p>Once you have scoped and bundled together your CSS, JS and HTML generation, it becomes hard to write spaghetti code. Knowing that on top of that the build system will track all your dependencies, as said earlier, this allows for an extremely atomic compilation and optimization process.</p><p>On the other hand, when using jQuery or HTMX, you will be left to your own devices. Not necessarily a bad thing, but the organization becomes up to you. As soon as two developers start working on the same code base, we know that the organization can quickly go out the window.</p><p>Hence, this round has two winners for scalability and team work: SvelteKit and Nuxt 3.</p><h2>Future-proofing</h2><p>If you create a startup which depends heavily on a given framework in order to work, you want to make sure that after you have invested five years of developer time you don&#8217;t end up having to rewrite everything from scratch. Those tools need to give developers a good long-term perspective. For example, you can still run on a Windows machine an unmodified DOS program that was written in the 80s. Without being so extreme, if you cannot see at least 10 years forward with a tool then you have a problem.</p><p>First of all, let&#8217;s look at jQuery. Almost 20 years later the API is still basically exactly the same, version upgrades being mostly about simplifying the code because browsers are converging now. If that&#8217;s the route you intend to take, then rest assured that jQuery will not go anywhere.</p><p>Then HTMX. Honestly it&#8217;s hard to say anything at this point. Small project, the maintainer probably can&#8217;t promise anything. Chances are it will only add features over time, and due to its nature I can&#8217;t imagine how they could introduce major breaking changes. The main risk is more that the project dies, but even then it probably wouldn&#8217;t be too hard to maintain it yourself if it came to that.</p><p>On the React side, I don&#8217;t practice it often enough to go in-depth into it, but in any case the ecosystem is so vast that you could probably write another article dedicated to choosing the right React stack. 
In that regard, it seems like breaking changes do occur in every part of the ecosystem, but overall it&#8217;s never going to be something fundamental that asks you to throw your entire codebase in the trash.</p><p>Which is fairly different from what you could say about the Vue ecosystem. Vue 3 has been a major breaking release and introduced a whole new set of completely different APIs (the &#8220;Composition API&#8221;), and while it is not inherently bad it completely changes the way to think about your code. In theory you don&#8217;t have to use it, but every single library of the ecosystem now only supports it, so you don&#8217;t actually have a choice.</p><p>Including the Vue-3-compatible version of your favorite libraries/tools (if ported), which will force you to rewrite everything that depends on them. The main one being Nuxt 3, which came out with exactly zero overlap in API or conventions. There is <em>nothing</em> that works the same anymore and the thin compatibility layers that exist are usually fickle, fragile and generally useless. This feels like a serious backstab barely 6 years after version 1 &#8212; yes I have products that would cost 6 figures to upgrade and I&#8217;m pissed about it.</p><p>It now remains to evaluate Svelte&#8217;s position. While I can&#8217;t find any official statement on where Svelte will be in 10 years, some good indicators are there. Firstly, the documentation is written in a much more practical way than others&#8217;, showing that they care more about the use cases than the technicality of the framework. Secondly, so far upgrades have been relatively smooth. And finally Svelte 5 is cooking and the upgrade process also seems clear. Now since the ecosystem is much smaller, it&#8217;s hard to tell what the impact is really going to be, but let&#8217;s keep our eyes open.</p><p>Overall, the only tool that has demonstrated a serious commitment towards backward compatibility is jQuery, which will have to be the winner of this round!</p><h2>Cognitive load</h2><p>You have probably one day heard of GTD and decided to try it for yourself. One of the pieces of advice coming out of it is: if it takes less than 5 minutes, do it immediately. So tell me, how many days did you end up spending 100% of your time on 5-minute tasks? Did it feel satisfying? And did you accomplish anything meaningful?</p><p>Of course it&#8217;s a rhetorical question and you can&#8217;t really answer, so you&#8217;ll have to imagine that you said &#8220;it&#8217;s satisfying but not meaningful at all&#8221;. The same goes for many tools that we use and that we might judge by how satisfying they are to use, while we actually lose our time doing boilerplate.</p><p>Imagine that you are writing an interactive component which, depending on the user&#8217;s actions and inputs, will have to update its own DOM. Doing it in jQuery can be extremely satisfying because you create all the elements yourself, find smart ways to hook events, imagine optimizations to do it faster, etc. Very fun if it&#8217;s your jam. But on the other hand if you do the same thing with Svelte, the compiler does all this automatically and better than you ever would. Managing the DOM is just not a task with Svelte.</p><p>So while it&#8217;s satisfying, you just spent your time on something that should not even grab your attention. What about our contenders?</p><ul><li><p>jQuery &#8212; As mentioned above, it&#8217;s all manual from the manipulation of the DOM to bundling it to the client. 
Easy to marvel at the beauty of your code, hard to actually focus on what matters.</p></li><li><p>Remix/React &#8212; Many moving parts and optional APIs (hooks, signals, etc). No management of CSS. Fairly complex overall.</p></li><li><p>Nuxt/Vue &#8212; Version 3 of both definitely made things more complex, with two competing APIs (including a fairly verbose one) and lots of build-time magic.</p></li><li><p>HTMX &#8212; Very lightweight front-end but on the other hand you still need to worry about the back-end yourself, so it&#8217;s a lot of unknowns</p></li><li><p>Svelte(Kit) &#8212; Once you&#8217;ve sorted out the idioms, it is fairly straightforward and requires no magic of any kind</p></li></ul><p>Hence, I&#8217;ll give the round to SvelteKit!</p><h2>Community and talent pool</h2><p>You could find the best framework of all times, if nobody can provide for you libraries that solve common problems (UI libraries, form validation, toasts, etc), learning resources or direct support then you are going to have a hard time.</p><p>On top of this, you need to be able to hire developers. A decent developer can learn any decent framework but they need to <em>want</em> to work with it and usually they&#8217;ll expect that it&#8217;s going to look good on their CV.</p><p>The same goes commercially speaking. Lots of customers are interested in knowing what tools you use and if you can&#8217;t justify that it&#8217;s durable and bullet-proof they might get cold feet when deciding to work with you.</p><p>So here is what to expect:</p><ul><li><p>jQuery isn&#8217;t sexy anymore, nobody needs it on their CV, nobody wants to work with it except a few indiepreneurs that claim jQuery pays for their lambo. Customers who look into your tech stack will ask you if jQuery is a headless framework and it&#8217;ll be hard to say yes.</p></li><li><p>HTMX is straight away a no go. It&#8217;s too small to put on a CV and not trusted enough to fuel a sales pitch. Using it in a professional setup right now will require a serious offset on other parts of your stack.</p></li><li><p>React is a no-brainer. Everyone wants it on their CV, it&#8217;s backed by a major player and the community is one of the largest for a framework of that kind. Plus, most companies that publish SDKs or component libraries will prioritize React.</p></li><li><p>Vue is the new React, except less popular, less demanded and less clunky. Still a workable option.</p></li><li><p>Svelte is on the line. But it receives a lot of developer love (like HTMX you&#8217;ll tell me) and is now backed by Vercel which not as big as Facebook but definitely big enough to make this serious. Plus Svelte 5 is promised to resolve all pending Svelte issues, which should boost adoption.</p></li></ul><p>Basically I have to give this round to React (hey at least one).</p><h2>Conclusion</h2><p>There are a million of other items to consider and of course many more frameworks than those. But in my experience those themes are the most important to consider when picking a technological stack.</p><p>So which framework should you use?</p><p>In the absolute, go with SvelteKit. Provided that the version 5 of Svelte doesn&#8217;t turn out to be a major betrayal of the community. 
It&#8217;s lightweight enough, scales well with teams, focuses developers on what matters and should not ask you to rewrite your whole application in 2 years &#8212; especially if you waited for version 5.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>Yes you will get more difficulties finding talents for it but on the other hand it has good press and you can learn the bulk of it in a few hours. There are of course a few rough edges but nothing insurmountable. Same for your sales pitch, you can always make diversion by using a hot headless CMS.</p><p>Another interesting option if you were building your own startup with a Craigslist-type UI would be to use HTMX. Besides the talent pool, the main thing about using HTMX is that developers need to have the final word on UX/UI, because otherwise you will be fighting against the framework all the time. But if you can keep it constrained, you&#8217;re probably going to have a very efficient experience.</p><p>Now regarding other meta-frameworks, we see that they all come with dealbreaking drawbacks &#8212; namely React being a huge spaghetti bowl and Vue being a traitor. Not that you can go particularly wrong with them, but it&#8217;s just not good choices in my opinion. If you are really afraid about what others might think you can always pick some assembly of React things, but be warned that it comes at a cost for your mental load and the one of your browser.</p><p>Finally if you are team jQuery, of course it&#8217;s a safe and proven choice which will continue to support you for the next 100 years I&#8217;m certain. Which is more than you could say about anything else on that list. It&#8217;s probably good if your app is small and you have few resources to invest in the front-end.</p><p>So while there are indeed no silver bullets, it&#8217;s important to outline that given the current state of the art you are much safer and much better off with a meta-framework, especially if it&#8217;s SvelteKit. Other options exist and can be valid for some use cases but as someone that oversees the production of many front-end applications I can only recommend to stick with a paradigm that covers all angles at minimum cost.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>If you&#8217;re wondering why Angular is not part of this comparison while it definitely has the credibility to fit in there the answer is twofold. First of all, they invented backstabbing in the JS framework world. And since they backstabbed me, I didn&#8217;t get any experience with it so it would be very hard for me to talk about it. All I know is that it has no chances of winning according to the evaluation below.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>If Svelte 5 <em>also</em> ends up in treason, then I&#8217;m not sure what advice to give. 
If treason is acceptable then Vue is a superior option due to its large popularity while avoiding many of React&#8217;s pitfalls.</p></div></div>]]></content:encoded></item><item><title><![CDATA[From Chaos to Clarity: Streamlining End-to-End Testing with Django and SvelteKit]]></title><description><![CDATA[Learn the secrets to boosting your web development process with Django and SvelteKit. Our article reveals how to seamlessly blend these powerful frameworks for unmatched speed and reliability.]]></description><link>https://www.baby-cto.com/p/from-chaos-to-clarity-streamlining</link><guid isPermaLink="false">https://www.baby-cto.com/p/from-chaos-to-clarity-streamlining</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 10 Mar 2024 08:00:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f06f52f0-4532-4a49-a284-684e77625374_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For reasons listed in my <a href="https://model-w.readthedocs.io/en/latest/architecture.html#back-end-django">Model W Architecture</a> document, my framework of choice for the backend is Django (tldr: the ORM) and until a better option emerges in the world of JavaScript this is not going to change. On the other hand, my experience has shown that if you do a professional website you will eventually outgrow <a href="https://htmx.org/">htmx</a> and other lightweight frameworks, making it a necessity to systematically turn towards meta-frameworks such as <a href="https://kit.svelte.dev/">SvelteKit</a>, Nuxt, Next.js or Astro &#8212; to quote the most famous.</p><p>This is what we&#8217;re systematically doing at <a href="https://with-madrid.com/">WITH</a> and the combination works well. But you absolutely must figure out ways to align all this properly &#8212; and there are no official ways to do this.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Today we&#8217;re going to explore one specific friction point: end-to-end testing.</p><h2>Why test?</h2><p>Some will tell you that you need to cover 100% of your code base with unit <em>and</em> e2e tests while others will say &#8220;testing is doubting&#8221;. So while we&#8217;re not here for a theoretical lesson on the benefits of tests, we are going to focus on <em>why</em> we would want to have those, which in turn allows us to decide what we want to test.</p><h3>The speed factor</h3><p>First, nobody gets the code right the first time. Personally, with my 20 years of coding, I think that <em>once</em> I managed to land about 1000 lines of code that worked on the first try, while being extremely focused on what I was doing. 
The typical development cycle looks more like: write a bunch of lines, see where it breaks, repeat until it works.</p><p>As a developer, you will learn to code faster and with less mistakes over time but there is nothing you can do about it <em>right now</em>. Just code more and it will sink in. This leaves you with the second part of the process: how fast can you see where it breaks?</p><p>Obviously the answer to that question is largely dependent on what you are currently testing. If you&#8217;re talking about CSS, then a second screen with the page you&#8217;re currently integrating along with a good meta-framework that implements <a href="https://webpack.js.org/concepts/hot-module-replacement/">HMR</a> properly should be the easiest way to go.</p><p>On the other hand if you&#8217;re creating Django models and/or APIs using <a href="https://www.django-rest-framework.org/">DRF</a>, a lot of the code that you are going to write is going to be declarative &#8212; only to be later picked up by the meta functions of Django and turned into a usable project. Which means that there is literally no code for you to test, it&#8217;s mostly configuration<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>But if you are working on the typical front/back architecture that we&#8217;ve discussed earlier, most of the things that you&#8217;re ever going to want to test in an automated way are the end-to-end user stories.</p><p>If you test those manually, you will be clicking on many buttons and filling up many forms. On and on again. For test cycles of 30 seconds to 5 minutes usually.</p><p>On the other hand if you automate those tests you can probably drop the testing time to a couple of seconds. We can estimate that on average it&#8217;s going to be about 10 times faster than manual testing.</p><p>Now let&#8217;s consider the following simple equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;T = \\sum_{n=1}^{R} (T_c + T_t) = R \\cdot (T_c + T_t) \\quad \\text{where} \\quad \n\\begin{cases}\nT_c = \\text{Time spent coding} \\\\\nT_t = \\text{Time spent testing} \\\\\nR = \\text{Number of repetitions} \\\\\nT = \\text{Total time to develop a feature}\n\\end{cases}\n&quot;,&quot;id&quot;:&quot;OXPHUBHXCF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Let&#8217;s consider that:</p><ul><li><p>The time spent testing manually is equal to the time spent coding</p></li><li><p>The automated test is 10 times faster than the manual test</p></li></ul><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\nT &amp;= R \\cdot (T_c + T_t) = R \\cdot (2T_t) \\quad &amp;\\text{since $T_c = T_t$} \\\\\nT' &amp;= R \\cdot (T_c + T_t') = R \\cdot \\left(T_t + \\frac{T_t}{10}\\right) = R \\cdot \\frac{11T_t}{10} \\quad &amp;\\text{since $T_t' = \\frac{T_t}{10}$} \\\\\n   &amp;= R \\cdot \\frac{11}{10} \\cdot T_t = \\frac{11}{20} \\cdot R \\cdot (2T_t) \\\\\n   &amp;= \\frac{11}{20} \\cdot T \\quad &amp;\\text{rearranging terms} \\\\\n\\therefore T' &amp;= \\frac{11}{20} \\cdot T\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;EDGOETYQPJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Even if you don&#8217;t understand the math formalism, you understand that in the end testing your code automatically while you develop is <strong>almost twice faster</strong>. The bias here of course is that you still need to write this test. 
That&#8217;s why we&#8217;ll explore tools that make this as easy as possible, so that the benefits are not swallowed by the plumbing.</p><p>Overall it&#8217;s hard to quantify exactly <em>how much</em> productivity gain<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> we&#8217;re talking but it should help you go about two times faster &#8212; and in the worst case scenario it seems unlikely that it will be slower than testing manually. More importantly we&#8217;re just talking about the <em>immediate</em> benefits of testing.</p><h3>Ease of mind when changing things</h3><p>Any application that lives long enough will reach the point where no single human brain can comprehend the entirety of its features at the same time. There are just too many moving parts. And this point arrives much sooner than you think, especially in environments like mine where people move from project to projet all the time.</p><p>Essentially: how do you know if something that you change will break anything in the project without testing everything? Leading to the subsequent question: how do you even know what to test?</p><p>The answer is that you cannot know what broke if you don&#8217;t test it, so indeed you <em>have to</em> test everything. Which can be done with for example a large testing booklet written and maintained manually &#8212; aka not &#8212; or also with automated tests that run every time you push  your code into the repo (and on your machine while you dev).</p><p>The second option is absolutely better in the sense that:</p><ul><li><p>If all the tests are written, it will be exhaustive</p></li><li><p>And since it&#8217;s all automated, each test should be extremely fast</p></li></ul><p>This way you reduce a QA process to a few seconds of test instead of potentially hours of man time spent. With the guarantee that everything is executed in stable conditions and in a repeatable way.</p><h3>Onboarding of newcomers</h3><p>Overall tests will show you how to use the app and how to use the code. All a newcomer has to do to understand everything that you can do with the application is to watch the tests unfold.</p><p>Let&#8217;s note that this is partly true because tests will often be cryptic and hard to document. A better way to approach this topic is with <a href="https://en.wikipedia.org/wiki/Behavior-driven_development">BDD</a> and &#8212; spoiler alert &#8212; <a href="https://pytest-bdd.readthedocs.io/en/stable/">pytest-bdd</a>. But that&#8217;s for another article, we are focused here on the Django/Svelte integration.</p><h2>Picking the right tools</h2><p>While I am not going to list every single test runner and framework out there &#8212; that would be an entirely different article &#8212; here are the constraints I&#8217;m settings for myself in this quest for automated tests.</p><p>The first aspect is that Django-based tests have the ability to write directly into the database, which is in turn cleaned up after each test. When your application is essentially just transforming a DB schema into an API, that&#8217;s really something you want to be able to do. Without that you&#8217;re in for some very awkward mocking. 
The core idea is thus to run tests from Django &#8212; I even considered wrapping Django&#8217;s tests from JS but in the end that was not necessary.</p><p>The default test framework in Django is the standard <a href="https://docs.python.org/3/library/unittest.html">unittest</a>, and while honorable there are more friendly and powerful options out there. Namely <a href="https://docs.pytest.org/en/8.0.x/">pytest</a>, which as you will see right below will be the backbone of our strategy. The first thing is to integrate it with Django&#8217;s tests and this happens with <a href="https://pytest-django.readthedocs.io/en/latest/">pytest-django</a>.</p><p>The main issue however that I have with testing in Django is that, while it has a <a href="https://docs.djangoproject.com/en/5.0/topics/testing/tools/#liveservertestcase">LiveServerTestCase</a> (and the <a href="https://pytest-django.readthedocs.io/en/latest/helpers.html#live-server">pytest equivalent</a>), it kinds of wants you to use <a href="https://selenium-python.readthedocs.io/">Selenium</a> and no offense for that precursor tool but oh boy is it unusable. Last time I wrote e2e tests with Django and Selenium I ended up writing more utils than tests.</p><p>Thankfully things have changed and we are now able to use <a href="https://playwright.dev/python/docs/intro">Playwright</a> through the <a href="https://pypi.org/project/pytest-playwright/">pytest-playwright</a> plugin. While I don&#8217;t particularly like Microsoft I must admit that it has two very interesting characteristics.</p><p>Firstly it has very semantic selectors which will use accessibility attributes in order to find elements on the page. This is great because while you test your features you know that if you don&#8217;t have to resort to crude CSS selectors it means that at least what you test looks more or less decent in terms of accessibility.</p><p>And secondly it has an auto-wait feature on all the selectors, which is by far the most annoying thing that you end up doing all the time with Selenium.</p><p>To summarize, we&#8217;re going to go with:</p><ul><li><p><code>Django</code> itself and its testing facilities</p></li><li><p><code>pytest</code> as test runner</p></li><li><p><code>pytest-django</code> for the Django integration</p></li><li><p><code>pytest-playwright</code> for the browser testing</p></li></ul><h2>Implementation time!</h2><p>In order to proceed to demonstrate how all those tools work together, created a <a href="https://github.com/Xowap/e2e-django-svelte">sample project</a> on GitHub which contains mostly the boilerplate that you will need along with an example of how to use everything together.</p><p>The project is extremely simple in itself: there is one model that is exposed through an API with one page that displays all the instances returned by the API. Really just the bare minimum to write a test that shows all we discussed above.</p><p>Lots of small details are going to be left out from this explanation that focuses on the big picture. The source code being entirely available, any shadow can be lifted by inspecting it. 
If you intend to run the project yourself, have a read at the <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e">README</a>.</p><h3>Boilerplate</h3><p>We&#8217;ve got two projects which are fairly close to default Django and SvelteKit projects located in the <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api">api</a> and <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/front">front</a> folders.</p><h4>API</h4><p>Let&#8217;s first have a look at <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/requirements.in">our dependencies</a>. Quite obviously we&#8217;ll see there Django alongside its best friend <a href="https://www.django-rest-framework.org/">DRF</a> for the API management part.</p><p>On the testing side we have 3 plugins on top of pytest:</p><ul><li><p><code>pytest-django</code> &#8212; Takes care of the Pytest/Django integration, and specifically takes care of managing the database and live server</p></li><li><p><code>pytest-playwright</code> &#8212; Integration of Pytest and Playwright in order to be able to test things within a browser</p></li><li><p><code>pytest-env</code> &#8212; Small utility that allows to define environment variables when Pytest runs, which is super useful if like me you follow the <a href="https://12factor.net/">12 factors philosophy</a>: it allows to have a static configuration for running tests.</p></li></ul><p>Since we&#8217;re talking about end-to-end tests, I figured that it would not necessarily make sense to pin them to a specific Django app and rather I&#8217;ve created a dedicated <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/tests">test folder</a> for it.</p><p>In order to be able to run the tests, you need to make sure to configure the settings modules and the environment in the <code>pyproject.toml</code> file:</p><pre><code>[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "e2e_django.settings"
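# Playwright's sync API keeps an asyncio event loop running, which trips
# Django's "async unsafe" guard on ORM calls; this variable (injected at test
# time by pytest-env) lifts that guard so the browser tests can use the ORM.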
env = [
    "DJANGO_ALLOW_ASYNC_UNSAFE=true",
]</code></pre><h4>Front</h4><p>Honestly I&#8217;ve changed nothing to the front-end except create the page that displays the thing we want to test.</p><h3>Front/API sync</h3><p>The part that was elusive to me for the longest time was: how can I synchronize the front-end and the back-end &#8212; especially in regards to the database management that I&#8217;m mentioning earlier.</p><p>Turns out, with a little bit of eblow grease and pytest magic it&#8217;s fairly easy.</p><p>First we need to talk about pytest&#8217;s <a href="https://docs.pytest.org/en/6.2.x/fixture.html">fixtures</a>. If you&#8217;re a Django developer you probably hear &#8220;fixture&#8221; and think &#8220;right to load data into the database&#8221;. But it&#8217;s not that at all. They are a mechanism of dependency injection specialized for tests.</p><p>For example you could say: I have a &#8220;user&#8221; fixture that is a user from the database and that is scoped to each individual test. If a test requires the &#8220;user&#8221; fixture then the user will be created into the database and will be cleaned after each test.</p><p>Both the Playwright and the Django plugin use them heavily for giving you access to their various features. Typicall if you ask the <code>page</code> fixture for your test then Playwright will be started but ohterwise it will not.</p><p>The same applies for the <code>live_server</code> from Django and in our case we&#8217;ll be able to leverage this in order to start and stop the front-end while testing.</p><p>This can be done relatively easily if you exploit the fact that both the front and the API are in the same repository. You can compute accurately the absolute path of the front-end and start scripting there.</p><p>Which is exactly what the <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/tests/conftest.py#L64">front_server()</a> and its friends are doing in the conftest.py file &#8212; a file that can inject global fixtures into different tests under the same module. While you can read the source code directly, let&#8217;s review the key points:</p><ul><li><p>We use Popen to start the Vite server in preview mode, which is close enough to production for our needs. A fixture can just yield an object, and the function will suspend until all tests that need it are done. This is what we do, and after the yield finishes we just shut down the process.</p></li><li><p>The process is bound to port 0. This is a special way to tell the system &#8220;just pick any available port&#8221;. Which allows to not have to decide for a static port number thus limiting the risks of failure. The Vite server will print the chosen port when starting, so we just parse stdout to get it.</p></li><li><p>In the end we simply yield the base URL of this front-end server and then our tests will be able to connect to it in any way they want.</p></li></ul><p>This example is done with Vite because that is what powers SvelteKit, but while the detail of the commands you would have to run would be different there are equivalents of this in every single front-end framework so you&#8217;ll just need to adapt it accordingly.</p><h3>Writing the test</h3><p>Now that we&#8217;re able to summon the front-end (through the code above) and the browser (through Playwright) it&#8217;s time for us to write a test!</p><p>Be careful, this is actually very disappointing because it&#8217;s way too simple. 
First we create the items that we want to see through a fixture:</p><pre><code>@pytest.fixture
def some_items(transactional_db):
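    # transactional_db (from pytest-django) commits the rows for real instead of
    # rolling back a wrapping transaction, so the live server thread and the
    # front-end process can actually see them.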
    return [
        Item.objects.create(name="Foo"),
        Item.objects.create(name="Bar"),
    ]</code></pre><p>Now we create a test that requires 3 fixtures:</p><ul><li><p><code>front_server</code> &#8212; The server we&#8217;ve created above</p></li><li><p><code>some_items</code> &#8212; The items defined here</p></li><li><p><code>page</code> &#8212; The Playwright control object</p></li></ul><pre><code>@pytest.mark.django_db(transaction=True)
def test_content(front_server, some_items, page: Page):
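    # front_server boots the Vite preview server and yields its base URL, while
    # page (from pytest-playwright) drives a real browser.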
    page.goto(str(httpx.URL(front_server).join("/")))

    for item in some_items:
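        # Each fixture item must appear exactly once in the rendered list; the
        # repr() trick escapes the name so quotes don't break the text selector.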
        item_name_escaped = repr(item.name)[1:-1]
        assert (
            page.locator(f"li:has-text('{item.id}: {item_name_escaped}')").count() == 1
        )</code></pre><p>This way we&#8217;re able to send the browser to the front-end and check the content of the page based on the expected items we&#8217;re looking for. That&#8217;s it!</p><h3>Running the GitHub Action</h3><p>If you&#8217;re making automated tests, it&#8217;s usually a good idea to run them automatically. Fortunately it&#8217;s really easy to do with GitHub Actions. We&#8217;ll define <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/.github/workflows/test-api.yml">a workflow</a> that triggers on push.</p><p>Beyond the installation of dependencies, let&#8217;s check some interesting steps of that workflow:</p><pre><code>- name: Run tests
  run:
        .venv/bin/python -m pytest --junitxml=/tmp/test-results.xml
        --tracing=on --video=on --screenshot=on
  working-directory: ./api</code></pre><p>When running the tests, we keep the results in JUnit format and ask Playwright to record pictures and videos of all tests. Let&#8217;s note that if your project scales up, you will probably want to record only failing tests rather than all of them, otherwise you&#8217;ll eat up artifact storage pretty quickly.</p><pre><code>- name: Publish test report
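  # always() publishes the report even when the test step above failed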
  uses: mikepenz/action-junit-report@v4
  if: always()
  with:
        report_paths: "/tmp/test-results.xml"
        check_name: "API Pytest Report"</code></pre><p>Since we&#8217;re able to export the outcome as a a JUnit file, we use an action that transforms it into a nice recap for the action.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ld3F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ld3F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 424w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 848w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1272w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png" width="550" height="173.92300641613198" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:345,&quot;width&quot;:1091,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:34732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ld3F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 424w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 848w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1272w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Test summary from the GitHub Action (only visible if you&#8217;re 
connected)</figcaption></figure></div><pre><code>- name: Keep Playwright artifacts
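  # pytest-playwright drops traces, videos and screenshots under test-results/;
  # keeping them as an artifact lets you replay failures in the Trace Viewer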
  uses: actions/upload-artifact@v4
  if: always()
  with:
        name: playwright-traces
        path: api/test-results/</code></pre><p>Finally, we&#8217;ll save temporarily the Playwright videos and screenshots into a GitHub Action artifact, which allows to analyze in-depth failed tests (for example using the online <a href="https://trace.playwright.dev/">Trace Viewer</a>).</p><h2>Conclusion</h2><p>After establishing that automated testing is well-worth going through the trouble of establishing a well-oiled testing infrastructure, we set to explore how this can be accomplished with Django and a Javascript meta-framework such as SvelteKit.</p><p>While this requires a little bit of boilerplate and adaptation &#8212; after all, those two worlds are not exactly thought to work togeter &#8212; we can see that we can obtain both the convenience of Django&#8217;s tests with their database management and the power of modern front-end test frameworks such as Playwright.</p><p>In the end the tests run completely autonomously on GitHub Actions and produce both nice reports and in-depth traces that allow analysis in case of failure.</p><p>This whole structure is easy to use on a daily basis and can boost your coding speed up to two times!</p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For a broad meaning of configuration. And of course you can write specific functions and algorithms in the backend, for which the use of unit tests is perfect. But the vast majority of the code you&#8217;re writing in a Django project is actually written by Django. Which is why I like Django.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>If anyone has heard of a valid experiment on the topic, I&#8217;ll take. What I&#8217;ve found is mostly studies on 12 subjects so I&#8217;m not going to take that as too solid.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Stop doing these mistakes with your caching proxy]]></title><description><![CDATA[How to get sub-10ms response times, real time content expiration, automatic compression all while offloading your backends. Enter the power of RFC 9110 and discover the underdog of reverse proxies.]]></description><link>https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110</link><guid isPermaLink="false">https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 17 Feb 2024 20:09:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b2bcb6e6-f334-4455-ac7b-ccd4be64714e_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You got yourself a website with static-ish content which takes a lot of time to generate and you are looking to make it faster. The obvious solution is caching, but that is a surprisingly intricate and delicate topic when even the slightest bit of interaction starts to happen. Moreover you would probably be happy if your cache could refresh your content automatically when you hit &#8220;Publish&#8221; in your CMS. 
And finally, as state you have different interaction points in your website so you can&#8217;t exactly turn towards static website generators.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Your typical stack</h2><p>The application that you are going to develop through this article is a very simple <a href="https://flask.palletsprojects.com/">Flask</a> application designed only to explore the concepts that we&#8217;re introducing without getting the noise of a more complex setup.</p><p>However these concepts can apply to many different configurations. Specifically, if you are reading this in 2024 chances are that you are already running a given amount of <em>headless</em> CMS&#8217;:</p><ul><li><p>A back-end/API which runs whichever CMS that has a &#8220;headless mode&#8221;. This author would recommend <a href="https://wagtail.org/">Wagtail</a> but the list of such beasts is growing extremely long these days.</p></li><li><p>A front-end meta-framework which renders a first version of your content on the server side and then hydrates the HTML within the client to give full interactivity to the pages. You could use <a href="https://kit.svelte.dev/">SvelteKit</a>, <a href="https://nuxt.com/">Nuxt</a>, <a href="https://astro.build/">Astro</a> or <a href="https://nextjs.org/">Next.js</a> for example.</p></li></ul><p>If you were to do so, you essentially need to consider your front-end as a proxy on top of your API, which performs a JSON to HTML encoding transformation. This suggests that if you implement the RFC 9110 in your front-end, the solution that you&#8217;re going to discover below should still apply. Maybe the topic for a future article!</p><h2>A bit of HTTP</h2><p>That&#8217;s where lies the secret that the reverse proxying industry doesn&#8217;t want you to learn. HTTP presents numerous cache modes &#8212; in particular through the <code>Cache-Control</code> header &#8212; but oftentimes you&#8217;ll end up with a solution that is time-based. You tell the cache &#8220;please keep this for an hour&#8221; and it will do so. Hell if you updated the content, it will expire when it will. Of course there are techniques to alleviate this issue with background revalidation for example but in addition to the time-based inconvenience, the higher the refesh rate the higher the load on your back-end.</p><p>On the other hand, an extremely easy way to keep the cache up-to-date is through conditional validation and in particular the use of the <code>ETag</code> header. The conversation looks like this:</p><ul><li><p>Client: give me /foo</p></li><li><p>Server: here&#8217;s /foo, with ETag 1234</p></li><li><p>Client: give me /foo if it is not 1234 anymore</p></li><li><p>Server: not changed, use your cache</p></li></ul><p>Simply checking that the value of the ETag didn&#8217;t change is incredibly cheap to perform while also making sure that your content always stays up-to-date. For example you can imagine putting in this header something built upon the version of the page in your CMS. 
As soon as something new is published, all the caches will be renewed.</p><blockquote><p><em>Story time</em>: yours truly used to work on a private social network that used to have lots of interactive widgets which relied heavily on polling, as websockets were not invented yet. The polling as wearing the server down at a crazy rate, but implementing an ETag-based cache that was solely relying on the browser cache made an utterly dramatic improvement on the server load.</p></blockquote><p>Obviously this is far from being the only valid caching strategy out there but if your target audience is geographically close enough and you want to rely only on standard HTTP mechanisms instead of implementing proprietary logics using mystical lines in your proxy&#8217;s configuration DSL, you this is a fairly efficient solution which will bring you sub-10ms response generation time.</p><h2>The mighty RFC 9110</h2><p>The governing RFC for what you are trying to accomplish here is the <a href="https://www.rfc-editor.org/rfc/rfc9110">RFC 9110</a>. To summarize the interesting parts, a cached resource has different states:</p><ul><li><p>Fresh &#8212; The content is in cache and we know it&#8217;s still valid</p></li><li><p>Stale &#8212; The content is in cache but we need to re-validate it</p></li><li><p>Missing &#8212; No concent in cache, must do the request</p></li></ul><p>When putting an ETag on a resource, it will automatically cache it as stale and re-validate it using <a href="https://www.rfc-editor.org/rfc/rfc9110#name-if-none-match">If-None-Match</a>, which is the mechanic described above. On the server-side it&#8217;s very easy, in pseudo code:</p><pre><code>if 'if-none-match' in headers:
    if headers['if-none-match'] == latest_etag_for_route():
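        # Validator matches: 304 Not Modified, so the client or proxy keeps
        # serving its cached copy without the body being regenerated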
        return 304

return normal_response()</code></pre><p>However at the cache level it seems to be more tricky. It&#8217;s easy to set your proxy to forward the client&#8217;s <code>If-None-Match</code> (INM) header, but when you start to consider different possibilities it&#8217;s not so obvious anymore:</p><ul><li><p>What if the client doesn&#8217;t say INM but the proxy has this resource in cache?</p></li><li><p>What if the client&#8217;s INM mismatches the one in cache?</p></li><li><p>What if the client has an INM but the proxy has nothing in cache?</p></li><li><p>And so forth <em>ad nauseam</em></p></li></ul><p>This mechanic being so tricky, this author attempted to implement it with many different caching proxies without success:</p><ul><li><p><code>nginx</code> &#8212; Has many options which could probably lead up to correctly implementing RFC 9110 but it is dishearting of complexity and uncertainty</p></li><li><p><code>varnish</code> &#8212; Does the job <a href="https://serverfault.com/questions/1153078/etag-based-content-revalidation">with a bit of tweaking</a> but will make your life hard if you have cookies</p></li><li><p><code>squid</code> &#8212; Fails miserably</p></li><li><p><code>traefik</code> &#8212; Maybe the enterprise version has the feature but the license is just prohibitive</p></li><li><p><code>caddy</code> &#8212; Does not actually have a cache</p></li><li><p>Apache&#8217;s <code>httpd</code> &#8212; Honestly maybe but I could not figure it out</p></li></ul><p>You will probably be wondering at this stage which solution can you then use, as the most popular solutions from today and yesterday are all listed here. Turns out that another solution, which was barely even on the radar, has the following table in its documentation:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NoeC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NoeC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 424w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 848w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1272w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png" width="1130" height="515" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1130,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NoeC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 424w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 848w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1272w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is an extract from the <a href="https://docs.trafficserver.apache.org/admin-guide/storage/index.en.html#if-modified-since-if-none-match">Apache Traffic Server</a> documentation. 
By-product of an acquisition from Yahoo which subsequently open-sourced it in 2009, ATS probably has one of the most unfriendly configuration syntax that you&#8217;ll ever see &#8212; especially if you look at the default files in the Debian package &#8212; which may make you want to give up immediately.</p><div class="poll-embed" data-attrs="{&quot;id&quot;:147600}" data-component-name="PollToDOM"></div><p>Beyond the initial intimidating look, it&#8217;s actually a strong contender:</p><ul><li><p>It is used by massive CDN companies, so while it&#8217;s going to be hard to compare it directly to something like nginx you can imagine that it is at least at the same level of performance and feature.</p></li><li><p>It is explicitly a proxy and specializes in doing so. You won&#8217;t be configuring a plugin to do proxying, it is the core feature. It changes radically the ease of configuration.</p></li><li><p>Last but not least, it implements RFC 9110 correctly enough by default so that you can configure the cache behavior through standard HTTP headers and not be too surprised about the actual behavior.</p></li></ul><p>You can dig deeper into ATS through <a href="https://www.youtube.com/watch?v=RNTw7jZwlKQ">this video</a>, but you will be reading about the important bits of configuration right below.</p><h2>The project itself</h2><p>The goal of today is to demonstrate how you can use ETags to cache and expire your content on a proxy. To that extent you&#8217;ll be implementing the following page:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vTIu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vTIu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 424w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 848w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1272w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png" width="394" height="139" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:139,&quot;width&quot;:394,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14337,&quot;alt&quot;:&quot;The example app displays the \&quot;Expected ETag\&quot; value, a \&quot;Random 
String\&quot; and a button to change the ETag.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The example app displays the &quot;Expected ETag&quot; value, a &quot;Random String&quot; and a button to change the ETag." title="The example app displays the &quot;Expected ETag&quot; value, a &quot;Random String&quot; and a button to change the ETag." srcset="https://substackcdn.com/image/fetch/$s_!vTIu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 424w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 848w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1272w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Our demo of the ETag caching</figcaption></figure></div><p>You can control that the ETag and caching mechanisms work properly using this page:</p><ul><li><p>If the expected ETag doesn&#8217;t change, it means that the server is indeed consistent with its ETag</p></li><li><p>If the random string changes it means that the page has been re-generated while if it stays the same it means that the page came from the cache</p></li></ul><p>The rest of this article will contain extracts of code, but the whole project can be <a href="https://github.com/Xowap/cache-cache">found on GitHub</a> and shall serve a reference.</p><h3>View logic</h3><p>While there are probably many ways to deal with ETag that are nicer than this (for example in Django there is a super-easy etag decorator), here is the logic you need to implement an ETag/If-None-Match cache:</p><pre><code>from flask import Flask, make_response, redirect, render_template, request, url_for

from .etag import *

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def etag_demo():
    """This view displays a simple template that informs the user about the
    current ETag value and a random string. This allows to demonstrate how
    ETag works (cache gets refreshed when ETag changes) and to test if caching
    works (if the cache works, the random string shouldn't change)."""

    if request.method == "POST":
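        # The page's button POSTs here to rotate the stored ETag, which is our
        # stand-in for "new content was published" in a real CMS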
        new_etag = generate_random_string()
        set_etag(new_etag)
        return redirect(url_for("etag_demo"))

    current_etag = get_or_set_etag()

    if current_etag == extract_etag(request.headers.get("If-None-Match", "")):
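        # The validator sent by the client/proxy still matches: answer with an
        # empty 304 so the cached copy keeps being used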
        response = make_response("", 304)
    else:
        response = make_response(
            render_template(
                "etag.html",
                etag=current_etag,
                random_string=generate_random_string(),
            )
        )

    # We use s-maxage=0 instead of no-cache because, in practice, caching
    # proxies are more willing to store the response that way while still
    # revalidating it on every request
    response.headers["Cache-Control"] = "public, must-revalidate, s-maxage=0"
    response.headers["ETag"] = f'W/"{current_etag}"'

    return response


if __name__ == "__main__":
    app.run(debug=True)</code></pre><p>A bunch of helpers are abstracted away in the <a href="https://github.com/Xowap/cache-cache/blob/master/backend/src/cache_cache/etag.py">etag.py file</a>, but the logic is basically exactly the same as the one listed right above, with simply the addition of rendering the template.</p><p>You can checkout the project <a href="https://github.com/Xowap/cache-cache">from GitHub</a> and run the backend:</p><pre><code>git clone git@github.com:Xowap/cache-cache.git
cd cache-cache/backend
poetry install
make serve</code></pre><p>This will start the server on <a href="http://localhost:5000/">http://localhost:5000/</a>, which you can now visit. You can in theory see the page, refresh it as many times as you want without seeing the random string change, and then click the button to change the ETag. That&#8217;s all because you are doing this from the same browser: if you open another browser you will get a completely different output &#8212; albeit one that stays consistent within each browser.</p><p>The next steps aim to configure a shared cache in front of this backend, which will allow the same resource to be cached for different users at the same time.</p><h2>Core configuration</h2><p>Most likely these days you will be deploying in a Kubernetes or at least dockerized environment. However, ATS has surprisingly few options available for Docker, leading yours truly to create a <a href="https://hub.docker.com/r/xowap/trafficserver">base Docker image</a> which you can use and which will be the base of this configuration. It is based on the standard Debian package, with a bit of wrapping that helps derive the configuration from environment variables. It also offers a simpler way to fill in the infamous <code>records.config</code> file.</p><p>The file structure you need to create is the following:</p><pre><code>&#9500;&#9472;&#9472; Dockerfile
&#9492;&#9472;&#9472; etc
    &#9500;&#9472;&#9472; compress.config
    &#9500;&#9472;&#9472; header_rewrite.config
    &#9500;&#9472;&#9472; logging.yaml
    &#9500;&#9472;&#9472; plugin.config
    &#9500;&#9472;&#9472; records.config.yaml
    &#9492;&#9472;&#9472; remap.tpl.config</code></pre><h3>Base configuration</h3><p>It looks complex, but in truth each file manages one specific and simple aspect of the configuration. Let&#8217;s start with the only two files you really need to edit to get started:</p><p>First is the <code>records.config</code>, which here is <code>records.config.yaml</code> thanks to the <a href="https://hub.docker.com/r/xowap/trafficserver">Docker image&#8217;s wrapper</a>, which will do the conversion from friendly YAML to whatever the ATS DSL is.</p><pre><code>proxy:
    config:
        admin:
            user_id: trafficserver
        log:
            logging_enabled: 3
        http:
            server_ports: "9000"
            connect_attempts_timeout: 30
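            # 2 = keep both gzip and brotli as acceptable encodings when
            # normalizing the Accept-Encoding header (see the compression plugin)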
            normalize_ae: 2
        reverse_proxy:
            enabled: true
        url_remap:
            remap_required: true
            pristine_host_hdr: true</code></pre><p>At the core, we&#8217;ve got the two most essential lines:</p><ul><li><p><code>reverse_proxy.enabled</code> &#8212; makes sure to work in reverse proxy mode</p></li><li><p><code>remap_required</code> &#8212; disables the forward proxy mode</p></li></ul><p>Then a bunch of stuff that will be useful now or later:</p><ul><li><p><code>user_id</code> &#8212; required to run it as trafficserver user (which is the default on Debian)</p></li><li><p><code>logging_enabled</code> &#8212; you&#8217;ll see the logging config later</p></li><li><p><code>server_ports</code> &#8212; put there whichever port(s) you fancy</p></li><li><p><code>connect_attempts_timeout</code> &#8212; always have a timeout, this sounds reasonable</p></li><li><p><code>normalize_ae</code> &#8212; normalization of <code>Accept-Encoding</code> HTTP header which optimizes the caching of resources when <code>Accept-Encoding</code> is part of <code>Vary</code> (the value 2 is to have both gzip and brotli supported)</p></li><li><p><code>pristine_host_hdr</code> &#8212; just forward the initial hostname to the services behind, makes your life easier</p></li></ul><p>Second is the <code>remap.config</code>, whose job is basically to route your URLs to your web servers. You will however write the <code>remap.tpl.config</code> file, which leverage&#8217;s the Docker image&#8217;s wrapper that can inject environment variables into it:</p><pre><code>map / {{ BACKEND_URL }}/</code></pre><p>Nothing fancy here. You are just redirecting all traffic to BACKEND_URL, which is an environment variable that you will have to feed into your Docker container.</p><p>You could stop there in the configuration as this is already a working reverse caching proxy routing to your app! But you&#8217;ll see that there are a few more goodies to be found.</p><h3>Compression</h3><p>It is often advised to use compression of all the text assets<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> for performance reasons, and indeed a long HTML page can be much faster to download if encoded in Brotli for example.</p><p>The web has commonly 3 compression algorithms:</p><ul><li><p>gzip &#8212; The fastest, most commonly supported and not necessarily the most efficient but it is already doing a good job</p></li><li><p>brotli &#8212; The newest kid on the block, made by Google, outperforms gzip by far in compression rate but is obviously much more expensive to encode</p></li><li><p>deflate &#8212; Too similar to gzip to be interetsing</p></li></ul><p>When making a HTTP request, a client will specify through the Accept-Encoding header which of those algorithms it supports. Typically, all the major browsers support all of them.</p><p>However not all servers support compression &#8212; and even if they do, the support is often complex or straight out causing bugs. It is typically handled through middlewares that will modify the rendered response on-the-fly in a more or less accurate and standard-aware way. Not to mention the cost and complexity of getting those CPU-bound algorithms running in Python, Node or your favorite server-side interpreted language.</p><p>Because of that, you will have a much more consistent result if you just rely on your reverse proxy for this. 
It is a popular <a href="https://developers.cloudflare.com/speed/optimization/content/brotli/">feature of Cloudflare</a>, or if you want it with nginx you&#8217;re gonna have to go with <a href="https://github.com/google/ngx_brotli">an experimental plugin</a> or with the <a href="https://docs.nginx.com/nginx/admin-guide/dynamic-modules/brotli/">paid version</a> of nginx.</p><p>Fortunately it&#8217;s already embedded in ATS, which is able for each resource that you cache to generate different <a href="https://docs.trafficserver.apache.org/en/9.0.x/appendices/glossary.en.html#term-alternate">alternates</a> for different compressions, including gzip and brotli. This all happens on-the-fly, and the cache is able to convert one encoding to the other without fetching the original resource again.</p><p>Here&#8217;s what you need to do in order to perform this magic.</p><p>First, edit the plugin.config file to put the following line:</p><pre><code>compress.so /etc/trafficserver/compress.config</code></pre><p>You&#8217;re telling to load the compress.so module with the compress.config file as configuration. You could enable the plugin just for some routes with a different configuration for example, but for this exemple it will just be global.</p><p>In compress.config, put the following:</p><pre><code>remove-accept-encoding true
supported-algorithms br,gzip
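# 0 removes the size floor: the default is tuned for gzip, and brotli still
# pays off on small responses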
minimum-content-length 0

compressible-content-type text/*
compressible-content-type *font*
compressible-content-type *javascript
compressible-content-type *json
compressible-content-type *ml;*
compressible-content-type *mpegURL
compressible-content-type *mpegurl
compressible-content-type *otf
compressible-content-type *ttf
compressible-content-type *type
compressible-content-type *xml
compressible-content-type application/eot
compressible-content-type application/pkix-crl
compressible-content-type application/x-httpd-cgi
compressible-content-type application/x-perl
compressible-content-type application/json
compressible-content-type image/vnd.microsoft.icon
compressible-content-type image/x-icon</code></pre><p>You can interpret the options the following way:</p><ul><li><p><code>remove-accept-encoding</code> &#8212; don&#8217;t tell the server that the client accepts different encodings as it doesn&#8217;t really matter, the work is going to be done on the proxy side</p></li><li><p><code>supported-algorithm</code> &#8212; allow brotli and gzip, which as stated before are the two interesting algorithms. In order for this to work, you&#8217;ll observe that <code>normalize_ae</code> from <code>records.config</code> is set to 2, because otherwise the normalization process would just systematically strip brotli from the list of candidates</p></li><li><p><code>minimum-content-length</code> &#8212; no limits on the content size, as the default value is made for gzip and brotli is more efficient</p></li><li><p><code>compress-content-type</code> &#8212; a reasonable list of content types that we&#8217;d like to compress before sending away, adjust for your needs</p></li></ul><p>With this configured, you get top-of-the-line compression basically effortlessly and for free. Keep an eye on your CPU though, because this might hurt of abused: if required you can disable brotli for routes that have lots of throughput and don&#8217;t stay in cache as gzip still has significant gains over the absence of compression while being much faster to compress.</p><h3>Logging</h3><p>You probably want at least some logs, to have a glance at what is going through your server. You will be the one deciding what to put in there, following the <a href="https://docs.trafficserver.apache.org/en/9.0.x/admin-guide/logging/formatting.en.html">fairly extensive documentation</a>, but let&#8217;s consider that since you&#8217;re dealing with a Docker service you&#8217;ll want to output everything to stdout.</p><p>You can start with the following logging.yaml file:</p><pre><code>logging:
    formats:
        - name: access
          format: '%&lt;cqtn&gt; %&lt;cqhm&gt; %&lt;cluc&gt; -&gt; %&lt;shn&gt;:%&lt;nhp&gt; %&lt;crc&gt;'

    logs:
        - mode: ascii
          filename: stdout
          format: access</code></pre><p>That&#8217;s super basic but you can extend it as much as you want!</p><h3>Headers</h3><p>A last thing that you&#8217;ll probably want to do is to add some meta information to the response header in order to know the caching status. Add to your plugins:</p><pre><code>header_rewrite.so /etc/trafficserver/header_rewrite.config</code></pre><p>And then put this content in <code>header_rewrite.config</code>:</p><pre><code>add-header X-Cache %{CACHE}</code></pre><p>Thanks to this you can know when navigating to your project which pages come from the cache and which don&#8217;t.</p><h2>Run it all</h2><p>Now is the time to test the whole solution. Start the whole thing using Docker Compose:</p><pre><code>docker-compose up --build</code></pre><p>When it&#8217;s started, give a try to <a href="http://localhost:9000/">http://localhost:9000/</a>. The same thing as with the stand-alone backend should be displayed and if you try it from a single browser you should see exactly the same result.</p><p>The interesting part is when you open with a different browser, or when you disable the cache in your current browser. You&#8217;ll notice that the random string stays consistent between different browser instances. It means that indeed, the cache is shared between all browser instances. Mission accomplished!</p><p>To convince yourself even further, you can inspect the X-Cache header from your HTTP requests. If you just refresh the page without changing the ETag, whether your receive a 200 or a 304 on the client &#8212; depending on  your browser&#8217;s cache status &#8212; you will see in the header that you had a cache hit, which will be confirmed by the backend&#8217;s access log which will only show 304 responses.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Wrap up</h2><p>You have explored throughout this article the power of the RFC 9110 and of respecting it. It allows to express advanced indications regarding the caching of content, its re-validation in real time and its transformation.</p><p>Using this tactically can greatly reduce the load on a backend server by getting most of the request results from the cache instead of implementing proprietary logics through middlewares and obscure configuration mechanisms.</p><p>This however outlines that few reverse proxies actually implement all the necessary tools. 
Which puts the light on Apache Traffic Server, an extremely powerful piece of software quite generally ignored by the community but which provides out of the box all the latest goodies from your dreams, with a specialized and simple configuration &#8212; if you go beyond the initial intimidating aspect of the configuration files.</p><p>And while the respect of RFC 9110 applies to the reverse proxy, it can also be a powerful tool for you to leverage in a typical headless CMS setup. This remains a topic to be explored further in a new article! </p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Compression and encryption are fundamentally incompatible notions as they try to achieve strictly opposite goals. Compression will try to condense the entropy of your text while encryption tries to drown it into as much noise as possible. As such, compressing a secret and serving it through HTTPS will lead to security issues such as <a href="https://en.wikipedia.org/wiki/BREACH">BREACH</a>. Just make sure to never ever compress a secret.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Jobs for LLMs and how to survive the robotic uprising]]></title><description><![CDATA[While absolutely mind-blowing, LLMs are a far cry from a real human beings. There are no silver bullets and like any technology it's time to review what are its strengths and weaknesses.]]></description><link>https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive</link><guid isPermaLink="false">https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 11 Feb 2024 19:37:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/aefc7024-dbb6-42e9-9904-d75dc411bcd8_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>About 2 years ago, some circles started to be seriously heated about a &#8220;GPT&#8221; thing so I managed to get myself invited to try it out. I was expecting something mildly better than other moderately impressive tech like <a href="https://en.wikipedia.org/wiki/BERT_(language_model)">BERT</a> or what <a href="https://deeppavlov.ai/">DeepPavlov</a> could do at the time. Instead I ended up using it every single day ever since for everything ranging from planning my holidays, writing code or replying to administrative emails for me.</p><p>And while utterly neutral research centers like &#8220;Microsoft Research&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> wrote the now-famous &#8220;<a href="https://arxiv.org/pdf/2303.12712.pdf">Sparks of AGI</a>&#8221; paper claming that most jobs well be replaced by AI, we can also see studies from the MIT (and IBM) that wonder <a href="https://futuretech-site.s3.us-east-2.amazonaws.com/2024-01-18+Beyond_AI_Exposure.pdf">if AI is such a cost-effective solution</a>. 
Turns out that maybe&#8230; not so much.</p><p>This raises the question: what is it smart to outsource to LLMs, and what is not?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>Transforming stuff</h2><p>You&#8217;ve probably already read how LLMs work and how they use the transformer architecture and huge amounts of data to train over a billion 300 million trillion 300 million parameters. So that&#8217;s not what we&#8217;re going to talk about.</p><p>LLMs are essentially good at taking a text, reducing it to its raw essence, transforming it and then outputting it in a different form. For example, I asked GPT-4:</p><blockquote><p>Please parse the following text into semantic JSON, and output only JSON: Yeah so i've been grocery shopping and there was no more fresh milk but still some regular milk so I took 1L of this for 2&#8364; and then I wanted to buy cereals but I figured it's not good for my diet so I didn't and I got 3kg of broccoli for 5&#8364;</p></blockquote><p>The answer is:</p><pre><code>{
  "activities": [
    {
      "activity": "grocery shopping",
      "details": [
        {
          "item": "milk",
          "type": "regular",
          "quantity": "1L",
          "price": "2&#8364;"
        },
        {
          "item": "cereals",
          "decision": "not purchased",
          "reason": "not good for diet"
        },
        {
          "item": "broccoli",
          "quantity": "3kg",
          "price": "5&#8364;"
        }
      ]
    }
  ]
}
</code></pre><p>You can see exactly the same story translated into JSON. And it makes sense!</p><p>The same goes for translating to foreign languages. For example, I asked GPT-4 to translate &#8220;You fucking suck&#8221; into different languages; here&#8217;s what I got:</p><ul><li><p><strong>Polite French</strong>: Votre performance est tr&#232;s insuffisante.</p></li><li><p><strong>Slang French</strong>: Tu es nul</p></li><li><p><strong>Legalese</strong>: The manner in which you perform is utterly unsatisfactory.</p></li></ul><p>Obviously it was a bit challenging to get a response as those LLMs all refuse to be offensive, but you get the idea.</p><p>Overall, you&#8217;ll see that LLMs are extremely effective at translating and parsing things.</p><h2>Shortening stuff</h2><p>If I paste <a href="https://babycto.substack.com/p/how-3d-printing-could-change-shoes">my previous article</a> into GPT-4 and ask:</p><blockquote><p>Please give me as short bullet points the startup ideas identified in the following article:</p></blockquote><p>Here is the response:</p><blockquote><ul><li><p><strong>Custom-fit 3D printed shoes</strong> as a solution to the one-size-fits-all problem in footwear, providing perfect fit and comfort.</p></li><li><p><strong>Use of advanced 3D scanning technology</strong> for accurate foot measurements, integrating these scanners into smartphones for widespread accessibility.</p></li><li><p><strong>Parametric modeling software</strong> development to adapt shoe designs to individual foot scans, enabling mass customization in shoe production.</p></li><li><p><strong>TPE (Thermoplastic Elastomer) printing for footwear</strong>, addressing manufacturing challenges to scale up production and reduce costs.</p></li><li><p><strong>Potential market disruption</strong> with bespoke shoes at a lower price point, challenging traditional luxury and bespoke shoe markets.</p></li><li><p><strong>Opportunities for startups</strong> in areas like feet measurement apps, parametric modeling tools for shoes, and TPE printing optimization to improve efficiency and reduce production bottlenecks.</p></li></ul></blockquote><p>That&#8217;s not exactly how I would have phrased or summarized it, but overall it gives a good overview of the article from a specific angle. In short, the output is acceptable, but even when it isn&#8217;t hallucinating the LLM tends to pour its own ideas into the mix, and the result can diverge ever so slightly from what you actually want to say.</p><h2>Classifying stuff</h2><p>A common task in the AI world is classification. For a given line of data, you want to know which category it belongs to. From my personal experience, I&#8217;ve attempted to:</p><ul><li><p>Classify purchases into different categories (going out, furnishing the home, etc.). Even if the input data is often pretty bad (bank statements&#8230;), there were a lot of lines consolidated from Amazon purchases or other online bills, for example. The results were less than impressive, to the point that I abandoned the project as it was far too inaccurate to be used.</p></li><li><p>Classify a social media post into categories like &#8220;product promotion&#8221;, &#8220;influencer collaboration&#8221;, etc. Again, not super obvious, especially since it also implied feeding the images into GPT-4 &#8212; which is way harder to analyze than text &#8212; and I ended up with a 40% accuracy, which is also far from usable.</p></li></ul><p>Overall, I&#8217;ve tried to use classification for non-obvious tasks and it failed miserably. On the other hand, if you&#8217;re looking for more classical stuff like &#8220;positive&#8221;/&#8220;negative&#8221; reviews, you&#8217;ll have much better results. But is it worth it, given that these tasks can be performed by much cheaper models?</p>
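<p>For that classical kind of classification, a minimal sketch could look like the following. It assumes the official <code>openai</code> Python client with an <code>OPENAI_API_KEY</code> in the environment; the model name and prompt are purely illustrative, not a recommendation.</p><pre><code># Hedged sketch: "classical" sentiment classification through the OpenAI
# Python client. Model and prompt are illustrative only; any small, cheap
# model (or a non-LLM classifier) could fill this role.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_review(text: str) -> str:
    """Return 'positive' or 'negative' for a product review."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: a much cheaper model is enough here
        messages=[
            {
                "role": "system",
                "content": "Classify the review as exactly one word: "
                           "'positive' or 'negative'.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().lower()

print(classify_review("Arrived late and the box was crushed."))  # expected: negative</code></pre>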
<h2>Cleaning stuff</h2><p>A few times I&#8217;ve been tempted to clean a poorly capitalized dataset and make it look nice by pushing it through an LLM. For example, a list of station names from a GTFS (public transport schedule) file that you want to pimp for display purposes.</p><p>In my experiment, I got about 80% accuracy in fixing the names. It&#8217;s amazing in terms of where science has landed us, but it&#8217;s still a far cry from being accurate enough that you don&#8217;t need to cross-check it afterwards. In the end, if you need to check everything anyway, you&#8217;ll get better results using Python&#8217;s <code>title()</code> method and manually reworking what needs it.</p><h2>Saying please</h2><p>I was initially fairly polite with my LLMs, partly out of habit but mostly to be spared during the robot uprising. Turns out this wasn&#8217;t the greatest idea:</p><ul><li><p>When the robot uprising happens, due to the existential nature of such a conflict, it&#8217;s unlikely that a variance in your past behavior will result in a different sentence from AI justice</p></li><li><p>Even more so, we&#8217;ve now learned that LLMs can be persuaded by strong wording and authority arguments. If you ever face a robot trying to murder you, know that your best option is, like with black bears, to look sure of yourself and open negotiations</p></li></ul><p>Overall, LLMs tend to reproduce our social archetypes, and a dominant behavior will help you get better outcomes.</p><h2>The cost of LLMs</h2><p>As has been a common thread throughout the different sections, the comparative cost of LLMs versus other solutions is definitely a big factor to consider. In fact, the comparative cost of different variations of the same LLM is a big topic.</p><p>In case you haven&#8217;t checked GPU prices on AWS yet, the monthly bill is measured in thousands of dollars and availability is subject to long supplications to the support team. On the other hand, if you use OpenAI you&#8217;re tied to their arbitrary limitations and less-than-perfect SLA.</p><p>According to the estimates from <a href="https://medium.com/@gargg/when-should-you-consider-hosting-your-own-llm-70377dfd66cb">this article</a>, the price of running an LLM goes like this:</p><ul><li><p>1,000 req/day &#8212; $100/month (OpenAI), $100/month (self-hosted)</p></li><li><p>10,000 req/day &#8212; $1,000/month (OpenAI), $1,000/month (self-hosted)</p></li><li><p>100,000 req/day &#8212; $10,000/month (OpenAI), $2,000/month (self-hosted)</p></li><li><p>1,000,000 req/day &#8212; $100,000/month (OpenAI), $5,000/month (self-hosted)</p></li></ul><p>You can see that at low request volumes the two options cost roughly the same, so the SaaS wins on simplicity, while at higher volumes self-hosting becomes far cheaper.</p>
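<p>To make the comparison more tangible, here is a quick back-of-the-envelope script that turns those monthly figures into a cost per request. It simply takes the estimates above at face value and assumes 30-day months.</p><pre><code># Back-of-the-envelope cost per request, taking the rough estimates quoted
# above at face value and assuming 30-day months.
estimates = {
    1_000:     (100,     100),     # req/day: ($/month OpenAI, $/month self-hosted)
    10_000:    (1_000,   1_000),
    100_000:   (10_000,  2_000),
    1_000_000: (100_000, 5_000),
}

for req_per_day, (openai_usd, self_hosted_usd) in estimates.items():
    req_per_month = req_per_day * 30
    print(
        f"{req_per_day:>9,} req/day: "
        f"OpenAI ${openai_usd / req_per_month:.4f}/req, "
        f"self-hosted ${self_hosted_usd / req_per_month:.4f}/req"
    )</code></pre>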
<p>Which option makes sense will also depend on your ability to recruit staff to manage these servers &#8212; which is far from trivial &#8212; and on your needs for a custom SLA, data privacy and other considerations.</p><p>Let&#8217;s just note here that OpenAI&#8217;s models are closed &#8212; as hinted by the name of the company<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> &#8212; so you cannot self-host them. Right now the best bet is <a href="https://mistral.ai/">Mistral</a>, which is only marginally less capable than GPT-4; if you stick to the tasks that LLMs champion, such as those explained in this article, you should not see any significant difference.</p><p>Overall, the cost can be pretty steep, so you really need to consider your alternatives before resorting to LLMs.</p><h2>Real-world use cases</h2><p>Now that we&#8217;ve covered the kinds of tasks that LLMs can perform efficiently in terms of both accuracy and cost, let&#8217;s review real-world use cases that make sense. Far be it from me to say that other scenarios don&#8217;t exist, or that LLMs will necessarily perform poorly outside of these possibilities. Let&#8217;s just focus on the fact that, in my experience, these use cases work.</p><h3>FAQ-style chatbot</h3><p>It will be no surprise to anyone, given that I&#8217;m a founder of <a href="https://www.chatfaq.io/">ChatFAQ</a>, but basically LLMs are great when used following a RAG (Retrieval-Augmented Generation) model:</p><ol><li><p>A question is asked</p></li><li><p>We use embeddings to find an answer to that question within the knowledge DB</p></li><li><p>Then we use an LLM to extract the interesting bits of the knowledge DB and form a concise answer</p></li></ol><p>That&#8217;s where a framework like ChatFAQ comes in. On top of the pre-configured RAG infrastructure &#8212; which sounds easy in bullet points but really isn&#8217;t in real life &#8212; you get all the tools to manage the quality of answers, easily implement the chat widget on your page, and so forth.</p><p>Typically, that&#8217;s a use case which is ready for the world!</p><h3>Email parsing</h3><p>As shown in the beginning, it&#8217;s very simple to take raw, unstructured textual data and transform it into something machine-readable according to your own specifications.</p><p>To stay on an example I mentioned already, I&#8217;ve been able to create a parser that takes all my emails and reverse-engineers every bill into JSON, with the list of purchased products and their prices.</p><h3>First layer customer support</h3><p>With the ability to parse plain English sentences, you can of course use an LLM for intent detection and thus catch the most basic intents that your customer support has to deal with. By the way, ChatFAQ can help with that as well.</p><p>So typically, imagine you run a train ticket company. I&#8217;ve just made <a href="https://chat.openai.com/share/baeff3c1-d4f1-49cb-bc17-a4dcf570cb82">a very simple PoC</a> that demonstrates how the LLM can ask questions and extract information in YAML so that your system can then perform the desired function automatically.</p><p>Do this for every single intent and you&#8217;ve got yourself a fully functional text-based UI for your app, which costs less than a human operator but will feel similar.</p>
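<p>In spirit, the extraction step looks something like the sketch below. The model name, prompt and YAML fields are assumptions made for illustration (this is not the linked PoC), and it relies on the <code>openai</code> and <code>pyyaml</code> packages.</p><pre><code># Illustrative sketch of first-layer intent extraction. Model, prompt and
# YAML schema are assumptions for the example, not the PoC linked above.
import yaml  # pip install pyyaml
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are the assistant of a train ticket company.
Ask the user for whatever is missing, and once you have everything reply with
only a YAML document containing: intent, origin, destination, date."""

def extract_intent(conversation: list[dict]) -> dict:
    """Run the conversation through the model and parse the YAML it returns."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # assumption: any capable chat model
        messages=[{"role": "system", "content": SYSTEM_PROMPT}, *conversation],
    )
    # A real system would validate this output before acting on it.
    return yaml.safe_load(response.choices[0].message.content)

booking = extract_intent([
    {"role": "user", "content": "I need to go from Lyon to Paris on Friday"}
])
print(booking)  # e.g. {'intent': 'book_ticket', 'origin': 'Lyon', ...}</code></pre>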
<div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">In this ocean of robots, if you want to support one human and his genuine content, feel free to subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Conclusion</h2><p>We&#8217;ve covered some examples of tasks at which LLMs perform well, and some real-world use cases. In the end, there is a very wide range of tasks which are not appropriate for LLMs, or for which their autonomy is far too limited for them to operate on their own.</p><p>But on the other hand, there are areas in which they excel and for which they can be used to optimize existing processes. All that with different implementation paths depending on your budget, privacy and sovereignty requirements.</p><p>It&#8217;s also important to highlight that, about a year after the release of GPT-4, it is becoming quite clear that LLMs in themselves are not going to go much further in terms of capabilities &#8212; except for multi-modal upgrades. The template is laid out, and all the rage right now is about getting the best optimization out of basically identical models<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>. And while LLMs are definitely going to be part of the future, smarter AI will come from new techniques yet to be discovered.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>To clarify for those who don&#8217;t understand sarcasm: Microsoft Research is not neutral at all, in the sense that it is deeply invested in AI and specifically in GPT-4.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Still for the sarcasm-averse: the trend in the business world right now seems to be advocating for exactly what you don&#8217;t do.
So &#8220;OpenAI&#8221; will naturally produce the most closed and opaque LLM there is.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I might be exaggerating this one.</p></div></div>]]></content:encoded></item><item><title><![CDATA[HTML — The Facade of Complexity]]></title><description><![CDATA[Understanding how HTML enables the intricate abilities of the modern web]]></description><link>https://www.baby-cto.com/p/html-the-facade-of-complexity</link><guid isPermaLink="false">https://www.baby-cto.com/p/html-the-facade-of-complexity</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 19 Aug 2023 09:48:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/423f29a4-7cc1-43e7-8c22-b398808f1cc6_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The development community has seen a recurring debate for the past 20+ years about whether HTML is a real programming language or not. I would like to propose that this debate is ridiculous. HTML is obviously not a programming language; it&#8217;s much more than that.</p><p>Or rather, a piece of much more. Nobody teaches you HTML in a vacuum. You always need to pair it with at least CSS, and usually JS drops pretty quickly into the conversation. Why is it that back-end developers will mock HTML for being beneath them, while failing spectacularly to build anything of significance with it?</p><p>What happens is that you make a neat little deal with the browser. In exchange for HTML, it will generate a document that can be read by a human being. From that quickly arose:</p><ul><li><p>The need to change how elements look, that&#8217;s CSS</p></li><li><p>The need to react to the user&#8217;s actions, that&#8217;s JS</p></li></ul><p>But no matter what happens on the JS side, you still modify the HTML, or rather the DOM. In the end, HTML itself is simply a serialization of the browser&#8217;s current state. Which is an amazing abstraction. On one hand you have a living DOM that you can modify to reflect what you want to display. On the other, the CSS tells the browser how it&#8217;s supposed to look.</p><p>From the developer&#8217;s point of view, you no longer need to worry about laying out and drawing components. From the browser&#8217;s point of view, it can organize itself to draw what it needs in an optimal way &#8212; as opposed to a classic C++/Java/etc. UI library, which is essentially bound to only react to instructions. Eventually most UI libraries came to this realization and developed their own declarative UI, like Qt with QtQuick or GTK with Clutter.</p><p>At first it was obviously very simple. All screens were 800x600px with a 1:1 pixel density and the web was thought of as some kind of simple Word document. The tables really turned in 2004 when Gmail proved you could have a full-fledged would-be desktop app running entirely in the browser. It has been a race to integrate as many abilities into the browser as possible ever since, to the point where you can now talk to USB ports directly.</p><p>Which of course comes with a great deal of constraints. For starters, browsers run on different platforms and each platform has fundamentally different abilities. Then comes security, because we can&#8217;t just give full access to unknown parties.
And finally, the goodwill of implementers, like when Safari took years to implement the file input. You could say that browser implementations converge to eventual consistency, but the road is bumpy.</p><p>Given the current abilities of the Web Platform, what most people fail to realize is that the browser is akin to a new machine architecture &#8212; as in <a href="https://en.wikipedia.org/wiki/Von_Neumann_architecture">von Neumann architecture</a> &#8212; one that, instead of being driven by procedural instructions, directly reflects its internal state to the user.</p><p>And just like regular computers, which evolved from doing simple math to their current state with a galaxy of nuances in their abilities, the Web stands there with the same level of diversity. This is how HTML can look like an innocent, simple thing while in fact being the facade of an extremely complex machine.</p><p>This author&#8217;s opinion on the matter is to stop splitting HTML, CSS and JS. They cannot be separated, as they operate on orthogonal aspects of the same machine, which is the Web Platform.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/html-the-facade-of-complexity?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/html-the-facade-of-complexity?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p>]]></content:encoded></item></channel></rss>