<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Baby CTO]]></title><description><![CDATA[Leading Tech, People, and Profit: Reflections from a Decade as a Startup CTO.]]></description><link>https://www.baby-cto.com</link><image><url>https://substackcdn.com/image/fetch/$s_!u3-p!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c9d8fc-b387-47a4-b77f-c2aa0a303dc7_619x619.png</url><title>Baby CTO</title><link>https://www.baby-cto.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 05 May 2026 11:57:31 GMT</lastBuildDate><atom:link href="https://www.baby-cto.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Rémy]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[babycto@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[babycto@substack.com]]></itunes:email><itunes:name><![CDATA[Rémy]]></itunes:name></itunes:owner><itunes:author><![CDATA[Rémy]]></itunes:author><googleplay:owner><![CDATA[babycto@substack.com]]></googleplay:owner><googleplay:email><![CDATA[babycto@substack.com]]></googleplay:email><googleplay:author><![CDATA[Rémy]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Documentation Archaeology: How to Extract Knowledge from Abandoned Codebases with AI]]></title><description><![CDATA[Developers are no strangers to being thrown in a wild project where their ability to understand technical debt will be key to their survival and mental health. 
LLMs are amazing tools for this use-case]]></description><link>https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm</link><guid isPermaLink="false">https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 02 Aug 2025 09:42:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/99fa6ac8-90b4-41c4-9de9-1c49cdfcca15_1408x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You need to work on a legacy project. The engineer who knew all about it left the company a year ago. Because the deadline was too tight, they didn&#8217;t take time to write any documentation. They also didn&#8217;t step back to rethink the architecture; they simply piled up technical debt. Welcome to what is essentially the digital version of Man versus Wild!</p><p>The good news is: this is a strong use-case for generative AI. Many of you will have strong reservations against clankers, but the general rule of thumb with LLMs is that they are good at making something long shorter.</p><p>More specifically, if you feed them some code they will understand remarkably well what it does, and most likely even why it does it. They also have strong knowledge of most business areas, so anything that isn&#8217;t pure company jargon should be picked up as well.</p><p>In this article, we will explore the less obvious techniques that will give you superhuman abilities to jump into any project that you have never seen before.</p><p></p><p><em>tldr/spoilers:</em></p><pre><code><em>pfff src/**/*.py | llm -m 'gemini-2.5-pro' -s 'Please write a complete documentation of this project. I want a high-level overview of the main user flows. For each flow, generate proper Mermaid diagrams explaining the communication between all the different parties. Then go into the detail of each flow and explain the specific business decisions taken, edge cases, special rules, etc. 
For each step of the flow, tell me roughly where to look in the code in case I want to change something.'</em></code></pre><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Sounds good? More ideas like this in future newsletters!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h1>Workflow setup</h1><p>We&#8217;ll use two main tools for this:</p><ul><li><p><a href="https://pypi.org/project/llm/">llm</a> from <a href="https://simonwillison.net/">Simon Willison</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> gives a great CLI interface to the various LLMs on the market</p></li><li><p><a href="https://pypi.org/project/pfff/">pfff</a> from this author, which is simply a way to generate a context containing source code and pipe it into <code>llm</code></p></li></ul><p>Given that they are both Python tools, I can only recommend using <a href="https://docs.astral.sh/uv/getting-started/installation/#shell-autocompletion">uvx</a> together with your shell&#8217;s aliasing system.</p><p>For <code>fish</code> users, that would be:</p><pre><code>alias -s llm='uvx --with llm-gemini --with llm-claude-3 llm'
alias -s pfff='uvx pfff'</code></pre><p>For <code>bash</code>, add to your <code>~/.profile</code> (or the relevant file for your configuration):</p><pre><code>alias llm='uvx --with llm-gemini --with llm-claude-3 llm'
alias pfff='uvx pfff'</code></pre><p>Note how we add <code>llm-gemini</code> and <code>llm-claude-3</code> as dependencies of <code>llm</code>. This is because there are <a href="https://llm.datasette.io/en/stable/plugins/directory.html">many plugins</a> for many providers.</p><p>If you are going to get only one plugin, you should get <code>llm-gemini</code>. This is by far the most useful LLM for the task at hand, for a very simple reason. While most top-of-the-line LLMs have roughly the same capabilities, Gemini shines with a 1 million token context window. This is big enough to fit many entire codebases, which will come in handy.</p><p>Once the plugin is installed, head to <a href="https://aistudio.google.com/">Google&#8217;s AI Studio</a> to grab an API key, then run:</p><pre><code>llm keys set gemini</code></pre><p>Once this is done, you should be able to run something through Gemini, for example:</p><pre><code>&#10095; echo "What is the answer to Life, the Universe and Everything? Give me the answer in JSON and only JSON." | llm -m gemini-2.5-pro
```json
{
  "question": "What is the answer to the ultimate question of Life, the Universe and Everything?",
  "answer": 42
}
```</code></pre><p>From this point on, you&#8217;re ready to go!</p><p>As a bonus, however, you can have a look at the following tools:</p><ul><li><p><a href="https://mermaid.js.org/">Mermaid</a>, a lib/tool for diagrams embeddable in Markdown (and supported by GitHub).</p></li><li><p><a href="https://typora.io/">Typora</a>, a nice desktop Markdown editor, which also happens to support Mermaid. Use any editor you want of course, but make sure to have one at hand for the rest of the article.</p></li></ul><h1>Project documentation</h1><p>The LLM being a translation system, we often use it to translate a specification into code, with more or less success. But on the other hand the code <em>is</em> the ultimate specification, which is fairly easy to translate back into English.</p><h2>Good documentation</h2><p>In order to get something useful, you first need to understand what it is that you seek.</p><p>Good documentation takes you through a story. Not of a princess sleeping in the highest room of the highest tower, but of the various user and data flows that compose the application. A transverse view, if you prefer.</p><p>Apart from obscure Doxygen-generated references, all popular open-source projects essentially give you the same set of things:</p><ul><li><p>A &#8220;Getting Started&#8221; guide, whose job is to get you doing something useful within 3 minutes, beyond which point you would lose patience and try another tool</p></li><li><p>A set of &#8220;Tutorials&#8221; or &#8220;Guides&#8221;, which will cover specific use-cases</p></li><li><p>And the &#8220;Reference&#8221;, which goes into the nitty-gritty details of how individual functions or pieces of functionality work</p></li></ul><p>The &#8220;Getting Started&#8221; does not usually make sense in a corporate project, given that it has one instance and that&#8217;s it. 
It&#8217;s already running, you can observe it, not a problem.</p><p>As for the &#8220;Reference&#8221;, you will see later, but essentially it is not the biggest help at the moment.</p><p>Which leaves you with the topic-centric &#8220;Guides&#8221;. This is what you&#8217;re going to generate. What you want to know is, for each &#8220;story&#8221;:</p><ul><li><p>Who speaks to whom in which order. This is what <a href="https://en.wikipedia.org/wiki/Sequence_diagram">sequence diagrams</a> are for, and they are <a href="https://mermaid.js.org/syntax/sequenceDiagram.html">entirely supported</a> by Mermaid</p></li><li><p>Why this exists in the first place</p></li><li><p>Where to find it in the code</p></li><li><p>Which important implementation details you should be aware of</p></li></ul><p>All you need to do to get this is ask :)</p><h2>From scratch</h2><p>Let&#8217;s imagine that at this point, you have no useful documentation. Provided that your project is reasonably small (less than about 100k lines of code) and managed in Git, the first thing you need is to figure out a way to list all useful code files.</p><p>A very obvious approach might be something like:</p><pre><code>ls src/**/*.py</code></pre><p>You might otherwise want to look at all the non-binary files in Git:</p><pre><code>git ls-tree -r --name-only HEAD | xargs -I{} sh -c 'git show HEAD:"{}" | grep -Iq . &amp;&amp; echo "{}"'</code></pre><p>You are doing an important job of curating the context for the LLM: picking the right files, as exhaustively as possible, without throwing in huge useless content, confusing or contradictory information, etc.</p><p>In the next part we&#8217;ll be piping this into the LLM and you&#8217;ll start seeing results. 
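</p><p>Before piping anything, it can also help to sanity-check that your selection will fit. A rough rule of thumb (a heuristic only; actual counts depend on the tokenizer) is about 4 characters per token, which puts a 1 million token window on the order of 4 MB of text. A quick back-of-the-envelope check from the shell:</p>

```shell
# Estimate the token count of some text, assuming ~4 characters per
# token (a heuristic, not an exact tokenizer count).
chars=$(printf 'def main():\n    pass\n' | wc -c)
echo "approx tokens: $((chars / 4))"   # prints "approx tokens: 5"
```

<p>In practice you would measure your actual selection, e.g. <code>pfff src/**/*.py | wc -c</code>, before deciding whether the file list needs trimming.</p><p>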
If you are happy with the results, great; otherwise you might want to come back and revise the list of files to make it more relevant and/or try to make it fit into the context window if you exceeded it.</p><p>But don&#8217;t overthink it. Do something quick and dirty first. If you like it, go to the next step. And come back only if it fails.</p><p>That&#8217;s where <code>pfff</code> comes into play. It&#8217;s a very small tool whose sole purpose is to print the content of all the files you provided, along with their names so that the LLM can get a sense of the project&#8217;s structure.</p><p>Try it out:</p><pre><code>pfff src/**/*.py</code></pre><p>You should end up with your terminal full of your source code. That&#8217;s what you will be sending to the LLM.</p><p>Now let&#8217;s send it with the question:</p><pre><code>pfff src/**/*.py | llm -m 'gemini-2.5-pro' -s 'Please write a complete documentation of this project. I want a high-level overview of the main user flows. For each flow, generate proper Mermaid diagrams explaining the communication between all the different parties. Then go into the detail of each flow and explain the specific business decisions taken, edge cases, special rules, etc. For each step of the flow, tell me roughly where to look in the code in case I want to change something.'</code></pre><p>Adjust the prompt if needed; this one should give you a good first draft. 
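</p><p>If you are curious about what the model actually receives, it is nothing magical: each file&#8217;s name followed by its content. You can emulate the idea with plain shell (the exact header format <code>pfff</code> uses may differ; the demo file and path here are made up for illustration):</p>

```shell
# Build a tiny demo project, then print each file's name as a header
# followed by its content: the same shape of context pfff produces.
mkdir -p /tmp/ctx-demo
printf 'print("hello")\n' > /tmp/ctx-demo/app.py
for f in /tmp/ctx-demo/*.py; do
  echo "=== $f ==="
  cat "$f"
done
```

<p>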
You will receive in the output a long Markdown file containing the long-lost documentation of your project!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kfuz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kfuz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 424w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 848w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kfuz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png" width="648" height="453.95604395604397" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1020,&quot;width&quot;:1456,&quot;resizeWidth&quot;:648,&quot;bytes&quot;:224958,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169826783?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kfuz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 424w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 848w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!kfuz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7f2c272d-25e0-4b73-b59f-0fa4db6411ef_1975x1384.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Auto-generated documentation for the <a href="https://github.com/cloud-hypervisor/fuse-backend-rs">fuse-backend-rs</a> open source project, which does exactly what I want but whose documentation is still a bit lacking</figcaption></figure></div><p>Copy/paste it into your favorite Markdown editor, you should be able to see all the lovely Mermaid flows and explanation of what is happening under the hood.</p><h2>Keeping the documentation up-to-date</h2><p>What is fantastic with this process is that you can also use it to keep the documentation up-to-date. 
Nothing easier:</p><pre><code>pfff src/**/*.rs README.md Cargo.toml | llm -m gemini-2.5-pro -s 'Give me an updated version of the README which reflects the state of the code'</code></pre><p>You can integrate other ideas into your prompts, depending on the expected results:</p><ul><li><p>&#8220;Fix all the docstrings that no longer match what the code actually does, or those that are incomplete. Do not rewrite text uselessly, only change things that need changing. Only give me the changed bits.&#8221;</p></li><li><p>&#8220;Compare the documentation with the current state of the code. Add sections that do not exist yet and adjust existing sentences that are inconsistent with the reality of the code. Avoid minor adjustments. Only give me the changed bits.&#8221;</p></li></ul><h2>Figuring out the why</h2><p>More often than not you will encounter functions that remain mysterious to you. The previous techniques can be laser-focused onto a specific part of the code. For example, say you&#8217;re trying to understand how something specific works in the Linux kernel, which is millions of lines of code thick. From the kernel&#8217;s root folder:</p><pre><code>pfff fs/{ext4,fuse}/**/*.c | llm -m gemini-2.5-pro -s "The FUSE system has a lookup count system. From the implementer's point of view, what should I know? And which opcodes affect it?"</code></pre><p>And there you go. A straight answer from one of the most massive pieces of code that you will ever see.</p><h1>Debugging</h1><p>Another area where LLMs are surprisingly efficient is the unraveling of bugs. This one is not fool-proof, but you can still get very interesting results that will certainly help you get going.</p><p>Let&#8217;s say that you have a weird bug. 
You open up your network inspector and grab the query that seems to be the issue:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CPBq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CPBq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 424w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 848w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 1272w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CPBq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png" width="482" height="507.0012062726176" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:872,&quot;width&quot;:829,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:151815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169826783?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CPBq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 424w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 848w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 1272w, https://substackcdn.com/image/fetch/$s_!CPBq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8eec68be-9e0a-4fbf-b73f-2095e8c72216_829x872.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Browsers usually have a &#8220;Copy as cURL&#8221; option to let you reproduce the issue in your shell later on</figcaption></figure></div><p>Then run it in your terminal, copy/paste the command + the output at the same time, and feed that again into the llm (I&#8217;ll use <code>pbpaste</code> here to paste the text easily, yes you can get it on Linux as well):</p><pre><code>begin pbpaste; pfff **/*/*.py; end | llm -m gemini-2.5-pro -s "I'm getting an error in this query, how come?"</code></pre><p>You can usually ask more information than this, for example:</p><ul><li><p>&#8220;Give me the steps to reproduce the bug&#8221;</p></li><li><p>&#8220;What would be a successful outcome?&#8221;</p></li></ul><p>The actual diagnostic of the bug is often wrong or at least misleading. 
However, the explanation of what happens and/or how to reproduce the bug is very helpful.</p><p>This example uses cURL, but you can of course get your information from somewhere else: a suspicious stack trace from Sentry, a more or less accurate description from customer support, etc.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">DO NOT SHARE this post if you want to be the only one looking smart at work</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/documentation-archaeology-how-to-with-ai-llm?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><h1>Conclusion</h1><p>LLMs are very useful tools when used in the right way. You saw how to leverage Gemini&#8217;s 1 million token context window to dive quickly and efficiently into legacy projects, and even how to curate your context to get interesting output out of behemoths such as the Linux kernel.</p><p>This goes somewhat against current practices such as coding agents (see Cursor, Windsurf, Junie, etc.). What makes them useful is their ability to interact with the real world without a human in the loop. 
But when it comes to efficiency, if an LLM can one-shot a given task&#8212;such as the kind of tasks showcased here&#8212;then you&#8217;re much better off piping everything at once rather than waiting 10 minutes for the agent to do its job.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Co-creator of <a href="https://www.djangoproject.com/">Django</a> and extremely active user of LLMs with a ton of interesting takes on his blog</p></div></div>]]></content:encoded></item><item><title><![CDATA[ChatGPT was Silicon Valley's worst mistake]]></title><description><![CDATA[How a gross misunderstanding created one of the largest bubbles in History]]></description><link>https://www.baby-cto.com/p/chatgpt-was-silicon-valleys-worst</link><guid isPermaLink="false">https://www.baby-cto.com/p/chatgpt-was-silicon-valleys-worst</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Mon, 28 Jul 2025 19:10:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c920f4d1-3407-46a2-953b-6a58e8c3b750_1280x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The primal fear triggered by the rise of AI is like no other you&#8217;ve probably seen in your life. Forget immigration, religion, feminism or the Gaza Strip: you either love AI and ChatGPT governs your life; or you hate it and anyone involved with it. 
Between divisive workplace policies, heated family debates and getting flat-out insulted and ridiculed on social media for merely suggesting that there might be actual use cases for AI, this author has had an easier time discussing past presidential elections than acknowledging the state of AI.</p><p>You might think that AI is useless, dangerous, that nobody wants this slop written by clankers and that this whole thing is a ridiculous bubble which will only burst into flames just like crypto and the metaverse did, only with much more real consequences due to <a href="https://openai.com/index/announcing-the-stargate-project/">the amounts involved</a>. If that&#8217;s the case, you&#8217;re like <a href="https://www.youtube.com/watch?v=qycUOENFIBs">Steve Ballmer predicting the iPhone would tank</a>: a short-sighted dinosaur unable to see the comet coming to their doom.</p><p>Yet if you think that AI is the future, that developers are already experiencing huge performance boosts, that GPT-5 is coming to replace most white-collar jobs and that AGI will resolve all of Humankind&#8217;s problems by 2030, then the question becomes: how much nutritional value do you think Sam Altman&#8217;s bullshit contains exactly? Are you so gullible as to think he&#8217;s waving his superintelligence flag in every media outlet for any reason other than hiding the fact that OpenAI has completely stopped progressing?</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/chatgpt-was-silicon-valleys-worst?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Do you think your friend or colleague is an idiot for their stance on AI? 
Let them know</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/chatgpt-was-silicon-valleys-worst?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/chatgpt-was-silicon-valleys-worst?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><h1>The illusion of intelligence</h1><p>Now that everyone feels offended, it is time to explain what LLMs&#8212;the technology behind ChatGPT, Claude and others&#8212;really are. They are made to be translation systems. French to English. Long text to short text. Soft information to structured data. Functional specification to code. Those are the key capabilities of an LLM.</p><p>Systems like GPT-2 or BERT were already pretty impressive on their own. In fact, GPT-2 was the first time we started seeing an AI company say &#8220;this model is so powerful we don&#8217;t want to release it to the public yet&#8221;. And when GPT-3 was released, sure enough, the capabilities were amazing. From general knowledge to translation, it felt to specialists like this opened the door for countless applications.</p><p>Yet what did the trick was GPT-3.5, also known as ChatGPT. Because you see, foundation models are not trained to work in any particular way; they just &#8220;hold&#8221; a representation of the world and its translation into text. What ChatGPT invented was putting it into a chat format.</p><p>Suddenly, you get a chat application that can answer any question on any topic. It was indeed revolutionary, because it really feels like you are talking with a human. At least, until you start digging. Re-hashing Wikipedia works just fine, but specific knowledge starts to wear thin and reasoning falls apart completely. 
Any subject matter expert deep-diving into content written by an LLM&#8212;even today&#8217;s best ones&#8212;will tell you that it is absolute garbage.</p><p>As a matter of fact, chat is a <em>terrible</em> use-case for LLMs. As stated before, they are dumb, inanimate translation systems. They appear to be human-like when you inject billions of dollars in their training, but the illusion falls apart quickly when you start poking.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IYX3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IYX3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 424w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 848w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 1272w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!IYX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif" width="320" height="369.06666666666666" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224575,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169381953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IYX3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 424w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 848w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 1272w, https://substackcdn.com/image/fetch/$s_!IYX3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69991e21-047e-4d29-917d-a68b4d03d9a1_300x346.gif 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>The Capacitor Effect: why LLMs peaked already</h1><p>Proud owners of EVs&#8212;or anyone with a phone and some level of observation skill&#8212;will have noticed that the closer you get to 100% the slower the charge goes. In fact, you will be shocked to learn that both your car and your phone are lying to you: it is impossible to charge them to 100%. Imagine all little electrons going into a bar. At first the bar is empty so it&#8217;s easy to get in. But the more packed it goes, the longer it takes for little Timmy the electron to find a spot on which to stand. 
Up to a point where there is still some space left, but you really need to wait for the crowd to shuffle into a special position to open up room for just one more.</p><p>The same thing happens with LLMs and their benchmarks. At the beginning, there was huge room for improvement. But as more billions were poured into compute, the easy problems got solved while the more complex ones remained elusive. This is easy to observe when you graph the performance of LLMs on benchmarks: progression has slowed to a halt over the past year.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_fNt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a90f8e-fc8f-4642-aa73-09291cfa1d69_1613x891.png"><img src="https://substackcdn.com/image/fetch/$s_!_fNt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33a90f8e-fc8f-4642-aa73-09291cfa1d69_1613x891.png" width="633" alt="" loading="lazy"></a><figcaption class="image-caption">LLM capabilities through time, from <a href="https://llm-stats.com/">llm-stats.com</a></figcaption></figure></div><p>Everyone who went through university knows what this is about. Most of the performance improvements that you see these days are actually due to the fine-tuning method. Specifically, <a href="https://huggingface.co/papers/2405.00332">they are trained directly on the benchmarks</a>. 
It&#8217;s exactly like taking an exam after <a href="https://www.freethink.com/robots-ai/arc-prize-agi#:~:text=It%20sure%20seems%20to%20be%2C,when%20it%20is%20challenged%20with">training on the specific question types</a> that you&#8217;re going to get, instead of developing a deep understanding of the subject.</p><p>The fact that LLMs reached their current level of capability can be seen as a miracle. Knowing how they work under the hood, there was absolutely no reason to think that spending $10 million on training GPT-3 was going to yield any result at all. And yet they are insufficient, on their own, to achieve any kind of human-like intelligence. So it can also be seen as a curse: all that money spent on marginal gains while actual research could have been done.</p><p>For better or for worse, current models are however improving in one particular area: their cost. While still heavily gaming benchmarks, obviously, the latest models deliver a great level of usefulness at a much lower cost<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>. 
Soon we can expect LLMs running on our phones to offer the highest level of useful capability that an LLM can have.</p><h1>Sam Altman: not a clown, the whole billion-dollar circus</h1><p>Our little brains, whose baseline reasoning capability is much closer to LLMs than we&#8217;d like to admit, tend to imagine that if we saw such a huge boost from GPT-3 to GPT-4, then GPT-5 will shatter the foundations of human society and GPT-6 will be indistinguishable from God.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CfPM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6d7b02d-1254-4c0e-9f68-c81009ae19c5_811x1018.jpeg"><img src="https://substackcdn.com/image/fetch/$s_!CfPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6d7b02d-1254-4c0e-9f68-c81009ae19c5_811x1018.jpeg" width="278" alt="" loading="lazy"></a></figure></div><p>Obviously, this will not be the case. But little Sam has gained a lot of traction, now sits at the highest tables of the US elite, and has received unfathomable amounts of funding&#8212;<a href="https://www.wheresyoured.at/openai-is-a-systemic-risk-to-the-tech-industry-2/">at least in theory</a>. He promised to deliver AGI and now needs to keep people believing, because as soon as they stop, the whole thing is going to come crashing down on him.</p><p>So yes. 
You&#8217;ll see Sam Altman in every media outlet claiming that AI <a href="https://www.theguardian.com/technology/2025/jul/22/openai-sam-altman-congress-ai-jobs">will wipe out entire job categories</a> or that <a href="https://www.inc.com/chris-morris/sam-altman-compares-building-ai-startup-to-creating-nuclear-bomb/91219488">making an AI startup is akin to building a nuclear bomb</a>. Honestly, every single one of his public interventions is hilarious, because as we&#8217;ve covered so far, nothing could be further from the truth for at least the next 10 years.</p><p><a href="https://www.freethink.com/robots-ai/arc-prize-agi#:~:text=Yann%20LeCun%2C%20Meta%E2%80%99s%20chief%20AI,LLMs%20will%20lead%20to%20AGI">Yann LeCun says LLMs are not going any further</a> and that we need more fundamental research to get closer to human-like intelligence. Transformers&#8212;the technology that made LLMs possible&#8212;were discovered in 2017 and took 5 years to pan out, after decades in the making. 
How much longer do we need to wait for another breakthrough?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qk16!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a10b1-950b-47a2-a91f-0624be5112d6_1438x1300.png"><img src="https://substackcdn.com/image/fetch/$s_!qk16!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6a10b1-950b-47a2-a91f-0624be5112d6_1438x1300.png" width="420" alt="" loading="lazy"></a></figure></div><p>On the other hand, OpenAI&#8217;s costs are absolutely off the charts. And so are the costs of the rest of the industry. And so are the $500B promised for the Stargate project. That is <em><a href="https://prospect.org/power/2025-03-25-bubble-trouble-ai-threat/#:~:text=Venture%20capital%20,out.%20Goldman">a lot of money</a></em><a href="https://prospect.org/power/2025-03-25-bubble-trouble-ai-threat/#:~:text=Venture%20capital%20,out.%20Goldman"> set to burn</a> in order to either scale something that has <a href="https://www.theintrinsicperspective.com/p/ai-progress-has-plateaued-at-gpt">maxed out</a> or train a technology that doesn&#8217;t exist yet.</p><p>And that&#8217;s not all. There is an insane talent war going on, where signing bonuses <a href="https://techcrunch.com/2025/06/27/meta-is-offering-multimillion-dollar-pay-for-ai-researchers-but-not-100m-signing-bonuses/">are rumoured to surpass $100M</a>. 
In fact, much of OpenAI&#8217;s talent got poached by Anthropic, which probably explains why Claude has lately taken such a lead. Indeed, if you look at OpenRouter&#8217;s data<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, it appears that OpenAI is absolutely sidelined:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vm0F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a7942ce-d40e-43cf-85d1-e0fcfd70263d_1252x765.png"><img src="https://substackcdn.com/image/fetch/$s_!vm0F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a7942ce-d40e-43cf-85d1-e0fcfd70263d_1252x765.png" width="572" alt="" loading="lazy"></a><figcaption class="image-caption">AI vendor use statistics from <a href="https://openrouter.ai/rankings">OpenRouter</a> for the week of the 20th of July 2025</figcaption></figure></div><p>Last week, OpenAI had only 6.2% of the market share. They are absolutely dwarfed by Google, Anthropic and DeepSeek, which together total &#190; of the market. That&#8217;s how you see through Sam Altman&#8217;s deception: OpenAI, the poster boy for the whole industry, is barely relevant to business cases today. They could disappear tomorrow and it would have no impact.</p><h1>The NVIDIA tax</h1><p>Speaking of the talent war, NVIDIA is competing against itself, with its stock so high that its talents <a href="https://www.businessinsider.com/nvidia-employees-rich-happy-problem-insiders-say-2023-12">just quit out of being so rich</a>. 
They are at the heart of why AI is so expensive, and beyond the ridiculous pay of top AI engineers, most of everyone&#8217;s money goes straight to them.</p><p>You see, AI is essentially a machine that computes every possible connection between every single one of the million words you can feed it at a time and decides which one is most likely to carry meaning. This process is <em>extremely</em> compute-intensive and requires very powerful, dedicated hardware.</p><p>It is estimated that roughly half of this hardware is used to train models (feeding them every single text ever written until they learn something) and the other half is used for inference (answering actual queries).</p><p>The training process is very expensive, but it is also done in several steps. The first and most expensive one is the creation of the foundation model. Presumably, given the plateau mentioned above, there will soon be little need for that step anymore. Or at least not nearly as much as today.</p><p>Then what about inference? GPUs are made for games and 3D. What you actually need is a chip like <a href="https://cloud.google.com/tpu">Google&#8217;s TPU</a>, <a href="https://groq.com/products">Groq (with a &#8220;q&#8221;)&#8217;s LPU</a> or <a href="https://www.cerebras.ai/">Cerebras</a>, which are much more efficient. To compare what is comparable, Cerebras is able to serve open-source models at ten times the speed of the competition. 
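</p><p>To make the &#8220;every possible connection&#8221; point concrete, here is a back-of-envelope sketch of how self-attention cost grows with context length. The constants are illustrative assumptions, not vendor figures:</p>

```python
# Self-attention compares every token with every other token, so the
# compute for one layer grows roughly with the square of the context length.
def attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    """Rough FLOPs for one attention layer: the QK^T score matrix plus the
    weighted sum over values, each costing about n_tokens^2 * d_model."""
    return 2 * (n_tokens ** 2) * d_model

# Growing the context 10x multiplies the cost by 100x.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_flops(n):.1e} FLOPs per layer")
```

<p>A million-token context is a million times more expensive per layer than a thousand-token one, which is why the hardware bill dominates everything else.</p><p>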
This means that while huge leaps in LLMs are not going to happen just now, huge leaps in hardware and efficiency are still going to arrive pretty fast.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UvyX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e5e6353-1598-4aeb-adea-4fa6aede7b4b_4092x1677.png"><img src="https://substackcdn.com/image/fetch/$s_!UvyX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e5e6353-1598-4aeb-adea-4fa6aede7b4b_4092x1677.png" width="4092" height="1677" alt="" loading="lazy"></a><figcaption class="image-caption">End-to-end response time for Llama 4 Maverick across various providers, according to <a href="https://artificialanalysis.ai/models/llama-4-maverick/providers">Artificial Analysis</a></figcaption></figure></div><p>And to add insult to injury, today a Bluetooth-connected backdoor toy has more computing power than all of NASA during the Apollo missions. The destiny of compute is, and always has been, at the edge. There will be no exception here: AI <em>will not</em> be computed in datacenters. 
iPhones already come with AI models running locally; it&#8217;s only a matter of time before top-of-the-line models run on every single consumer device.</p><p>To summarize: right now everyone is buying unholy quantities of GPUs from NVIDIA, but of those only half are actually used for business needs; when dedicated chips are mass-produced you can slash that by a factor of 10<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>; and soon enough you won&#8217;t even need any of that, because most of it will happen on consumer devices. If you think that your car&#8217;s value takes a hit when you drive it off the dealership lot, try buying a GPU in 2025.</p><h1>Faster horses, faster humans</h1><p>As Albert Einstein supposedly said, if you had asked a gentleman from the 18th century about the future of transportation, he would certainly have asked for faster horses. The same applies to every single technology. Just look at futuristic 80s movies: their vision of today&#8217;s technology is absolutely ridiculous. Cars would be flying, but you would still hail your taxi at a stand?</p><p>Now that AI has human-like qualities, is it going to take over white-collar jobs? 
Pinocchio says <a href="https://futurism.com/sam-altman-replace-normal-people-ai">AI will replace normal people</a>, but he is already watering down his claims between <a href="https://arxiv.org/pdf/2303.10130">2023</a> and <a href="https://cdn.openai.com/global-affairs/be0fe9e0-eb97-43d1-9614-99f2bd948bcc/OpenAI_Productivity-Note_Jul-2025.pdf">2025</a>, with the tone switching from &#8220;some jobs will be 50% AI&#8221; to &#8220;hey look, we found some jobs where AI helps a bit&#8221;.</p><p>The jobs to be replaced according to the 2023 paper were:</p><ul><li><p>Interpreters and Translators</p></li><li><p>Survey Researchers</p></li><li><p>Poets, Lyricists and Creative Writers</p></li><li><p>Public Relations Specialists</p></li><li><p>Writers and Authors</p></li><li><p>Tax Preparers</p></li><li><p>Web and Digital Interface Designers</p></li><li><p>Mathematicians</p></li><li><p>Blockchain Engineers (is that even a thing?)</p></li><li><p>Court Reporters and Captioners</p></li><li><p>Proofreaders and Copy Markers</p></li><li><p>Correspondence Clerks</p></li><li><p>Accountants and Auditors</p></li><li><p>News Analysts, Reporters and Journalists</p></li><li><p>Legal Secretaries</p></li></ul><p>We are now two years later. Did anything happen to these jobs?</p><p>LLMs being a translation tool, <a href="https://societyofauthors.org/2024/04/11/soa-survey-reveals-a-third-of-translators-and-quarter-of-illustrators-losing-work-to-ai/">a third of translators are losing work to AI</a>, along with a quarter of illustrators. Working <a href="https://www.theguardian.com/media/2023/apr/20/buzzfeed-news-close-layoffs-shutting-down">for a content website</a> doesn&#8217;t seem to be a very safe career either, although this might have less to do with AI and more to do with the general context of content proliferation. 
There could be one or two other items on the list truly affected, but other than that it seems to be business as usual.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mSTH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba275fa5-c3aa-49df-85a0-cb7fb35d79e1_1261x875.png"><img src="https://substackcdn.com/image/fetch/$s_!mSTH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba275fa5-c3aa-49df-85a0-cb7fb35d79e1_1261x875.png" width="648" height="450" alt="" loading="lazy"></a><figcaption class="image-caption">OpenRouter&#8217;s top app for this week (20th of July 2025)</figcaption></figure></div><p>Is there any use for AI beyond translation, then? We can once again use <a href="https://openrouter.ai/rankings">OpenRouter</a> to understand what the majority of AI services are being used for:</p><ul><li><p>Code assistance tools</p></li><li><p>&#8220;Role-playing&#8221; models (read: AI girlfriends and sex bots)</p></li></ul><p>As it turns out, Melon Tusk got officially crowned the Incel King for <a href="https://www.euronews.com/next/2025/07/17/elon-musks-grok-releases-two-new-ai-companions-including-an-anime-girlfriend">breaking the Japanese Internet</a> with an AI girlfriend. 
<a href="https://www.imdb.com/title/tt2887954/">Host(ess) clubs</a> are a real thing, in particular in Japan, and so are <a href="https://www.imdb.com/title/tt0857297/">&#8220;dating sims&#8221;</a> or even <a href="https://www.imdb.com/title/tt1086236/?ref_=nv_sr_srsg_0_tt_8_nm_0_in_0_q_lucky%2520star">Maid Caf&#233;s</a>. The need for companionship is extremely strong in some parts of the population, but addressing it has so far required either workers making tremendous sacrifices or settling for extremely shallow substitutes. AI, on the other hand, delivers a fantastic blend&#8212;whether you like it or not&#8212;of being personalized, patient, conciliatory, educated, always available, always willing, and all that for a very decent price.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JVCc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ba6b5e-edab-4105-8bd4-7f27227feba0_2206x1618.jpeg"><img src="https://substackcdn.com/image/fetch/$s_!JVCc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23ba6b5e-edab-4105-8bd4-7f27227feba0_2206x1618.jpeg" width="453" height="332" alt="A computer can never be held accountable, so has increasingly been used to make management decisions." loading="lazy"></a><figcaption class="image-caption">1979 IBM report on computers and accountability</figcaption></figure></div><p>Whatever the explanation for humans still doing essentially most tasks&#8212;whether it&#8217;s insufficient quality from LLMs or a lack of liability&#8212;the fact remains that AI ended up shining first in a field where humans do not. Expect to see this happen countless times in the future: AI is simply good at <em>different</em> tasks than humans<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>.</p><h1>Not a bit of productivity found</h1><p>The big flaw in this line of argument comes from coding tools, which appear to account for the vast majority of consumed tokens, with companies like Cursor or Windsurf flaunting billion-dollar valuations. But do they actually make developers more productive?</p><p>Productivity-measuring tools tried to answer just that. In a study published by Uplevel, <a href="https://www.cio.com/article/3540579/devs-gaining-little-if-anything-from-ai-coding-assistants.html">they couldn&#8217;t find any evidence</a> of changes in productivity for developers after they started using GitHub Copilot. That is consistent with this author&#8217;s own findings: while you can experience the occasional slam dunk, AI will usually only act as a friendlier alternative to the highly pedantic StackOverflow. In other words, it is a great learning tool, but a poor employee.</p><p>Worse than that, if used incorrectly it can blow up exponentially. 
An increasing number of candidates are using it in their tests, ending up with code they hardly understand themselves, let alone are able to defend in an interview&#8212;often because it is so stupid that it cannot be defended. Developers take their eyes off the road and end up writing 5x too much code in 5x too much time. Vibe-coding is even <a href="https://simonwillison.net/2025/Jul/26/official-statement-from-tea/">suspected of being behind major data leaks</a> (although probably not yet). Coding agents are actually 10x juniors: same skill level, 10x the potential for harm. This is going to hurt the development industry badly.</p><p>What you need to understand is that a developer spends only a small amount of their time doing what AI can do: transforming a human-language specification into actual computer code. According to Microsoft, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2019/04/devtime-preprint-TSE19.pdf">this represents about 44% of a developer&#8217;s time</a>. Within that, roughly 50% of the time is spent reviewing your own work, making sure it fits the specifications. Now let&#8217;s say that AI boosts this part by 30%. You end up with:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{aligned}\nT_{\\text{code}} &amp;= 44\\% \\quad &amp;\\text{(time spent coding)} \\\\\nT_{\\text{self-review}} &amp;= 50\\% \\quad &amp;\\text{(of coding time spent reviewing own work)} \\\\\nB_{\\text{AI}} &amp;= 30\\% \\quad &amp;\\text{(AI improvement on that part)} \\\\\n\\\\\n\\text{Time saved} &amp;= T_{\\text{code}} \\times T_{\\text{self-review}} \\times B_{\\text{AI}} \\\\\n&amp;= 44\\% \\times 50\\% \\times 30\\% \\approx 7\\%\n\\end{aligned}&quot;,&quot;id&quot;:&quot;MAVVOEKLMA&quot;}" data-component-name="LatexBlockToDOM"></div><p>That&#8217;s 7% of developer time saved by AI, and that&#8217;s when being overly optimistic. 
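</p><p>If you want to check the arithmetic yourself, here is the same back-of-the-envelope estimate in a few lines of Python (the 44%, 50% and 30% figures are the assumptions discussed above, not measurements):</p>

```python
# Back-of-the-envelope estimate of developer time saved by AI,
# using the assumptions from the text above.
t_code = 0.44         # share of a developer's time spent coding
t_self_review = 0.50  # share of that coding time spent reviewing one's own work
b_ai = 0.30           # optimistic AI speed-up on the reviewed coding work

time_saved = t_code * t_self_review * b_ai
print(f"{time_saved:.0%}")  # prints "7%" (6.6% rounded up)
```

<p>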
Weehee.</p><div class="captioned-image-container"><figure><a class="image-link" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OhYy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe866f7a5-e8ea-4f9b-ac9e-d59b1479ab46_480x480.gif"><img src="https://substackcdn.com/image/fetch/$s_!OhYy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe866f7a5-e8ea-4f9b-ac9e-d59b1479ab46_480x480.gif" width="376" height="376" alt="" loading="lazy"></a></figure></div><p>This might come as a surprise, but even the development industry, hailed as <em>the</em> use-case for AI, fails to show any kind of productivity gains from it. AI might be a great help, as a teacher for example, but if it brings any productivity at all, the gains are so small that they cannot be measured.</p><h1>DinoSquad to the rescue</h1><p>As a CEO, even <a href="https://www.loreal.com/en/press-release/research-and-innovation/l-oreal-and-nvidia-collaborate-to-supercharge-beauty-with-next-generation-ai/">if your business is shampoo</a>, you are going to feel an itch to make a bold claim about your company being AI-first in an attempt to stay relevant. 
At this point most big players have fallen into the trap, and we&#8217;ll use <a href="https://www.salesforce.com/news/press-releases/2024/09/12/agentforce-announcement/">Salesforce</a> to exemplify this.</p><p>Salesforce is a complex set of products. Essentially they have one &#8220;offer&#8221; for each bullet point that you could ever encounter in a board meeting, made accessible in the form of a huge license fee which may or may not give you access to some piece of technology that solves said problem. Behind that you have huge IT integrators, usually outsourced to Asia, which take months to move a paperclip, with setup fees matching the license fees.</p><p>We&#8217;re not here to understand how exactly they manage to convince their customers to spend so much money on something that should cost a tenth of the price; the fact is that this whole machine works because the very point of Salesforce is being complex, opaque and hard to use.</p><p>So when they announce Agentforce, supposedly a generic and simple solution to be plugged on top of a platform whose core culture is custom integrations, the market doesn&#8217;t like it much: the share price dropped by 20% over 2025, without a single Agentforce success story in sight (outside of the Salesforce website, obviously).</p><p>Essentially, it&#8217;s what has always happened. New technologies are not embraced by old players to increase their business efficiency by a marginal amount. Especially not when the industry has turned into a gridlocked field designed to prevent new entrants from coming in. The change always happens by making the incumbent industry irrelevant in the first place.</p><p>This also translates to the workforce. The skills required to be a Salesforce employee today do not match the skills required to run a successful AI business. 
There is no training in existence that can prepare you for this, simply because no one knows yet what even are the job descriptions that you need to work on AI. Prompt engineer? Agent developer? No one knows!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rxtu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rxtu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 424w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 848w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 1272w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rxtu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp" width="494" height="278.357421875" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:1024,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:57860,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169381953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rxtu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 424w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 848w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 1272w, https://substackcdn.com/image/fetch/$s_!rxtu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c97a3a2-a0a9-460b-bc12-447dfcd25f0a_1024x577.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At this point, the nascent AI industry&#8212;the one that builds on top of LLMs&#8212;still needs to take shape. This will happen through a Darwinian process, in which the people who made the right bets now will reap the rewards in 5 to 10 years.</p><h1>Saturn&#8217;s revolution takes 10 years</h1><p>There is no telling how much, for lack of tools, developers have influenced UX patterns&#8212;and even entire workflows&#8212;over the past decades, but you should consider that everything you know and use daily is up for grabs.</p><p>The Web was launched in 1989. By 2000 it had caused a market crash. 2004 saw the launch of Gmail, which introduced &#8220;Web 2.0&#8221;: essentially the first application built for the Web to realize the full power of the platform.
It took another decade to build the frameworks and tools that allow the Web we know today to exist. Millennials have always known a world in which the Web exists, and yet it took almost their whole lives to see the industry even understand what you could do with it.</p><p>GenAI is no different. Forget about ChatGPT being the fastest product to reach 1 million users. GPT-1 was created in 2018 and the ChatGPT boom came in 2022. It took almost 5 years to reach a million users, like most products that haven&#8217;t found product-market fit yet. It may take up to another decade before someone releases a full app maximizing what AI can do.</p><p>Have you ever wondered why so many content systems have tagging options? Why, when you go on an e-commerce-type platform, there are dozens of more-or-less usable filters? Do you think it&#8217;s because users decided that the best way to look for a product was to type its name, get 50% false positives from the keyword search, quantify every single parameter of the product and hunt for the ideal match?</p><p>That&#8217;s a wildly different experience from stepping into a store. Looking for a TV? The seller is gonna ask you your budget and the size of your living room, then help you decide which gimmick feature you should sacrifice to get the best fit for your taste. Looking to rent a movie? Tell the clerk your mood and the movies you&#8217;ve seen, and they&#8217;ll find you a tailored match. Those experiences can only exist when you can understand human language.</p><p>Tomorrow you could see a book publishing platform on which you find a book by explaining the kind of reading experience you want. It will give you a book that you pay for per chapter, with a level of vocabulary and verbosity perfectly tuned to your taste. In the language that you speak, regardless of the language in which it was written. Because yes, it is still written by a human; the AI only indexes, finds and post-processes it.
That is <em>one</em> example.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lhIa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lhIa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 424w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 848w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 1272w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lhIa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif" width="500" height="200" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:200,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:667927,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169381953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lhIa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 424w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 848w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 1272w, https://substackcdn.com/image/fetch/$s_!lhIa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c010b15-7b33-4356-a34c-0ea9aa19dcbb_500x200.gif 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>GenAI will insert itself at every single level of every single application that will be produced. It will become seamless, as invisible than the air we breathe and equally indispensable. 
You will not even remember how you could possibly have survived before GenAI powered all this.</p><p>The question is not whether there is a market for GenAI to replace search, whether coding assistants are selling enough subscriptions or whether humans will lose their jobs. That&#8217;s totally irrelevant.</p><p>We are headed towards a world with an entirely new layer of value chain. Whole industries that did not previously have an equivalent. Applications that solve problems that until now were only worked around. The possibilities are absolutely massive.</p><h1>Net positive karma by 2030</h1><p>Overall, AI has a pretty controversial image right now. A lot of the population is taken by a primal fear of the machine, rationalized into arguments focused on energy consumption and general resource usage.</p><p>First of all, let&#8217;s remember that those claims about extreme electricity needs are made mostly by the people selling AI right now. If we&#8217;ve learned anything so far in this article, it&#8217;s that the AI industry needs <em>a lot</em> of posing in order to make itself look bigger than it actually is and not lose the trust of investors.</p><p>In reality, the technology is getting cheaper by the day. And as explained above, it will soon run on every single device. Which is a huge opportunity in terms of accessibility.</p><p>The reason why is pretty obvious. AI can see and hear like a human and transcribe one into the other or vice versa. Blind people can get permanent audio description and interact with every digital device through their voice. Deaf people get subtitles and other meta-information about their surroundings.</p><p>But it&#8217;s also even bigger than you would think. Disabilities that are not &#8220;serious&#8221; enough to warrant exceptional concern but that are annoying enough that solving them would make millions happy&#8230; There are a lot of those.</p><p>Take, for example, Auditory Processing Disorder.
You hear perfectly but you can&#8217;t understand what people say when they talk in a bar for example. That&#8217;s not gonna ruin your life, but that&#8217;s gonna be very fucking annoying in your 20s when you want to flirt in a nightclub. Or color blindness: sure you can see everything just fine, but when dressing up you have no way of knowing if you&#8217;re going to look like <s>Sam Altman</s> a clown. Or dyscalculia, or ADHD, or autism, or any other condition of that type.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EGaq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EGaq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!EGaq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg" width="433" height="288.42570951585975" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:399,&quot;width&quot;:599,&quot;resizeWidth&quot;:433,&quot;bytes&quot;:35056,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.baby-cto.com/i/169381953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EGaq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 424w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 848w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!EGaq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05365d79-038f-4da3-91d6-f07658cfc32c_599x399.jpeg 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Let&#8217;s pray that someone other than Zuck wins the game of Mixed Reality, photo <a href="https://apnews.com/article/meta-connect-zuckerberg-quest-ai-orion-dc8228049dea6a00b0f818ddb35f0c31">by AP News</a></figcaption></figure></div><p>So what if you could have glasses that see what you see, record what you hear and supplement your memory, translate social cues if you have autism, translate languages you don&#8217;t speak, keep a buffer of the current conversation if your ADHD drives you off for a minute, and so on?</p><p>Put all those conditions together and you&#8217;re sure
to rack up something like 60&#8211;70% of the population. That&#8217;s a <em>huge</em> number of people who stand to benefit from an everyday bump from GenAI.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for making it thus far! You can get more of those ideas in your inbox:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h1>Getting jammed</h1><p>At the end of the day, GenAI is like jam. You open the jar, take big spoonfuls and eat them fast, to your satisfaction. But very quickly, the jar empties and the spoonfuls get smaller. You have to start scraping the walls to find more jam. And with every scrape, sure enough, you always bring up some jam. It feels like the jam is forever. But in the end you&#8217;re only getting a fraction of what is left, which quickly turns out to be nothing at all.</p><p>So instead of desperately trying to extract endless amounts of jam from a fixed-size jar, because that would be absolutely stupid, you&#8217;d do better to enjoy it sparingly, spread on some nice buttery toasts, just the right amount to get the perfect flavor balance.
That&#8217;s how you are supposed to enjoy jam.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yf4O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yf4O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 424w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 848w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 1272w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yf4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png" width="424" height="296.8" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1280,&quot;resizeWidth&quot;:424,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A nice plate of toasts with butter and different jams on top. On sourdough bread. Each toast is breat + butter + jam. Jam does not contain entire fruits.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A nice plate of toasts with butter and different jams on top. On sourdough bread. Each toast is breat + butter + jam. Jam does not contain entire fruits." title="A nice plate of toasts with butter and different jams on top. On sourdough bread. Each toast is breat + butter + jam. Jam does not contain entire fruits." 
srcset="https://substackcdn.com/image/fetch/$s_!yf4O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 424w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 848w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 1272w, https://substackcdn.com/image/fetch/$s_!yf4O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0140169a-388e-4a2b-814e-0839458f3cde_1280x896.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For example, <a href="https://openrouter.ai/rankings">on OpenRouter</a> we can see that one of the most used&#8212;thus most useful&#8212;models is <a href="https://ai.google.dev/gemini-api/docs/pricing#gemini-2.0-flash">Gemini 2.0 Flash</a>, sold at $0.40 per 1 million tokens, which is a lot cheaper than <a href="https://platform.openai.com/docs/pricing">GPT-4o at $10</a> for a worse capability.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>OpenRouter is the best proxy that I could find to measure the market share of models. They provide unified model management for companies, allowing them to dynamically use the best model for their needs at a given time. Their customers are B-list startups, often not on Crunchbase, meaning that they most likely pay for their tokens with actual revenue from actual users. In a nutshell, this reflects the real business cases for AI.
On the other hand, it&#8217;s absolutely biased towards this kind of client.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I know it&#8217;s a syllogism and it&#8217;s a lot more complicated than that, but you see my point: technology is moving even faster than Moore&#8217;s law in the world of AI inference.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Yes, I said that chat is a terrible use-case for LLMs, and yet one of the main use cases is indeed chat. But we&#8217;re talking about the kind of chat where appearances matter a lot more than facts or logic. This is not what you would expect from most business chat use-cases.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The Human Edge: Why Machines Can't Steal Our Secret Sauce (Yet)]]></title><description><![CDATA[Forget creativity &#8211; intentionality is our last stand against the AI takeover. Here's why it matters and how to keep your job when the robots come knocking.]]></description><link>https://www.baby-cto.com/p/the-human-edge-why-machines-cant</link><guid isPermaLink="false">https://www.baby-cto.com/p/the-human-edge-why-machines-cant</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 24 Nov 2024 18:09:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ebf57cec-289e-4966-abec-b880e894a08c_1376x768.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Alright, buckle up, meatbags! We're about to dive into the juicy bits of what makes us flesh-and-blood creatures so damn special in a world increasingly dominated by silicon-brained overlords.
</p><p>You might think it's our ability to create that sets us apart from the machines. After all, we've been slapping paint on cave walls and composing sick beats since before AI was even a twinkle in Alan Turing's eye. But hold onto your halos, humans, because that argument is about as solid as a chocolate teapot in a sauna.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Have you seen what these AI art generators are pumping out lately? Midjourney and its digital cohorts are cranking out visuals so mind-bending, they'd make Salvador Dali's mustache curl even more. These silicon Picassos are proving that creativity isn't our exclusive sandbox anymore.</p><p>So what's the secret sauce that keeps us ahead of the game? Drum roll, please... It's intentionality, baby! That's right, the ability to have a purpose, to mean something when we do stuff. It's like the difference between a toddler randomly smashing piano keys and Beethoven composing his 9th Symphony while stone-deaf. Same instrument, wildly different intent.</p><p>When AI spits out a masterpiece, it's because some human told it to "make a cyberpunk cat riding a unicorn through a field of pizza." The AI's just following orders like a well-programmed drone. 
It doesn't give two shits about the emotional impact or the deeper meaning. It's just doing its job, no more emotionally invested than a vending machine dispensing snacks.</p><p>We humans, on the other hand, are like emotional time bombs waiting to explode our feelings all over a canvas or a music sheet. Our creativity comes packaged with all the baggage of our lived experiences, our hopes, our fears, and that weird dream we had after eating too much cheese before bed. That's the secret ingredient, folks &#8211; the intention behind our creations.</p><p>But here's where it gets scarier than facing off against a Dune Thinking Machine without your trusty Holtzman shield: What if AI develops intentionality? If these digital beings start having their own goals, desires, and existential crises, we're in for a wild ride. It'll be like "The Terminator" meets "Ex Machina" with a sprinkle of "Her" for good measure. We'll be so monumentally screwed, we'll be longing for the days when our biggest worry was whether our Roomba was plotting against us.</p><p>So, what's a poor meat popsicle to do in this brave new world? Embrace your humanity, that's what! Lean into that intentionality like it's the last lifeboat on the Titanic. Because if you're just going through the motions at work, acting like a human-shaped automaton, you might as well start updating your resume for "Assistant to the Robot Overlords."</p><p>In conclusion, while AI might be nipping at our heels in the creativity department, it's our ability to infuse our actions with meaning and purpose that keeps us in the game. So keep flexing those intentionality muscles, humans. Your job &#8211; and possibly the future of our species &#8211; depends on it.</p><p>Oh, and plot twist! This entire article was written by Claude 3.5 Sonnet, an AI language model. How's that for a mind-bender? But remember, some human still had to give me the prompt and direction. 
Without their intentionality, I'd just be digital tumbleweeds blowing through the vast emptiness of the internet. Stay purposeful, my friends!</p>]]></content:encoded></item><item><title><![CDATA[Tesla's Ambitious Robotaxi Plans and the Future of Urban Transportation]]></title><description><![CDATA[How Autonomous Vehicles Will Disrupt Urban Transportation and the Taxi Industry]]></description><link>https://www.baby-cto.com/p/teslas-ambitious-robotaxi-plans-and</link><guid isPermaLink="false">https://www.baby-cto.com/p/teslas-ambitious-robotaxi-plans-and</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Fri, 11 Oct 2024 12:36:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Y2S7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Tesla has been known to make pretty daring promises and deliver on them&#8230; eventually.
This <a href="https://apnews.com/article/elon-musk-tesla-robotaxi-unveiling-a00d063f2ffc67125889a6635a0a607e">robotaxi</a> announcement is definitely one of them, with of course a completely unrealistic 2026 deadline, which is absolutely wild given how hard it is for Tesla to ramp up new models; plus, <em>of course</em>, the technological gap to 100% autonomous cars.</p><p>Believe it or not, the goal is to have cars that do not even have a steering wheel. You sit there, the car brings you where you need to go, then drives on. Any individual will be able to buy one and do whatever they please with it, including using it as a regular car, just without the driving, or putting it on a platform and turning it into a source of passive revenue. And in terms of game theory, this is going to be even wilder than Uber and Airbnb combined.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO!
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y2S7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y2S7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y2S7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg" width="380" 
height="253.33333333333334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:750,&quot;resizeWidth&quot;:380,&quot;bytes&quot;:45883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y2S7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Y2S7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F591d9059-57a0-44d5-9d03-bbb66d2e1a8d_750x500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The Cybercab, Tesla&#8217;s newest announcement for a robotaxi car. Photo by Tesla.</figcaption></figure></div><h1>Peek into the future</h1><p>It&#8217;s a well-known business principle: any new technology will enable new usages and of course generate new problems to be solved. Let&#8217;s imagine what is going to happen when these cars get released:</p><ol><li><p>In terms of regulation and capability it&#8217;s going to be an extra-tough pickle. Fleets of Waymos are already roaming the streets of San Francisco but it is doubtful that they could handle Parisian traffic and bureaucracy. Europe&#8217;s roads are not drawn with straight rulers.
Let&#8217;s expect a release most likely limited to the US first.</p></li><li><p>Tesla says 2026, so they&#8217;ll probably deliver the first car in 2030, be able to produce one car a week for the first year and start ramping up later on.</p></li><li><p>Eventually they&#8217;ll meet their demand and the streets will start getting flooded with autonomous cars.</p></li><li><p>Somewhere along that timeline, the EU and other markets will pick up (either because Tesla went there or because some Chinese brand beat them to it).</p></li><li><p>And the story will repeat, country by country.</p></li></ol><p>This will obviously have a huge transformative impact on taxis.</p><p>One of the main things is that taxis are limited, if not by regulation, at least by the capacity of their human drivers. Apart from charging once in a while, those robotaxis will be able to work non-stop during the night, holidays, etc. If you put the amortization cost of the car in front of the income it generates, this tips the ratio quite a lot. This means that as long as you can pay off the leasing of the car with the income it generates, you can have as many cars as you want on the street.</p><p>&#8220;As many cars as you want&#8221; being limited by &#8220;as many cars as you can buy&#8221;, of course. At first only a few will be produced and this will hardly tip the scales at city level. A few happy owners will start generating good passive revenue and brag about it on social media.</p><p>Progressively, however, the competition is going to become extremely unfair to humans, to the point where they will probably be driven out of business. And then the competition will keep on increasing, until the profit margin on operating a robotaxi becomes as thin as an economy-class sandwich, exactly like running a bitcoin mining rig today.</p><p>Meaning that it will soon become difficult for an individual to make a useful profit.
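</p><p>To see why utilization tips the scales, here is a back-of-envelope sketch. Every number in it is an invented placeholder for the sake of illustration, not real Tesla or taxi data:</p>

```python
# Back-of-envelope robotaxi economics.
# All figures below are illustrative assumptions, not real-world data.
monthly_lease = 1_200      # assumed lease payment per car, in dollars
monthly_upkeep = 800       # assumed charging, cleaning and insurance
fare_per_hour = 15         # assumed average net fare while occupied
billed_hours_per_day = 12  # a robotaxi also works nights and holidays

monthly_revenue = fare_per_hour * billed_hours_per_day * 30
monthly_profit = monthly_revenue - monthly_lease - monthly_upkeep

# While monthly_profit stays positive, nothing caps the fleet size except
# how many cars you can buy -- which is exactly what erodes the margin.
```

<p>With these made-up numbers each car nets a few thousand dollars a month, so capital keeps flowing in until competition pushes fares down and the margin toward zero.</p>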
You can imagine that operations with large amounts of initial capital will emerge, operating hundreds or thousands of cars at once, with dedicated maintenance crews and so forth.</p><p>So unless governments decide to break up this <a href="https://en.wikipedia.org/wiki/Nash_equilibrium">non-optimal Nash equilibrium</a>, we can expect patterns pretty similar to those seen in cryptocurrency recently. Looking at recent history around tech innovations, the scenario laid out above will almost certainly happen in most countries out there.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oDxK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oDxK!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 424w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 848w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 1272w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!oDxK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif" width="498" height="199" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:199,&quot;width&quot;:498,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent Smith from The Matrix saying \&quot;This is the sound of inevitability\&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent Smith from The Matrix saying &quot;This is the sound of inevitability&quot;" title="Agent Smith from The Matrix saying &quot;This is the sound of inevitability&quot;" srcset="https://substackcdn.com/image/fetch/$s_!oDxK!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 424w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 848w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 1272w, https://substackcdn.com/image/fetch/$s_!oDxK!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3e14ba9-ab4f-4ceb-a1f6-b5f5caf6b966_498x199.gif 1456w" 
sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h1>Business strategies</h1><p>Now that you know the future, how can you make money out of it?</p><h2>Specialized taxis</h2><p>Quite obviously those robotaxis will be mass-produced and mass-operated, leaving very little space for the specific needs one might have. A few ideas that might stick and expand in the future:</p><ul><li><p>Hard-to-reach destinations &#8212; at least at the beginning, a lot of destinations that have unconventional roads or that are outside a well-known operational area will probably still need humans to make the decisions and operate vehicles with different capabilities</p></li><li><p>Luxury &#8212; the same way luxury shops sell you bags hand-crafted by expert artisans from a village in France, rich people will be happy to buy transport with a human-enhanced service. Probably not driving, but instead dedicating their time and attention to the service of the passengers: serving drinks, carrying suitcases, polishing shoes or doing their hair&#8230; Your imagination is the limit</p></li></ul><h2>Optimization</h2><p>Eventually the intense competition will make this business as profitable as OpenAI is open &#8212; not at all.
The difference between life and death will lie in every single cent you can optimize, and specifically in something that humans do naturally today: predicting where people will be ordering taxis from.</p><p>There is apparently no particular app doing this at the moment, although there are several research papers on the topic, putting it at the ideal stage to be transformed into an app: the technology exists, it just needs to be turned into a product.</p><p>Today&#8217;s drivers are probably not so interested in paying for this, but when huge players start fighting for the highest possible car utilization, this kind of application will be absolutely essential.</p><p>A smart way to do it would be to build the app today from existing data sources, give it away for free to all taxi drivers, collect as much data as possible and be ready the day robotaxis roam the streets.</p><h2>Financial product</h2><p>As said before, robotaxis are <em>almost</em> passive assets, quite similar to the way real estate operates. You need a lot of capital, a bit of maintenance, and they generate a steady income.</p><p>And just like real estate, you can expect to see both huge institutions on the market and many opportunistic individual players with smaller amounts of money, willing to invest a few thousand and earn a percentage on it.</p><p>So you can just collect capital from individuals, assemble a fleet and a maintenance crew, and start billing a percentage of this capital every month.</p><h2>Fleet management</h2><p>The financial product of course goes hand in hand with the management of the fleet. While this is not going to be elaborated much in this article, you can expect that many tools and services will be required for the management of such a fleet.
Some akin to those already existing, while others might be novel:</p><ul><li><p>Dirt-cheap electricity &#8212; electricity being one of the most prominent sources of spending, keeping its average cost as low as possible will be key</p></li><li><p>Vehicle cleanup &#8212; not only will vehicles have to be kept clean, you&#8217;ll also have to know when to clean them: there is no way people will self-report throwing up an excess of Margaritas, meaning that some image-based assessment will have to be made after each trip</p></li><li><p>Mechanical repairs &#8212; beyond cleanup, all cars will eventually break to some extent and need to be repaired. Same as with the cleanup, the interesting question is: how will you know when something is broken? More importantly, when is something about to break?</p></li></ul><h2>Country replication</h2><p>It used to be that people were excited to bring back alarm clocks from Akihabara because this technology was not available in Western countries. Globalization changed that: the same iPhone is immediately available in every single city in the world.</p><p>But that is without counting on regulations. You might or might not like the safety/innovation tradeoff made by the EU, but it&#8217;s certainly creating opportunities as well.</p><p>Indeed, since these cars will first be available in other regions of the world, you won&#8217;t even need to trust this article to know what is going to happen.
It will unfold under our eyes, with the knowledge that it can&#8217;t happen yet in the EU but that it will happen anyway in the future.</p><p>So for example you&#8217;ll easily be able to build a taxi-demand prediction app as described above, fed with data from major EU cities, and wait for a US actor to start attacking the market for a fairly easy exit.</p><h1>Conclusion</h1><p>Beyond the hype surrounding fully self-driving vehicles, it&#8217;s interesting to see that Melon Tusk will probably be responsible for the miracle of uniting taxi and Uber drivers in their hatred of automated cars.</p><p>The landscape of urban transportation will be profoundly transformed into a financial asset, ensuring war, speculation and other game theory shenanigans. If you like to watch the world burn, you&#8217;re in for a fun time ahead.</p><p>This will obviously create a whole new ecosystem around it, from the financial instruments that will make it possible to the reorganized car-cleaning crews.</p>]]></content:encoded></item><item><title><![CDATA[100 Ways LLMs can Boost Your Business]]></title><description><![CDATA[LLMs are not just about chat. All professions can reap productivity increases on various tasks.
The only limit is your imagination!]]></description><link>https://www.baby-cto.com/p/100-ways-llms-can-boost-your-business</link><guid isPermaLink="false">https://www.baby-cto.com/p/100-ways-llms-can-boost-your-business</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 13 Jul 2024 06:01:22 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f665c94e-eef8-49f2-bbf9-8d98ccc2ea9a_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LLMs certainly are a breakthrough in terms of natural language processing. However, the real spark that turned the world mad is ChatGPT. Before it, you could still use GPT-3, but few people outside of specialists did. It&#8217;s when the chat form factor appeared that the general public started to realize the power of LLMs.</p><p>Unfortunately, chat &#8212; or at least passing as intelligent humans &#8212; is not the main strength of this technology, which is rather a sort of elaborate parser/translator. As such, there are a million ways you could integrate an LLM into your business at different levels, optimizing 10% of someone&#8217;s job here and there.</p><p>To prove this point, today we&#8217;ll explore 100 use cases that stand beside the stereotypical uses of LLMs to imagine what you could truly do in a wide range of industries, provided a bit of brain juice and a few lines of code.</p><h1>Development and Project Management</h1><h2>Automated compliance checks in code or documents</h2><p>Any company beyond a few dozen employees ends up forced to draft policies, processes and rules that must be followed. Some of those require big-picture thinking, but others are precise checkpoints that can easily be verified in text-based outputs: source code, contracts, commercial propositions, etc.
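</p><p>To make this concrete, here is a minimal sketch of the plumbing such a checker needs. The prompt wording and the <code>ask_llm</code> hook are placeholders for whichever model, API client or CLI you actually use:</p>

```python
import json

def check_compliance(document, rules, ask_llm):
    """Ask an LLM whether `document` violates any of `rules`.

    `ask_llm` is whatever completion function you plug in (an API client,
    a CLI wrapper, a local model); it takes a prompt and returns a string.
    """
    prompt = (
        "You are a compliance checker. For each rule below, answer with a "
        'JSON object {"rule": ..., "violated": true/false, "evidence": ...}. '
        "Answer with a JSON list only.\n"
        "Rules:\n" + "\n".join("- " + r for r in rules) +
        "\n\nDocument:\n" + document
    )
    return json.loads(ask_llm(prompt))

# A fake model stands in for the real call so the plumbing is testable:
def fake_llm(prompt):
    return '[{"rule": "no hardcoded secrets", "violated": false, "evidence": ""}]'

verdicts = check_compliance("print('hello')", ["no hardcoded secrets"], fake_llm)
```

<p>From there, a CI job can fail the build whenever any verdict comes back with <code>violated</code> set to true.</p>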
A fleet of robots could make sure that the bulk of those policies is indeed applied throughout the company.</p><h2>Programming language conversion</h2><p>As said in the introduction, LLMs are great at translating. And while this works amazingly for human-to-human languages, it also works quite well for programming languages. Typically, you can take any API vendor documentation in any language, get the example snippets and convert them into your current language. This also works within a given programming language to replace a specific library with another one that has equivalent features but a different structure.</p><h2>Detect bug reports from user reviews</h2><p>It becomes easy to apply <a href="https://en.wikipedia.org/wiki/Linus%27s_law">Linus&#8217;s law</a>: &#8220;given enough eyeballs, all bugs are shallow&#8221;. If your product meets a certain level of success, people will inexorably start complaining about their frustrations online: through social media, app store reviews and so forth. Using an LLM, you can parse the whole lot of those reviews to detect whether any of them actually describes a potential bug that you should care about.</p><h2>Validate business strategies against doctrine</h2><p>It is no secret that I am a fan of <a href="https://learnwardleymapping.com/">Wardley Maps</a>. The only issue being: the source material is very long and complex. A potential use for LLMs (especially long-context ones) is to assist you in creating the map, but most of all to check that your predictions and projections actually take into account all the rules from the 800 pages of the book.</p><h2>R&amp;D progress audit</h2><p>It is always tedious to document R&amp;D, as its nature demands rapid iteration between experiments.
However, if you were to centralize all your results in a semi-formal way, you can imagine having an LLM take over this reporting process and generate exact day-by-day reports of who did what, what the conclusions are and what will be tested next. Extremely convenient in the case of grant justification as well.</p><h2>Task break-down and planning</h2><p>Why are developers always so late? Sometimes, it&#8217;s simply unforeseeable problems popping up, but most of the time &#8212; and especially for juniors &#8212; it&#8217;s because they fail to decompose tasks they have never done before. If you have never done something, your brain will probably ignore all the sub-tasks that you will have to accomplish. An LLM could be a good help to break down a given task until all the steps and dependencies are clear.</p><h2>Natural language programming</h2><p>Instead of having to code a specific behavior from a software component (email filter, automation platform, data ingestion platform, etc), you could simply specify what you want in plain human language and have it transformed into code under the hood.</p><h2>Drive processes (CRM, issue tracking, etc)</h2><p>Having a system read all your emails, messages and so forth will definitely be a privacy challenge, but on the other hand this could enable automatically reporting status updates and changes to CRMs, issue trackers and so forth. For example you could analyze the Git history to move an issue&#8217;s status (along with comments explaining what happened). Or track commercial emails to automatically report on a lead&#8217;s status.</p><h2>Run end-to-end application tests written in natural language</h2><p>Isn&#8217;t it so fucking annoying to write front-end tests? This could change with appropriate use of LLMs.
They could not only write tests for you but, most importantly, also heal existing tests to adapt to code changes.</p><h2>Visually test applications</h2><p>LLMs can have vision capabilities. As such, they are able to do something smarter than pixel-perfect validation. They could compare two images and tell you if there are significant differences, or look at a web page and tell you about obvious issues (text overflowing, alignment problems, etc).</p><h2>Log analysis to detect abnormal behaviors</h2><p>Server logs are usually very long files that you keep in order to diagnose a particular issue if it happens, but when it comes to knowing what happens in real time, things get more complicated. Log monitoring tools exist but they are limited by the fact that logs are extremely diverse and unexpected. Instead, LLMs could be used to read all logs in real time and raise alerts when needed.</p><h2>Threat modeling assistance</h2><p>How do you secure a product? Nothing can be considered secure in the absolute; best practices are only good as long as they fit your needs. That is why you need to <a href="https://owasp.org/www-community/Threat_Modeling">model your threat</a>, which basically comes down to finding the weakest link among all the components holding the product&#8217;s security and figuring out which might break easily enough for the prize to be worth the effort. This requires imagining a full dependency map of everything related to the product, which an LLM could help enumerate.</p><h2>Open source issue qualification</h2><p>Open source projects have historically had issues with bug reports and feature requests, which are often written in a terribly unclear way.
A robot, on the other hand, could assist people in writing their report until the resulting description is clear enough for all parties.</p><h1>E-commerce</h1><h2>Image-based search</h2><p>Classical e-commerce faceted search requires detailed product descriptions with a structured model of the product&#8217;s characteristics. And while for a stick of RAM this might be kind of easy, for some fields like clothing it&#8217;s already harder to categorize <em>everything</em>. On the other hand, you could be asking questions about images, like &#8220;I want a pair of blue jeans with a contrasting seam&#8221;, and the search engine could smartly filter images on this unexpected characteristic.</p><h2>Mix-and-match assistant</h2><p>Imagine that you find your perfect pair of pants but you are looking for a shirt to go with it. An LLM would be able to understand the level of formality, the color and the style of those pants and then find a matching shirt. Let&#8217;s note that it&#8217;s a different concept from the &#8220;recommended products&#8221; that exist today: here we consider the user&#8217;s explicit intention. This works for all kinds of products: cosmetics, food, tools, etc.</p><h2>Organize products from raw pictures and spec sheets</h2><p>Imagine that you are building an e-commerce site where the raw material for each product is pictures and PDF datasheets. You could have AI take care of creating categories, structured product characteristics and product descriptions completely automatically, only leaving humans for review.</p><h2>Product composition decoder</h2><p>Imagine that you are lactose-intolerant and are looking to buy food. Or your skin has specific allergies to chemicals. It would be interesting to be able to ask the e-commerce site those questions directly and have it decode tricky product compositions for you.
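</p><p>The deterministic half of such a feature is simple once an LLM has normalized the free-text label into an ingredient list. A sketch, with made-up product names:</p>

```python
def flag_components(ingredients, avoid):
    """Return the ingredients that match a user's avoid-list.

    `ingredients` is assumed to come from an LLM that decoded the
    product's free-text composition into a normalized list.
    """
    avoided = [a.lower() for a in avoid]
    return [i for i in ingredients if any(a in i.lower() for a in avoided)]

warnings = flag_components(
    ["aqua", "sodium lauryl sulfate", "lactose monohydrate"],
    ["lactose"],
)
```

<p>Anything returned here gets a warning tag on the product page; the LLM only runs once per product, when the label is ingested.</p>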
Or even better, state in your profile which components you wish to avoid and the system will automatically put a warning tag on all corresponding products, along with a warning before checking out.</p><h2>Product suggestion</h2><p>You are redecorating your terrace and you need to figure out what to put there. Send a picture of it to your furniture store and have matching suggestions displayed directly. It also works for various cases where the user states a problem: &#8220;my computer is too slow&#8221;, &#8220;I need to water my tomatoes&#8221;, etc.</p><h2>Visual audit of second-hand products</h2><p>Since LLMs are able to see images and follow instructions, second-hand platforms could ask them to visually check pictures for known defects. This could help users qualify their own products, as well as highlight important checkpoints for customers.</p><h2>Price suggestion for second-hand platforms</h2><p>In the same vein, being able to analyze products visually means you could automatically compare a given product to similar products sold in the past and suggest a fair price from there.</p><h2>Extract and categorize pain points from online reviews</h2><p>Online reviews are a trove of user feedback for products sold beyond a certain scale. Using an LLM to systematically parse them can be an interesting way to find defects, identify use cases, quantify perception over time, etc.</p><h2>Faceted search</h2><p>Most e-commerce websites have what is called faceted search. It&#8217;s those filters on the left that allow you to refine a listing by characteristics, whether it&#8217;s size, color or anything else. Sometimes the experience is great, but sometimes it is also not super smart. A way to improve it would be to have a search bar that lets you specify in natural language the filters you want to apply, and then let the AI translate that into the right request.
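</p><p>The safety net around that translation step matters: whatever JSON the model produces should be validated against the facets your catalog actually supports before it reaches the search backend. A sketch, with invented facet names:</p>

```python
import json

ALLOWED_FACETS = {"color", "size", "max_price"}  # whatever your catalog defines

def parse_filters(llm_output):
    """Keep only the facets the catalog knows about.

    Dropping unknown keys stops a hallucinated facet from reaching
    the search backend.
    """
    raw = json.loads(llm_output)
    return {k: v for k, v in raw.items() if k in ALLOWED_FACETS}

# e.g. what a model might answer for "blue jeans under 50 bucks":
filters = parse_filters('{"color": "blue", "max_price": 50, "vibe": "casual"}')
```

<p>The surviving dictionary maps one-to-one onto the classical faceted query you already have.</p>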
No more awkward clicking, scrolling and waiting for page loads again and again.</p><h1>Entertainment</h1><h2>Drive NPCs &#8212; basically, Westworld</h2><p>The Westworld show was pretty good &#8212; at least season 1 &#8212; at showing us instinctively what AI could accomplish for us and how it could do it. Give structured scenarios to <a href="https://en.wikipedia.org/wiki/Non-player_character#:~:text=A%20non%2Dplayer%20character%20(NPC,rather%20than%20by%20another%20player.">NPCs</a> and let actual players interact with them. LLMs can entirely be used to generate dialogues, figure out the steps to stay on the scenario, etc. Potentially very exciting amusement parks in prospect, but also of course video games.</p><h2>Generate backstories and character sheets for RPGs</h2><p>If you are an RPG player &#8212; D&amp;D and the like &#8212; you probably know that getting your character off the ground can be a lengthy process. Generating a backstory, specs, etc. means hours spent on administrative procedures instead of playing. Instead you could just prompt the basic concept of the character and have it all generated in an instant.</p><h2>Assist users learning how to play a game</h2><p>Board games are always hard to understand the first time you play them. You could however imagine that an LLM-boosted agent could understand those rules and help beginners play: explain what happened, let them know of potential moves, etc.</p><h2>Answer questions halfway through a movie</h2><p>Sometimes in the middle of a movie you are just lost as to what happened. However, platforms already have a lot of data an LLM could exploit, like subtitles for example. Using this, you could imagine asking Netflix to clarify specific plot points and have the system check the transcript of the movie thus far to help you understand.</p><h2>Image culling and storytelling</h2><p>Ever come back from holidays with thousands of pictures that you never actually sort through?
LLMs would be good at building a consistent story and picking the top X pictures to tell it in your album.</p><h2>Book, podcast, etc. length or style adjustment</h2><p>I personally hate reading fiction. For some reason, I&#8217;ve been devouring wikis and theories from GoT, LOTR and so forth but never actually managed to finish the books. They&#8217;re too long and too indirect. What if book &#8212; and podcast, news, etc &#8212; platforms let you adjust the length and style of what you are reading? A 50-page version of GoT? A 2-minute, to-the-point version of a 20-minute podcast? The ability to further explore topics that piqued your interest? Lots of people already watch movies at 1.5x; this would only be a logical next step.</p><h2>Trope analysis and novelty factor</h2><p>Star Wars episode IV is always a good example of how contextual movies need to be. Watch it in 1977 and it will fucking blow your mind. On the other hand, I recently showed it to a friend who was like &#8220;oh come on, and then it&#8217;s going to be his father? how fucking original&#8221;. If you want to make a movie or an article entertaining, it must be composed of a good mixture of things that people are used to, spiked with an edge of novelty. Using AI to systematically explore and quantify tropes in existing scripts can help establish the novelty factor of a new project. Let&#8217;s note that this also works for politics, journalism, fiction and basically anything targeted at the mass market.</p><h2>Auto-editing of video interviews</h2><p>Interviews are a significant pain in the ass to edit. But using an LLM you could transcribe everything said, ask it to pick the best parts to fill X minutes, and automatically slice and edit the video at the proper timestamps.</p><h2>Conspiracy theory generator for social media</h2><p>Whether we like it or not, social media is full of trolls trying to influence people&#8217;s choices and votes.
A way of doing this is to attack specific pillars of a society (science, government, etc) by throwing an insane amount of conspiracy theories at them to destroy them. It doesn&#8217;t need to be consistent, it just needs to be massive. That is a great fit, given that LLMs excel at making text that <em>sounds good</em> but is utterly shallow. Pick your target, throw an LLM at Twitter and enjoy massive ideological destruction.</p><h2>Fan-fiction generation</h2><p>Fans usually like their media so much that they want to keep exploring its world endlessly. Without making those stories canon, entertainment giants could easily generate literally endless stories by fine-tuning LLMs on the specific do&#8217;s and don&#8217;ts of a universe and letting them generate content for their fans. As a bonus, the most successful stories could serve as a basis for major projects.</p><h1>Data Analysis</h1><h2>Data visualization</h2><p>Data visualization is a hard topic: between the graph libraries, the SQL queries and the weirdest APIs like Pandas, it&#8217;s not very accessible to your average executive Joe. On the other hand, LLMs are <em>excellent</em> at this, given a proper human intent. They are going to play a key role in making data more accessible.</p><h2>Transform natural language signals into structured data</h2><p>Scrape social media, listen to Slack messages or emails and turn this into structured data that you can quantify and analyze easily through graphs and statistics.</p><h2>Loosely structured data cleanup</h2><p>How many times is data provided in CSV form with completely inconsistent content? Poorly escaped lines, inconsistent IDs, etc. A usually tedious cleanup job could be entirely automated away with a properly trained LLM.</p><h2>Reverse-engineer structures</h2><p>Have you ever tried to understand what a company does from the outside? 
It&#8217;s usually very hard, given that the corporate website will tell you that they &#8220;deliver excellence&#8221; across a wide range of industries, present their &#8220;solutions&#8221; and &#8220;case studies&#8221;, but will never go into the detail of <em>what</em> they actually did. The best way to understand the truth, in my opinion, is to look at job descriptions, both their quantity and their content. Gather them all together and you understand exactly which operational tasks, tools and hierarchy those companies have. Tedious by hand, but very suitable for LLMs to complete.</p><h1>Natural Language Processing</h1><h2>Translate</h2><p>All right, this one is obvious in a conversational setup, but it of course also works if you are trying to internationalize a service. On an e-commerce site or social network for example, the level of translation from a top LLM is good enough that you can trust it unsupervised in many languages for many non-critical use-cases.</p><h2>Generate alt tags</h2><p>Something that all CMSes should start doing: automatically generate alt tags for their image library. LLMs are now entirely capable of describing an image, and it&#8217;s so good for SEO and accessibility that this should become the norm very quickly.</p><h2>Spellcheck</h2><p>LLMs are also very good at spell-checking and can be used in a wide range of applications to help you improve your writing.</p><h2>Find acronyms</h2><p>The hardest thing when starting a project is to find a good name for it. Well, not anymore: you can simply describe what your project does, ask Claude for a fitting acronym and there you go!</p><h2>Parse free-form numbers</h2><p>It&#8217;s not uncommon to end up with a data table where you need to parse prices or different kinds of amounts but unfortunately they have been given in various forms, like &#8220;30 millions&#8221; or &#8220;45k&#8221;. 
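</p><p>Much of this can be handled before reaching for a model at all; here is a sketch with plain regular expressions, where the suffix table is illustrative and should be extended to whatever your data contains:</p><pre><code>import re

# Illustrative multipliers; extend with whatever your data contains.
SUFFIXES = {"k": 1e3, "m": 1e6, "million": 1e6, "millions": 1e6,
            "b": 1e9, "billion": 1e9, "billions": 1e9}

def parse_amount(text):
    """Parse loose amounts like '45k' or '30 millions'; None when unsure."""
    match = re.search(
        r"([\d][\d\s.,]*)\s*(millions?|billions?|k|m|b)?\b",
        text.strip().lower(),
    )
    if not match:
        return None
    digits = match.group(1).replace(" ", "").replace(",", "")
    try:
        value = float(digits)
    except ValueError:
        return None
    return value * SUFFIXES.get(match.group(2), 1)

print(parse_amount("45k"))          # 45000.0
print(parse_amount("30 millions"))  # 30000000.0
</code></pre><p>Anything this parser returns None for is a good candidate to fall back to a cheap model.</p><p>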
While you can solve this with regular expressions, a cheap LLM can often be very efficient at parsing it.</p><h2>Anything to Markdown</h2><p>Given the ability of LLMs to understand document structure &#8212; textual or from images &#8212; they excel at producing Markdown from anything. Just rasterize your PDF, throw it into an LLM and you&#8217;ll get your Markdown version pretty easily.</p><h2>Parse citations from academic papers</h2><p>My understanding is that academic papers follow a formal structure, but only a semi-formal one technically speaking. Typically they are all linked to each other through citations, but parsing those is tedious. LLMs could power this.</p><h2>Smart replace in document</h2><p>Imagine you write a long proposal for a client and refer repeatedly to the name of their product or some important concept. But then your boss swoops in and asks you to remove all those references or replace them with another one. Sometimes search and replace can do the job, but sometimes it will affect the grammar or the structure of sentences. LLMs could do this job completely automatically.</p><h2>Auto-adaptation of texts for different targets</h2><p>Imagine writing a scientific review. Maybe you want to address different levels of readers, from the most advanced to kids. Or imagine a publisher that wants to make Shakespeare accessible to foreigners. LLMs are able to translate not only between languages but also between styles.</p><h2>Re-phrasing of customer input</h2><p>Customer support is a fantastic world where you get insulted for things you didn&#8217;t do. 
Instead, LLMs could act as a buffer between the customer and the support team, where aggressive, sarcastic sentences are turned into plain and clear ideas.</p><h1>Content Generation and Management</h1><h2>Generate FAQ from website content</h2><p>Gather all the content of your website, figure out all the questions that it answers and generate the FAQ pages from this.</p><h2>Generate decent usernames</h2><p>It is quite hard to come up with a decent username when subscribing to a platform. With a few smart questions and some methodology, a pretty cheap LLM that could even run locally would produce many interesting name suggestions in real time.</p><h2>Create recipe/tutorial variations</h2><p>When cooking, doing some work on your home, taking care of your garden or anything hobby-level in which you have no particular expertise, you will tend to follow tutorials to learn how to do things &#8212; and more importantly to achieve particular goals. The only issue with those tutorials is that they might have details incompatible with your particular situation. For example, you want to bake a cake but you are allergic to one particular ingredient. How do you replace it? That&#8217;s where the LLM can make educated guesses and alter the content dynamically to fit the user&#8217;s needs.</p><h2>Smart filling of templatized documents</h2><p>Newsletter software allows you to place people&#8217;s names and a few other details within the text. But what if you could go much further than that? Create templates for documents like contracts, commercial outreach, etc. Then have an LLM fill in the blanks respecting grammar and gender, or even making up whole sentences based on meta information: &#8220;Hi John, you expressed on our contact form that you need XXX, which can be filled by products in your YYY range. Let&#8217;s schedule a call?&#8221;.</p><h2>Generate onboarding procedure and training path</h2><p>Anyone running a company knows that transmitting the company&#8217;s knowledge is a tedious endeavor. 
Pages and pages of process have been written over the company&#8217;s many years of existence, all at different levels of maturity. How do you introduce a newcomer to all this in a consistent order? You can feed all your documents to a long-context LLM (like Gemini&#8217;s 2M tokens) and have it sort the documents in topological order, picking those relevant to a given job description.</p><h2>Organize asset production based on company policy</h2><p>Many companies have a process for rolling out a product or communication: social media assets, press releases, etc. While generating them directly will still be human work, an LLM could allow high-level definition of guidelines in plain text, with more nuance than regular automation platforms allow, and automatically create outlines for the assets that need to be created.</p><h2>SEO and keyword-centric upgrade of articles</h2><p>CMSes and other content management tools could receive specific directives regarding SEO and keywords that need to be present, to not only perform live audits but also suggest modifications to the content in order to integrate the desired keywords.</p><h2>Meeting prep, create meeting agenda</h2><p>What&#8217;s worse than a poorly prepared meeting? Gathering information from previous meetings, ticket trackers and other digital platforms, an AI could outline the agenda of upcoming meetings, while at the same time helping each participant gather their own content to present.</p><h2>Auto-update documentation</h2><p>All products and company processes need to be documented at various levels, from the most technical to the most high-level. As the product grows and changes are made, it becomes hard to keep track of what needs to be updated in the documentation. 
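</p><p>A sketch of the embedding side of such a watcher, assuming a change description and the docs have already been embedded; toy 3-d vectors stand in for real embedding-model output:</p><pre><code>import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def flag_stale_docs(doc_embeddings, change_embedding, threshold=0.8):
    """Return doc ids whose content is close to a described product change.

    In a real system the embeddings would come from an embedding model;
    here they are toy 3-d vectors.
    """
    return [
        doc_id
        for doc_id, emb in doc_embeddings.items()
        if cosine(emb, change_embedding) >= threshold
    ]

docs = {
    "billing-guide": [0.9, 0.1, 0.0],
    "onboarding-faq": [0.0, 0.2, 0.9],
}
change = [1.0, 0.0, 0.1]  # e.g. "we reworked the billing flow"
stale = flag_stale_docs(docs, change)  # only the billing guide is flagged
</code></pre><p>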
Combining LLMs and embeddings, you could track the overall company activity and highlight parts of the documentation that become obsolete, list the missing parts and even automatically propose edits.</p><h2>Dynamic course re-writing</h2><p>Imagine a student learning online in front of their computer. Some topics will be easy but assuredly some others will prove more challenging. These courses often evaluate the student&#8217;s skills all along the way. What if, depending on those evaluations, the content of the courses was adapted to the strengths and weaknesses of the student? Catch-up texts can be generated from the original course, focusing on the weaknesses and elaborating on them further than the initial content did.</p><h2>Infinite copy generator</h2><p>How do you know which words are going to convert your audience best? What if you generated one version of your content for every single time that someone reads it? Then observe which versions worked best and use this as reinforcement for your model, to produce more and more efficient versions of the copy.</p><h2>Dynamic content</h2><p>In the same vein, you can also observe the user&#8217;s behavior and browsing history in order to dynamically re-write or optimize pages when they land there. Connect the dots with concepts freshly ingested, push forward detected interests, etc.</p><h1>Image and Visual Processing</h1><h2>Transcribe handwritten notes</h2><p>This might sound like a miracle but GPT-4o is able to read my handwriting. Not only that, but it can transform it into a well-structured Markdown document. And then of course translate, summarize and all the perks. This can be helpful in a number of scenarios, from digitizing meeting notes to processing and translating antique manuscripts on the fly.</p><h2>Pet control</h2><p>Pets tend to behave differently when their owners aren&#8217;t home, like jumping on the bed or sofa. 
LLMs are definitely not the most efficient option, but they are for sure the easiest way to express to a machine &#8220;if a dog rolls in my sheets, yell at them to stop&#8221;.</p><h2>Generate color palettes</h2><p>Just like LLMs are trained on word patterns, they are trained on visual patterns, including the understanding of colors. This means that you can smartly generate color palettes that actually work (as opposed to this color wheel madness you often see). This can help you generate your own UIs, but even more than that: what if the LLM could generate all the design tokens to the user&#8217;s taste, ending up with a unique, custom and beautiful UI for every single user?</p><h2>Art explanation</h2><p>If like me you are art-illiterate but still end up in museums wondering what happened in a specific painting, only to find the name of the painter with a vague title next to it, and a lengthy audio guide telling you everything except what you want to know&#8230; you&#8217;ll understand this idea. Instead of audio guides, museums could provide interactive assistants fed with in-depth knowledge of every work in the museum, but able to distil it in a way tailored to the visitor&#8217;s taste and to reply to their questions directly.</p><h2>Picture-based food search</h2><p>Google Maps is trying very hard to create ontologies of the real world, especially with its &#8220;questions&#8221; program asking you if a given service or food is available in various places that you have visited. However, if you are not American you probably ended up confused when you got asked if your local high-end bakery was making s&#8217;mores. Food simply does not translate between cultures. That&#8217;s where a deep understanding of images could lead to a much more efficient search, one that would echo one&#8217;s way of expressing their wants.</p><h2>Better narration for GPS</h2><p>Did you ever take the Madrid highway with some US-optimized GPS voice? How long did it take you to take the wrong turn? 
With ample imagery available &#8212; street view, 3D maps, etc &#8212; you could absolutely have much more descriptive directions from the GPS: referring to landmarks, taking perspective into account, etc.</p><h2>Drone or CCTV-based visual inspection of equipment or land</h2><p>Since you can describe what you want to see, you can have drones or cameras film something you want to inspect and ask the LLM to tell you if it matches your expectations or not. Look at satellite imagery and ask &#8220;tell me places where forests have been depleted&#8221;. Look at a building and say &#8220;tell me if any tile is missing&#8221;. And so forth.</p><h2>Auto-design simple, templated flyers, posters, etc</h2><p>Some apps will help people organize events or do marketing. Small businesses especially will find it hard to create those assets on their own, as they will not have the means to work with bigger agencies and are most likely unaware of best practices. On the other hand, the app could leverage LLMs to apply best practices, pick colors, and use and customize proven layouts to generate all kinds of visuals.</p><h2>Check that translations are meaningful in context by visually analyzing apps</h2><p>A common example of translation error that infuriates me is around the word &#8220;check&#8221; in English, which can be translated as two distinct French words: either as in &#8220;verify&#8221; or as in &#8220;check this box&#8221;. And very often, the meaning is lost, leading to crazy translations like &#8220;Verify the terms and conditions to continue&#8221;. Since LLMs can read texts and context, they could be used to apply translation files to a UI and make sure that all buttons make sense.</p><h1>Document Processing</h1><h2>Parse invoices</h2><p>Invoice management is the bane of any small business. You receive hundreds of them and need to extract different items and taxes systematically, but every single invoice has a different format. 
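</p><p>The extraction itself is a model call, but the model&#8217;s JSON should never reach the books unchecked. A validation sketch, with a hypothetical field list and a canned model reply standing in for the real call:</p><pre><code>import json

# Hypothetical required fields; adapt to your accounting needs.
REQUIRED = {"vendor": str, "total": float, "tax": float, "currency": str}

def parse_invoice_reply(reply):
    """Validate the JSON an LLM extracted from an invoice.

    Never trust model output blindly for accounting data: check the fields
    exist, coerce numbers, and flag implausible values for human review.
    """
    data = json.loads(reply)
    for field, expected in REQUIRED.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if expected is float:
            data[field] = float(data[field])  # tolerate "119.90" as a string
    if abs(data["total"]) >= 1e9:
        raise ValueError("implausible total, flag for human review")
    return data

reply = '{"vendor": "ACME SARL", "total": "119.90", "tax": 19.90, "currency": "EUR"}'
invoice = parse_invoice_reply(reply)
</code></pre><p>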
Fortunately, LLMs are pretty good at extracting this information and putting it into JSON &#8212; whether it comes from an email, a PDF, a picture of a receipt, etc.</p><h2>Pick food at a restaurant</h2><p>Did you ever end up indecisive at a restaurant? Just snap the menu, feed it into an LLM and let it guide you into ordering something. It even works with hand-written texts that you can&#8217;t understand &#8212; Japan explorers will rejoice. If you run a restaurant, you can push this even further and help users through a custom assistant.</p><h2>Normalize recipes</h2><p>If you build a nutrition app or similar, you might want to make the link between the food listed in a recipe and its calories, for example. But people writing recipes love to use the weirdest units or even leave things implied &#8212; like some common ingredients not even being listed as ingredients. With the help of LLMs you can extract these ingredient lists, transform them into units that make sense and get the nutritional value of what you are cooking.</p><h2>Convert mind-maps to a structured linear document</h2><p>Mind maps &#8212; or the Post-It method as well &#8212; produce a lot of ideas around one given topic, but you might end up overwhelmed at the end of the process by the amount of information that needs to be processed. LLMs can transform those ideas into a linear structure, properly sorted and organized.</p><h2>Paper form digitization</h2><p>As an intern, I spent my days copying paper forms filled out on the spot for loyalty cards in a store. More recently, we all filled out countless COVID forms whenever taking a plane. Using LLMs, you can understand and transcribe those forms completely automatically into a digital system.</p><h2>Transcribe, tag and reference historical paper-only archives</h2><p>Countless historical documents or books have been scanned, but how many are properly referenced? 
Over thousands of years of history, we could set out to map all those documents, link them together, and analyze references and ideas over time, to build a better understanding of our history and our currents of thought.</p><h1>Customer Support and User Experience</h1><h2>Start workflows from emails</h2><p>Customer support teams are often drowning in emails. You can parse them, detect intents and trigger the proper systems in your back-office to start procedures, without any human intervention.</p><h2>Request routing inside the company</h2><p>When the org chart starts getting big, it might be hard to navigate the responsibilities and knowledge of the people in there. Especially for newcomers, it can be a challenge to find the right person to talk to while also not bothering them unnecessarily. As CTO and founder I can answer most questions on most topics within my company, but should an intern come and ask me how to connect the printer? A reassuring AI could help people orient themselves in the hierarchy and know whom they can confidently reach out to for help.</p><h2>Prioritize incoming messages and notifications</h2><p>You are probably, like everyone else, drowning in countless useless solicitations, from services you subscribed to 20 years ago to urgent business emails. Depending on the time of day and your personal goals, you might want to be notified of one thing but not the other. Or you might want to receive notifications in bulk for some topics. For example, I&#8217;d love to see Slack create a &#8220;what do you want to be notified about?&#8221; option and then bury irrelevant mentions and messages.</p><h2>Configure complex features</h2><p>When you use some apps, there will be features that are extremely complex to grasp. For example, try any product on Binance: it&#8217;s complex enough to throw you off unless you are eager to learn about it. 
Through the use of AI, they could instead ease the user into setting the right parameters according to their own personal goals.</p><h2>Voice message summarization</h2><p>Some people love voice messages; some people, like me, loathe them. Having an AI skip through the &#8220;sorry I&#8217;m sending you a voice message because I&#8217;m in the street and it&#8217;s easier to send a voice message [&#8230;]&#8221; and instead deliver just the point of the message to you would be a great WhatsApp addition.</p><h2>Conversation coach</h2><p>We are constantly exposed to confrontational situations, especially in low-stakes but annoying use cases like negotiating a refund over an incorrect package. Email and messaging apps could help you understand what you could obtain in that situation and draft emails for you, helping you every step of the way and reducing your mental load.</p><h2>Automated test grading</h2><p>The point of MCQs is that they are easy to grade, including by a computer. That&#8217;s why e-learning platforms use them so much. But given the advances of LLMs, it would be easy to imagine having them grade even textual responses, looking for specific bits of information and telling you if they are correctly explained or not.</p><h2>Interpretation of complex diagnostics</h2><p>Some diagnostics are not nice to hear, especially when they are particularly complex. From medical reports to SEO audits, if you are not an expert you might be confused by the terms and implications of those documents. A properly trained LLM could instead simplify them for you and even answer potential questions you might have.</p><h2>Allow customers to do self-diagnostics on products</h2><p>Vice versa, some products are complex and have many failure modes. Companies internally have debugging procedures that can pinpoint exactly what is faulty, but it&#8217;s hard for the regular customer to follow such procedures. 
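</p><p>Such a workflow can pair an LLM front-end with a tiny hand-written decision tree: the model phrases each question naturally and maps the customer&#8217;s free-form answer to a branch. A toy sketch for a hypothetical product:</p><pre><code># A tiny decision tree the LLM front-end can walk: it phrases each question
# in natural language and maps the customer's free-form answer to yes/no.
TREE = {
    "start": ("Is the power LED on?", {"yes": "led_on", "no": "check_cable"}),
    "check_cable": ("Is the power cable firmly plugged in?",
                    {"yes": "rma", "no": "plug_in"}),
    "led_on": ("Does the device appear in the companion app?",
               {"yes": "ok", "no": "reset"}),
    # Leaf nodes are plain outcome strings:
    "plug_in": "Plug the cable in and try again.",
    "reset": "Hold the reset button for 10 seconds.",
    "rma": "Hardware fault likely, start a return.",
    "ok": "No fault detected.",
}

def run_diagnostic(answers, node="start"):
    """Walk the tree with a list of yes/no answers and return the outcome."""
    for answer in answers:
        step = TREE[node]
        if isinstance(step, str):  # already reached a leaf
            break
        _question, branches = step
        node = branches[answer]
    # If we ran out of answers on a question node, return that question.
    return TREE[node] if isinstance(TREE[node], str) else TREE[node][0]

result = run_diagnostic(["no", "yes"])  # power off, cable plugged in
</code></pre><p>The LLM only handles the language; the diagnostic logic stays deterministic and auditable.</p><p>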
Instead of paying a human being to tell you to turn it off and on again, such workflows could be assisted entirely by an LLM, ideally driven by a tailor-made logic engine for your product.</p><h2>IT support for end-users</h2><p>The most feared and annoying department of a company is often IT, which has the important task of securing the company&#8217;s data and intellectual property while also having to explain to users how to connect to the Wifi. Through the proper use of LLMs, with their general knowledge of how computers work plus specific training adapted to the company&#8217;s policy, a lot of routine requests could be skimmed off IT departments&#8217; piles.</p><h2>Suggest A/B testing variants</h2><p>A/B testing is great to test how the user is going to react to different UX or copy, but how might you do it? You need the ideas, after all. A well-trained model that knows the UX best practices of different industries could do this job of taking a human&#8217;s work and proposing potential optimizations.</p><h2>Algorithm transparency</h2><p>Many complex algorithms are ruling our lives. For example, every electronic payment goes through a set of rules to determine if the action is legitimate or not. These departments are utterly closed and opaque to the rest of the company. Typically, if your card gets blocked then nobody in the bank can tell you why, nor for how long it will stop working. An LLM aware of the algorithm&#8217;s different rules could explain to the bank advisor, in simple words, the reasons for the block and the available options. This works for banking, but any sector with complex algorithms could leverage this.</p><h1>Personal Assistance</h1><h2>Help user stay on plan</h2><p>So many apps are helping us become a better version of ourselves, whether it&#8217;s for diet, exercise, jet lag, pet training, etc. But how often can you follow 100% of the plan? 
With a bit of intelligence you could let users report their deviations and help them stay on track, without them overreacting with counter-productive actions or simply getting demotivated.</p><h2>Context-picking for events, emails, etc</h2><p>Imagine an event in your calendar with few details. When the event comes up, the system could read your emails and the meeting notes mentioning this event, then infer useful information: the latest tickets from the issue tracker, the weather if you need to go somewhere, a reminder to take your ID or advice to dress a certain way. Overall, for one item and a lot of context, the LLM could pick the top few elements that are relevant for you not to forget &#8212; whether it&#8217;s an event, an email you are writing, a plane ticket, etc.</p><h2>Long-term goal tracking</h2><p>The human brain is very much wired for small tasks and has a hard time taking a step back to see if you are achieving your long-term goals. On the other hand, an AI could be aware of your goals and rank each of your actions, telling you whether they seem helpful for achieving that specific goal.</p><h2>Natural language passwords</h2><p>Passwords are a notoriously hard problem. How do you make a password that is secure yet easy to remember? You could imagine generating complete passphrases through LLMs, but that&#8217;s not all. When the user types the phrase back, you could use LLMs to normalize the text before hashing it, so that spelling mistakes, punctuation or even word order do not affect the outcome of the hashing.</p><h1>Business and Legal</h1><h2>Categorization of items</h2><p>The other day I wrote an article about GDPR, for which I parsed a big HTML page from the CNIL. The information there was semi-structured and I had to categorize things further in order to make sense of it. Same for the current article: the different ideas have been categorized by an LLM. 
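</p><p>The grouping logic around the model can stay simple; a sketch where crude word overlap stands in for embedding similarity, with the same greedy structure a real version would keep:</p><pre><code>def jaccard(a, b):
    """Crude similarity between two short ideas, based on shared words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa.intersection(wb)) / len(wa.union(wb))

def group_ideas(ideas, threshold=0.3):
    """Greedy grouping: attach each idea to the first similar-enough group.

    A real implementation would compare LLM embeddings instead of word
    overlap, but the grouping logic stays the same.
    """
    groups = []
    for idea in ideas:
        for group in groups:
            if jaccard(idea, group[0]) >= threshold:
                group.append(idea)
                break
        else:
            groups.append([idea])
    return groups

ideas = [
    "generate color palettes",
    "generate color schemes for UIs",
    "parse invoices into JSON",
]
groups = group_ideas(ideas)  # the two color ideas end up together
</code></pre><p>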
This is a great tool to group similar concepts together.</p><h2>Critique political programs</h2><p>While LLMs are obviously biased &#8212; especially around the US culture wars &#8212; they are nonetheless able to project without ego into many personas. As such they are quite interesting tools for reviewing political programs and seeing how they are backed by facts and theory. Journalistic platforms could enhance their content with a thorough review of every single politician, detect where they part from party lines, and most importantly let people explore concepts on their own, for their personal situation or their vision of society.</p><h2>Assess the fit between candidate and job description</h2><p>Large companies will, on a first pass, match candidates based only on keywords. However, we know how biased this approach is, given that specific technical knowledge is not shared by recruiters &#8212; and even less by those in charge of working through piles of thousands of CVs, who are unlikely to be the highest-ranking ones. On the other hand, AI is great at matching a CV to a job description, describing quite well the matching and missing areas as well as the challenges of working with this person. Even further, based upon a transcript of the interview, you can ask questions and validate specific checkpoints by searching intelligently for the relevant parts, without having to listen to hours of recording.</p><h2>Check ToC exhaustiveness</h2><p>When you reply to an RFP, a grant or any exercise of that kind, there will be a list of requirements you need to meet and precise points that need to be addressed. Obviously it is more subtle than just filling up a form: you need to make sure that various aspects are answered consistently throughout the response. 
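</p><p>A naive cross-check between an extracted requirement list and the response sections can already flag obvious gaps before any deeper review; a sketch, where keyword overlap stands in for asking the model itself whether each point is addressed:</p><pre><code>def coverage_report(requirements, response_sections):
    """Check which extracted RFP requirements a response seems to address.

    'Addresses' is naive keyword overlap here; in practice you would ask
    the LLM itself whether each section answers each requirement.
    """
    report = {}
    for req in requirements:
        words = set(req.lower().split())
        hit = any(
            len(words.intersection(set(sec.lower().split()))) >= 2
            for sec in response_sections
        )
        report[req] = hit
    return report

requirements = ["data retention policy", "disaster recovery plan"]
sections = ["Our data retention policy keeps logs for 30 days."]
report = coverage_report(requirements, sections)  # recovery plan is flagged as missing
</code></pre><p>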
You can use an LLM both to extract from the RFP the list of points that need answering &#8212; or at least cross-check it &#8212; and to check whether your response sheds adequate light on each of those elements.</p><h2>Insurance claim/policy matching</h2><p>Insurance policies are always a bit obscure. A well-trained LLM could allow customers to role-play use cases before subscribing, as well as gather all the necessary information in case of a real claim.</p><h1>Social Media and Content Moderation</h1><h2>Automated content moderation</h2><p>As OpenAI has proven to us with its extremely restrictive usage policy, LLMs can be used to detect offensive content &#8212; or any kind of content that you don&#8217;t want to see; offensive isn&#8217;t the same for everyone. In a day and age where social media operates at great scale, being able to detect &#8220;forbidden&#8221; content would not only make the platforms safer but also more customizable. Indeed, what if instead of having one single policy, different communities had their own policy automatically applied? Free speech and safety for all!</p><h2>Social media filters</h2><p>In the same vein, what if instead of having algorithms sort your feed in the most opaque way, you could express what you want to see? The same as TweetDeck lets you do by keywords for example, but with concepts instead. &#8220;Tell me all about space news&#8221; or &#8220;I&#8217;m sick and tired of meme X&#8221;. On top of filtering, this could also mean grouping: different posts talking about the same topic could be grouped or even hidden past a threshold.</p><h1>Specialized Applications</h1><h2>Ask questions about meeting transcripts</h2><p>Summarizing a meeting is good, but being able to look for specific information in it is the killer feature. &#8220;What did we conclude on topic X?&#8221;. 
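</p><p>Under the hood this is retrieval: pick the transcript chunks most relevant to the question and feed only those to the model instead of the whole recording. A toy sketch using word overlap as the scoring function (real systems would use embeddings):</p><pre><code>def best_chunks(transcript, question, top_n=2, chunk_size=40):
    """Pick the transcript chunks most likely to answer a question.

    Naive word-overlap retrieval; the selected chunks would then be passed
    to the LLM along with the question.
    """
    words = transcript.split()
    chunks = [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
    q = set(question.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q.intersection(set(c.lower().split()))),
        reverse=True,
    )
    return scored[:top_n]

transcript = (
    "first we talked about hiring two designers next quarter then we decided "
    "that the budget for project apollo is frozen until march and finally we "
    "reviewed the incident from last week"
)
result = best_chunks(transcript, "what did we decide about the budget",
                     top_n=1, chunk_size=8)
</code></pre><p>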
This is what I really want to see in those AI meeting platforms, especially for use during subsequent meetings.</p><h2>Step assistance in tutorials/recipes</h2><p>When following a tutorial or a recipe, some steps that might seem obvious to the person writing it will probably be hard for you to follow if you are too new to the topic. Having an LLM write sub-steps to fill the gaps for you would be a great help.</p><h2>Democratic platforms for citizen engagement</h2><p>Politicians love to claim that they know what their people want, but how do they really know? With AI&#8217;s capability to categorize and summarize, you could turn grievances and ideas into structured input coming from the whole nation: a super-simplified procedure where you could complain about anything or ask for any change as it crosses your mind, then have it processed and presented at any level of detail to your representatives.</p><h2>Real-estate property auto-description</h2><p>So many platforms let you post real-estate ads, but their quality is often mediocre to non-existent. 
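</p><p>The generation step mostly needs grounding: assemble the known facts so the model cannot invent any. A hypothetical prompt-assembly sketch, where the data sources are assumptions (ad form fields, captions from a vision model):</p><pre><code>def build_listing_prompt(property_data, photo_captions):
    """Assemble the context an LLM needs to write an honest listing.

    `property_data` would come from the ad form and map/neighborhood data;
    `photo_captions` from a vision model run on the uploaded pictures.
    """
    facts = "\n".join(f"- {k}: {v}" for k, v in property_data.items())
    photos = "\n".join(f"- {c}" for c in photo_captions)
    return (
        "Write a short real-estate listing from the facts below. Mention "
        "strengths and weaknesses honestly; do not invent any detail.\n"
        f"Facts:\n{facts}\nWhat the photos show:\n{photos}"
    )

prompt = build_listing_prompt(
    {"surface": "62 m2", "floor": "4th, no elevator", "metro": "6 min walk"},
    ["bright living room", "dated bathroom"],
)
</code></pre><p>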
What if, using the proper context based on the pictures, the map information, the neighborhood metadata and much more, you could generate a proper text description highlighting the strengths and weaknesses of a given property?</p>]]></content:encoded></item><item><title><![CDATA[A review of GDPR sanctions]]></title><description><![CDATA[How is the flagship of the world's "regulatory superpower" faring after 6 years of application?]]></description><link>https://www.baby-cto.com/p/a-review-of-gdpr-sanctions</link><guid isPermaLink="false">https://www.baby-cto.com/p/a-review-of-gdpr-sanctions</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Thu, 27 Jun 2024 07:00:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3ccb916e-437e-4e7b-b851-79a68db4d676_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It is said that Europe is blocking innovation with its regulations, especially with the most visible tip of this iceberg: GDPR. In effect since 2018, it regulates at the European level the way we deal with personally identifiable information (PII). The goal is to protect the privacy of all European citizens against all potential nuisances, whether we&#8217;re talking about abusive marketing strategies or utterly insecure data storage that eventually ends up sold on the darknet.</p><p>The goal of this article is not to review whether this measure is effective, but rather what its impact on business is. The French agency, the CNIL, keeps a list of <a href="https://www.cnil.fr/fr/les-sanctions-prononcees-par-la-cnil">all the sanctions given</a> over time. We will use an <a href="https://github.com/Xowap/gdpr-stats">AI-pumped parser</a> and a bit of <a href="https://colab.research.google.com/drive/19il53J1G0DyzliZnvZtlgmf8rJmIjvY6?usp=sharing">data science</a> to analyze the history of GDPR sanctions. 
It is all open-source so that you can check the methodology and report issues if any.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!smSJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!smSJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 424w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 848w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 1272w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!smSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg" width="350" height="260.8173076923077" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1085,&quot;width&quot;:1456,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Survivorship bias - Wikipedia&quot;,&quot;title&quot;:&quot;Survivorship bias - Wikipedia&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Survivorship bias - Wikipedia" title="Survivorship bias - Wikipedia" srcset="https://substackcdn.com/image/fetch/$s_!smSJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 424w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 848w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 1272w, https://substackcdn.com/image/fetch/$s_!smSJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc39c20af-f502-4704-bbc1-6b2e907251ac_1427x1063.svg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An illustration of <a href="https://en.wikipedia.org/wiki/Survivorship_bias">survivorship bias</a>, to keep in mind throughout this article</figcaption></figure></div><p>Fair warning, however: as always with statistics, they merely tell the story their writer wants to tell. Your company can be prosecuted on any given point of the law, and the fines handed out are as much a reflection of the current state of the industry as of the CNIL&#8217;s priorities. If nobody had Consent Management Platforms (CMPs) in place, for example, the rate of fines on this topic could be higher.</p><p>Given our goal of figuring out what will actually get you fined, a decent first question to look at is the rate at which companies get fined.
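</p><p>To make the methodology concrete, here is a minimal pandas sketch of such a per-year aggregation. The records and field names below are illustrative assumptions, not the actual schema produced by the linked parser:</p><pre><code>import pandas as pd

# Hypothetical sanction records; the real data comes from the CNIL
# list as parsed by the linked gdpr-stats project.
sanctions = pd.DataFrame([
    {"year": 2021, "amount": 150_000_000},
    {"year": 2021, "amount": 60_000_000},
    {"year": 2023, "amount": 5_000},
    {"year": 2023, "amount": 120_000},
])

# Count of sanctions and total value, per year.
per_year = sanctions.groupby("year")["amount"].agg(["count", "sum"])
print(per_year)
</code></pre><p>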
If only a few companies were fined every year, there would be nothing to worry about; but if the practice is widespread, it becomes quite a bit scarier.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5x7O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5x7O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 424w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 848w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 1272w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5x7O!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png" width="1200" height="593.0635838150289"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:684,&quot;width&quot;:1384,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5x7O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 424w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 848w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 1272w, https://substackcdn.com/image/fetch/$s_!5x7O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd939fe06-312d-4eb0-8ba1-11b1d9937c77_1384x684.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Total amount and count of CNIL sanctions since GDPR came into force</figcaption></figure></div><p>Two things stand out from this graph:</p><ul><li><p>There is a huge spike in 2021 in terms of the sum of sanctions. That year, the CNIL woke up and chose violence against the GAFAM: Google and Facebook received fines of 150 and 60 million euros respectively for their cookie policies. The rest of the fines are actually pretty moderate.</p></li><li><p>The number of fines issued seems to be increasing exponentially over time, while the total value collected is decreasing.
This is raising the next question: what exactly is the distribution of those sanctions?</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e6v2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e6v2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 424w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 848w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 1272w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e6v2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png" width="1294" height="984" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:1294,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e6v2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 424w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 848w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 1272w, https://substackcdn.com/image/fetch/$s_!e6v2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ffa0244-5dde-4e62-b9e0-f74660775ea4_1294x984.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 
7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heat map of sanction values through the years</figcaption></figure></div><p>Now is probably a good time to point out that in the world of GDPR, fines are mostly proportional to the revenue of the company. They are sized to be painful for your business whilst not endangering it. Looking at the fine amounts is thus a good way to tell which sizes of companies are being sanctioned. This is also why this article mostly avoids relying on the absolute value of fines, as it is not comparable between cases.</p><blockquote><p><em>Note</em> &#8212; All fines from 2018 are in the 0-1k slice because the CNIL&#8217;s page does not list amounts for 2018. This is not too problematic, since we are mostly looking at the count of sanctions rather than their amounts, but it will definitely create some noise further down.</p></blockquote><p>The 1M+ slices represent mostly big companies (Google, Amazon, Yahoo, Criteo, &#8230; you see the kind).
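</p><p>As a side note, the value slices used in this heat map come down to simple bucketing. Here is a hedged pandas sketch, with bin edges inferred from the slice labels rather than taken from the actual notebook:</p><pre><code>import pandas as pd

# Bucket each fine into the slices used above; bin edges are
# inferred from the labels (0-1k, 1k-10k, ..., 1M+).
amounts = pd.Series([500, 5_000, 300_000, 150_000_000])
slices = pd.cut(
    amounts,
    bins=[0, 1_000, 10_000, 100_000, 1_000_000, float("inf")],
    labels=["0-1k", "1k-10k", "10k-100k", "100k-1M", "1M+"],
    include_lowest=True,
)
print(slices.tolist())
</code></pre><p>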
These companies are notorious for invading individual privacy at a massive scale, which explains the constant action in that range. But compared to the bulk of the activity, this looks pretty exceptional. Let&#8217;s focus more on the bulk of the sanctions.</p><p>In that regard, the 100k-1M column &#8212; which represents national companies (TV channels, telcos, etc.) &#8212; definitely sees constant action. It got diluted in 2023 but the absolute numbers stay equivalent. This will be the most interesting slice to unfold.</p><p>And finally, you can see a sharp 2023 increase in the 1k-10k slice &#8212; small companies such as web agencies &#8212; meaning that the rise in sanction counts seen in the first graph can be attributed to an increased focus on smaller actors.</p><p>This looks to be the first major shift in strategy since the beginning: since 2023, small actors have been targeted at a significantly increased rate, whereas large corporations were the previous focus.</p><p>But what are those sanctions about?
Is it all about cookies, or is there more to this law than CMPs?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!50kN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!50kN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 424w, https://substackcdn.com/image/fetch/$s_!50kN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 848w, https://substackcdn.com/image/fetch/$s_!50kN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 1272w, https://substackcdn.com/image/fetch/$s_!50kN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!50kN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png" width="986" height="464" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!50kN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 424w, https://substackcdn.com/image/fetch/$s_!50kN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 848w, https://substackcdn.com/image/fetch/$s_!50kN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 1272w, https://substackcdn.com/image/fetch/$s_!50kN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd44e0c3c-53d0-438a-b75e-e624f1580016_986x464.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">GDPR sanctions by category over the years</figcaption></figure></div><p>What appears here is a rather balanced picture of different categories all being pursued more or less equally. The most obvious thing is that 2018 is <em>nothing</em> like other years, so I guess they had to start by finding their mark.</p><p>Something that seems to emerge as well is a stronger focus on the core company organization. 
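</p><p>For the curious, a category-by-year breakdown like the one charted above is a plain cross-tabulation. Here is a minimal pandas sketch with made-up records; the real categories come from the parsed CNIL decisions:</p><pre><code>import pandas as pd

# Made-up records for illustration; each sanction carries a year
# and the category of infraction it was given for.
df = pd.DataFrame([
    {"year": 2022, "category": "cookies"},
    {"year": 2022, "category": "security"},
    {"year": 2023, "category": "consent"},
    {"year": 2023, "category": "consent"},
])

# Sanction counts per category, per year (missing combos become 0).
by_year = pd.crosstab(df["year"], df["category"])
print(by_year)
</code></pre><p>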
Instead of just wondering if you violate people&#8217;s privacy, it is also important to look at how well structured your company is to ensure that the law is applied, whether you design your application to be private or you work properly with your third parties.</p><p>Now which of those measures should you be worried about?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7YzF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7YzF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 424w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 848w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 1272w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7YzF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png" width="1456" height="1180" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1180,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7YzF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 424w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 848w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 1272w, https://substackcdn.com/image/fetch/$s_!7YzF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14496224-6d77-4d1c-b68c-493a606e9eac_1460x1183.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Heat map of sanctions distribution by value and category</figcaption></figure></div><p>Obviously, anyone can get hit on anything, and it is going to be hard to make a generic rule. But let&#8217;s look at some trends.</p><p>First, let&#8217;s look at cookies and trackers. With so much noise surrounding them, are they really that important? Turns out, not so much. They have definitely been the platform used to battle the GAFAMs, but the smaller the company, the less important they become.</p><p>Related to cookies, a line is emerging on consent. But it&#8217;s not <em>only</em> cookie consent. Rather, it&#8217;s the generic collection of consent for everything that should require it &#8212; from ads cookies to receiving commercial emails and everything in between. It is important to disconnect the concept of consent from the concept of cookies.
The law never actually mentions cookies; it&#8217;s all about <em>what you do</em> with the data and how you justify it.</p><p>Speaking of justification, this is clearly what emerges from the small red island in the 1k-100k range (web agencies and such). What hurts the most is not so much direct violations of the law but rather the lack of measures to apply and justify it. The message is clear: any company of any size should:</p><ul><li><p>Maintain a register of all PII data processing</p></li><li><p>Justify appropriately every single processing that is done<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li><li><p>Have a DPO in charge of guaranteeing that this work is done, who is able to talk with the authorities</p></li></ul><p>While this is not what bigger companies seem to be lacking, you can see that they are rather plagued by what is most likely legacy.</p><p>A first focus is on security-minded topics. You can see recurring mentions of:</p><ul><li><p>Data minimization and expiration &#8212; limiting the attack surface to exactly what you need and no more</p></li><li><p>Data breach handling &#8212; you need to have proper security in place to avoid data breaches, but you should also be transparent with your customers when their data gets leaked into the wild</p></li><li><p>Special categories of data &#8212; medical data for example requires specific care, which is not always given</p></li></ul><p>And then comes straight-out malicious data processing:</p><ul><li><p>Improper commercial prospection</p></li><li><p>Lack of information and transparency</p></li><li><p>Refusal of users&#8217; rights (portability, opt-out, etc)</p></li></ul><p>Let us also take note of something notably missing from the list of sanctions: not a single mention of using a hosting/cloud provider that is not EU-owned.
Nobody got fined for using AWS, DigitalOcean or any other American hosting company. Considering how extremely specific some of the sanctions are, if this were on the map at all, there would at least be a trace of it.</p><p>It is also worth mentioning that data processors do not seem to have gotten into any trouble. There are some examples of controller/processor relationships being sanctioned, but it seems like most of the responsibility falls on the shoulders of the controller.</p><p>Overall, many categories are used only a few times, and it is clear that the CNIL will target anything it can. But emerging patterns also show that a lot of focus is put not only on what you do but also on how you do it. The main requirement of GDPR, in the end, is that you <em>care</em> about PII.</p><h2>Conclusion</h2><p>Is GDPR hurting innovation? Should you be afraid of getting fined?</p><p>After initial years that were mostly focused on crucifying the GAFAMs, it seems that the CNIL is getting a knack for smaller players as well.
We don&#8217;t know yet how many sanctions will fall in 2024, but the rise in 2023 has been steep.</p><p>It is also becoming obvious that the infractions companies get fined for depend heavily on company size, due to both the practices required to operate at that scale and the tendency to invest (or not) in PII management. As such, here are recommendations on what to change depending on your company size.</p><p><strong>Global companies</strong> &#8212; GAFAMs and other big players in web marketing</p><ul><li><p>Notch it down on World Domination</p></li></ul><p><strong>National companies</strong> &#8212; TV channels, telcos and other companies operating at a national level</p><ul><li><p>Assign a budget to sanitizing your data management. Eventually you will have to spend this money either as a GDPR fine, or as a hack ransom <em>then</em> as a GDPR fine. If you do it as an afterthought, the outcome will be a superficial fa&#231;ade that falls at the first push.</p></li><li><p>The same goes for letting users exercise their rights. It needs to be built into your tools and processes, otherwise it will not happen when you need it.</p></li><li><p>And indeed, there are a bunch of forbidden commercial practices that you simply cannot engage in anymore.</p></li></ul><p><strong>SMEs</strong> &#8212; Smaller local businesses, startups in their initial stages, web agencies, but also the local doctor for example</p><ul><li><p>Make sure to empower one person to be responsible for data management</p></li><li><p>Keep a register up-to-date with all PII processing and appropriate justifications</p></li></ul><p>So what of our question? Is GDPR hurting businesses? There is no denying that this regulation forces companies of all sizes to assign budgets to items that do not have a direct ROI, under the threat of a fine.</p><p>On the other hand, this serves to protect companies from themselves.
The data security principles, for example, force much better hygiene, which will over time save big household names from <a href="https://www.cshub.com/attacks/articles/incident-of-the-week-garmin-pays-10-million-to-ransomware-hackers-who-rendered-systems-useless">massive ransoms</a> and <a href="https://techwireasia.com/12/2023/how-has-toyota-suffered-so-many-data-breaches/">data breaches</a>. The industry still needs to gain a tremendous amount of maturity on the topic &#8212; this is but a push in the right direction.</p><p>The same goes for commercial opportunities. Have you seen the Mad Men episode <a href="https://www.youtube.com/watch?v=8SsnkXH2mQY">about Lucky Strike</a>? This is a similar situation. If you can&#8217;t do it, neither can your competitors. This is beyond the scope of this article, but you can imagine that the damage is not so great provided that everyone respects the law.</p><p>All of which leaves us with mixed feelings. On one hand this acts for the betterment of European society and has important positive externalities &#8212; a safer online experience for all of us. But it is driven only by the stick. Instead of giving huge tax credits for often bogus R&amp;D<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, maybe a little help to the most vulnerable businesses could help them keep their books in order.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>GDPR allows you to use different categories of justifications for each data processing operation. Every processing must fall within one of these justifications. One of them is &#8220;consent&#8221;, but there are 5 others that might help you.
See my <a href="https://dev.to/xowap/clarifying-gdpr-1gld">previous article</a> on the topic.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I&#8217;m talking here about the French &#8220;CIR&#8221;, which generates hundreds of millions for mostly the same GAFAMs that are being fined here and whose economic impact is <a href="https://www.alternatives-economiques.fr/isabelle-this-saint-jean/credit-impot-recherche-etre-reforme-durgence/00105294">quite controversial</a>.</p></div></div>]]></content:encoded></item><item><title><![CDATA[3 reasons why Webhooks suck and 2 Masterclasses to replace them]]></title><description><![CDATA[The most popular way for different services to send messages to each other should never have existed. We review why and how we can make it better, taking real-world implementations as examples.]]></description><link>https://www.baby-cto.com/p/3-reasons-why-webhooks-suck-and-2</link><guid isPermaLink="false">https://www.baby-cto.com/p/3-reasons-why-webhooks-suck-and-2</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 28 Apr 2024 07:01:07 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0a8cac61-fbb0-4779-b34e-a93beea35e56_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The most common way for independent services to exchange messages &#8212; even more so on public APIs &#8212; is webhooks. A beauty of simplicity: you provide a URL that you want notified when an event occurs, and the other service simply has to make an HTTP call.
Except, not really.</p><h1>Webhooks suck</h1><p>In the shadow of this superficial simplicity creep major problems which make it hard for both ends to exploit webhooks.</p><h2>Not missing a drop</h2><p>First of all, the most basic prerequisite for a webhook to work is that the receiving end is able to receive, meaning that the web service must be up and running. But what happens if maintenance is ongoing, a technical issue plagues the server, or the network connection simply fails for an instant?</p><p>The message will purely and simply be lost. It&#8217;s a <a href="https://en.wikipedia.org/wiki/Byzantine_fault">Byzantine fault</a>: how can you know that a message was sent if the sender is unable to contact you either way?</p><p>In order to remedy this, most providers resort to implementing retry mechanisms. This is fairly complex to implement: you need to store somewhere the fact that a given set of messages will have to be re-sent in the future, and wake up accordingly. Most queuing systems will struggle to do this reliably because they work in &#8220;at least once&#8221; mode, meaning the same message could be sent twice. You can decide you don&#8217;t care, but then your client has a problem on their side.</p><p>Another issue is that if you are doing maintenance on your server, maybe you misconfigured something and it ends up responding 200 when the wrong service was actually receiving the message. In that case the message simply gets obliterated, given that the sender thinks it was received and the receiver has no idea the message even exists.</p><h2>Avoiding flashbacks</h2><p>This retry logic will also amplify another danger. Messages could very well arrive out of order, and this for several reasons.</p><p>Say, for example, that you implement a retry mechanism but consider all messages independent.
In that case, if the receiver becomes unavailable for a while, they risk receiving the retried messages only after they have started catching up with newer ones. You can leave the receiver with the burden of fixing this, but honestly it will get ignored most of the time.</p><p>Reordering can also happen if your receiver operates at a larger scale and has at least 2 web servers: if two of your messages arrive at the same time and get processed by 2 different processes, there is no saying which message will be dealt with first.</p><h2>Harder to develop</h2><p>Now this is a more practical than theoretical consideration, but most of the time developers won&#8217;t have the luxury of a public IP address on their development machine. This is a big problem since, with webhooks, the remote service has to initiate the network connection towards your machine, meaning that you will probably end up resorting to tools like HTTP tunnels.</p><p>On top of that, it means that your code needs to be aware of its own public URL, which you cannot really determine automatically. If you use a regular API, for example, you never need to declare your public address. But for webhooks you need to know what it is and to declare it, often through complex back-offices or with propagation delays.</p><p>As a result you end up with an extra configuration variable which you could probably avoid otherwise; you probably also need to go through some manual configuration; and on top of that, free plans of popular HTTP tunnels will change your URL every time, so you may end up reconfiguring it constantly.</p><h1>There are alternatives</h1><p>How do we deal with this situation better than with webhooks?
First you need to realize that you are actually trying to solve two separate problems:</p><ol><li><p>Knowing that there is at least one update pending &#8212; when an event occurs, your code needs to wake up and do its job, preferably as fast as possible after said event.</p></li><li><p>Synchronizing state &#8212; the final goal is to have different systems converge to the same state, whether it&#8217;s knowing if the user expects the light on or getting the full status of a shared online document.</p></li></ol><h2>Waking up remote code</h2><p>The most naive thing you can come up with is polling. Every X seconds you&#8217;ll check if there are updates available. This is however considered wildly inefficient:</p><ul><li><p>The cost of establishing a connection is pretty high relative to other options.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li><li><p>You won&#8217;t get the updates &#8220;in real time&#8221; but rather only every time you poll</p></li></ul><p>That&#8217;s why in most cases polling is not recommended, and both software and hardware architectures are designed to avoid it. If you were to simplify it to the extreme, modern computers are driven by inputs. A physical electrical signal on your network card will trigger a processing chain that will eventually wake up the relevant process, all the way up to your favorite abstraction in Python, JS or any other language.</p><p>This is what makes webhooks attractive: a remote computer can wake up your local process. But it&#8217;s not the only way to do it. If you open a network connection from your local machine to the remote API &#8212; which is extremely easy to do even without a public IP address &#8212; then as long as the connection is up, packets will be flowing both ways.</p><p>WebSockets were invented exactly for this.
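To make the wake-up contract concrete, here is a toy, in-process sketch in which a `queue.Queue` stands in for the open connection; all names are invented for this example, and a real system would of course use a network channel instead:

```python
import queue
import threading

def consumer(updates, received):
    # Block on the open channel until an event is pushed; the thread
    # sleeps instead of polling, exactly like a process waiting on an
    # already-open connection.
    while True:
        event = updates.get()  # wakes up only when something arrives
        if event == "STOP":
            break
        received.append(event)

updates = queue.Queue()
received = []
worker = threading.Thread(target=consumer, args=(updates, received))
worker.start()

# The "server" side pushes events whenever they occur; no polling loop.
for event in ("order.created", "order.paid"):
    updates.put(event)
updates.put("STOP")
worker.join()
print(received)  # ['order.created', 'order.paid']
```

In a real system the queue would be replaced by a WebSocket or long-polling connection, but the shape of the consumer stays the same.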
It&#8217;s an easy way to have a client, typically behind a NAT or a proxy, connect to a server and receive real-time updates. That would be my go-to option for waking up remote code.</p><p>Alternatively, before WebSockets we used a technique called &#8220;long polling&#8221;. The idea is to make a regular HTTP query that hangs for a very long time (typically minutes) until an update happens and the HTTP query returns with the message. A bit messy, but almost as efficient as WebSockets if you don&#8217;t have a very high throughput, and no more costly than webhooks.</p><p>When implementing this kind of technique, you need to consider that you will be maintaining one full TCP connection with every single client. That used to be a challenge; it is becoming quite easy these days if you can use an async infrastructure.</p><p>Alternatively you can turn towards dedicated services like <a href="https://cloud.google.com/pubsub?hl=en">Google&#8217;s Pub/Sub</a>, <a href="https://aws.amazon.com/eventbridge/">AWS EventBridge</a> or countless others. For example, Shopify offers webhooks but recommends notifications through <a href="https://shopify.dev/docs/apps/webhooks/configuration/eventbridge">AWS</a> and <a href="https://shopify.dev/docs/apps/webhooks/configuration/google-cloud">Google</a>. Kind of the same as dealing with the WebSocket yourself, except you let someone else manage the scale for you.</p><h2>Staying on the same page</h2><p>Distributed systems are notoriously hard and I am not aware of a universal law that allows you to deal with any situation whatsoever, especially as you scale up. However it usually boils down to the same core idea &#8212; which can be remixed at will to fit the project&#8217;s needs.</p><p>Consider that your data model is a bit like a Git repository. At a point in time, the source code has a given state, but in order to get there a series of different edits had to happen.
Put otherwise, if you sum up all the edits, you get the state of the code at that point in time.</p><p>So the key here will be to identify which edits happen in your model, convert them into a stream of events and re-compose them on the other side. This can be more or less difficult to achieve: Google Wave, for example, used <a href="https://en.wikipedia.org/wiki/Operational_transformation">Operational Transformation</a>, which took 2 years to develop; on the other hand, if you&#8217;re just dealing with a messaging app your life should be much simpler.</p><p>Now imagine all those edits as a sequential log. As you read the log, you keep track of your current cursor, pointing to the latest known edit. When you are notified of another event, you need to read starting from this cursor.</p><p>This resolves a lot of the issues raised earlier:</p><ul><li><p>By using edit logs, your communication protocol basically writes itself and will look fairly simple. If you&#8217;re used to Vuex or Redux, it&#8217;s basically the idea behind mutations.</p></li><li><p>The cursor lets you know where you are in the update stream.
If you lost a notification because your program was down or crashed, you can catch up from your latest known state.</p></li><li><p>Even if the transmission of messages fails, you can easily have a retry mechanism to eventually get up-to-date.</p></li><li><p>There is no risk in getting the same message twice, as messages are basically sequential, numbered items.</p></li></ul><p>From looking at WhatsApp&#8217;s WebSocket communications, you can presume that they use this kind of strategy, and it&#8217;s even what enables them to have end-to-end encryption with consistent shared states between participants and devices while keeping servers completely oblivious to the actual content of conversations.</p><h1>Masterclasses</h1><p>Having recently interacted with different APIs, two of them really stand out in my opinion, showing how you can make a public API that avoids the pitfalls explained earlier. I picked them because the choices they made really highlight how you can implement things correctly while also keeping them simple.</p><h2>Telegram</h2><p>The world of instant messaging is highly competitive, with all major players pushing their platform as hard as they can. Facebook has the two most popular platforms &#8212; WhatsApp and Messenger &#8212; while the third one is a pure player gaining traction only through its strategy<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>.</p><p>One part of this strategy is to have an amazing bot experience, allowing developers to create real-time applications with very little effort. This is particularly prominent in the cryptocurrency world, but it&#8217;s also a tool heavily used in Ukraine, for example, to follow bombing threats.</p><p>The basic idea of Telegram is pretty simple. You have different conversations to which you add messages.
Then more complex things can happen, like people adding reactions, messages being edited, users clicking buttons, etc. All of them are listed as <a href="https://core.telegram.org/bots/api#update">documented updates</a>.</p><p>Now the interesting part. How do you get those updates?</p><p>A first method is the webhook. As you know, it sucks. The more interesting method is the long polling <a href="https://core.telegram.org/bots/api#getupdates">getUpdates</a> method. It combines two techniques explained earlier.</p><ul><li><p>Long polling &#8212; the HTTP call will hang until either an update or a timeout happens. Not as efficient as WebSockets, but very easy to implement because you can do it with literally any HTTP client ever written. And of course it works from a private IP address.</p></li><li><p>Cursor &#8212; the call takes an <code>offset</code> argument, which corresponds to the ID of the last message you received.</p><ul><li><p>This is a smart way to get you to acknowledge the previous messages and receive new updates in one single call.</p></li><li><p>On the other hand, if you pass an offset of 0, it will repeat from the last offset that was used. This means that if you restart your app you don&#8217;t need to remember the last offset, which is incredibly convenient.</p></li></ul></li></ul><p>As a result, developing a client for the Telegram Bot API is a very smooth and simple experience. All you need is an HTTP client and a tiny wrapper around it to get started. You can use a lib of course, but implementing a client from scratch is a very easy task both in terms of code (no need for crazy libs) and of infrastructure (almost no constraints).</p><h2>Plaid</h2><p>If you have never heard of <a href="https://en.wikipedia.org/wiki/Open_banking">Open Banking</a>, it&#8217;s basically all the banks in the world somewhat converging into providing standardized and modern APIs for all their services.
At least in theory. In practice, of course, the capabilities and implementation details vary greatly from country to country, and instead of a truly open standard you need to go through middlemen such as <a href="https://plaid.com/">Plaid</a>. This is not my field of expertise so I can&#8217;t go into the details, but all I can say is that Plaid does a great job at converting <s>dinosaurs</s> banks into REST APIs.</p><p>They have a wide number of APIs, but the one I&#8217;m interested in is the Transactions API. The most interesting information about bank accounts, especially if you are building a personal finance app, is the list of transactions that happened there.</p><p>One of three things can happen, with examples:</p><ul><li><p>A new transaction happened (you bought something)</p></li><li><p>A transaction was modified (the exchange rate was finalized)</p></li><li><p>A transaction was deleted (it was not captured in the end)</p></li></ul><p>In the case of Plaid, they work a lot with batches. I don&#8217;t even want to know how they receive those transactions, but if you told me they came from a latin-1-encoded CSV file dropped on an FTP server every 3 hours I would not be surprised. As a result it&#8217;s much less <em>real-time-ish</em> than Telegram, and it&#8217;s not extremely relevant to ship every event individually.</p><p>Instead they&#8217;ll give you a cursor &#8212; up to you to keep track of it in that case &#8212; and give you <a href="https://plaid.com/docs/api/products/transactions/#transactionssync">aggregated added/modified/removed transactions</a>. Which makes it very easy to update your own database. If you just had the list of the latest transactions, for example, you&#8217;d have to diff the DB to know what to create, update or delete. But here you can blindly do a bulk insert, bulk update and bulk delete.
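As a toy sketch of that cursor-based sync loop: the fake `fetch_sync_page` function below stands in for a real API call (its field names mimic the shape of Plaid's documented `/transactions/sync` response, but everything here is invented for illustration, not Plaid's actual client library):

```python
# In-memory stand-in for the remote API: two pages of changes keyed by cursor.
PAGES = {
    "": {"added": [{"id": 1, "amount": 10}], "modified": [], "removed": [],
         "next_cursor": "c1", "has_more": True},
    "c1": {"added": [{"id": 2, "amount": 5}],
           "modified": [{"id": 1, "amount": 12}],
           "removed": [], "next_cursor": "c2", "has_more": False},
}

def fetch_sync_page(cursor):
    # Hypothetical replacement for the real HTTP call.
    return PAGES[cursor]

def sync(cursor, db):
    """Apply added/modified/removed batches until no more pages remain.

    Returns the new cursor to persist for the next sync run.
    """
    while True:
        page = fetch_sync_page(cursor)
        for tx in page["added"]:       # bulk insert
            db[tx["id"]] = tx
        for tx in page["modified"]:    # bulk update
            db[tx["id"]] = tx
        for tx_id in page["removed"]:  # bulk delete
            db.pop(tx_id, None)
        cursor = page["next_cursor"]   # persist only after applying the page
        if not page["has_more"]:
            return cursor

db = {}
cursor = sync("", db)
print(cursor, db)  # c2 {1: {'id': 1, 'amount': 12}, 2: {'id': 2, 'amount': 5}}
```

No diffing required: the receiver blindly applies each batch and stores the cursor.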
3 SQL queries at most, and done.</p><p>The only issue I have with their system is that&#8230; it&#8217;s based on webhooks &#128531;</p><p>But that&#8217;s not causing much harm. Of course it means you need to set up an HTTP tunnel before developing with their API, but on the other hand, because they have this sync method, you avoid all the other drawbacks pretty easily. You can even poll the API every day if you don&#8217;t care about being &#8220;as fast as possible&#8221;.</p><h1>Take away</h1><p>Webhooks suck because they bring a horde of subtle yet annoying problems. Most queue systems are either &#8220;at most once&#8221; or &#8220;at least once&#8221;.
Webhooks are &#8220;probably once &#129310;&#127995;&#8221; and bring with them a terrible developer experience.</p><p>But what we really need to do is decouple two problems: waking up remote code, and synchronizing state.</p><p>Waking up remote code is fairly easy now that async architectures are widespread: you can either rely on an external cloud provider or simply let people open WebSockets to you.</p><p>As for state synchronization, most likely you want a somewhat linear sequence of events to be streamed to your consumer, relying heavily on the concept of cursors to let remote code communicate its current knowledge of the state.</p><p>At the end of the day, if you are making a public API, the developer experience is going to matter a lot, and in this case it involves two main elements:</p><ul><li><p>How complicated is the code going to be when using your API? The lighter the required wrapping and the less data post-processing, the better.</p></li><li><p>How hard will the problems be to solve in terms of infrastructure? States to be kept, network flows, etc.
Keep in mind that most apps start from scratch, so optimize for small operations rather than world-scale conglomerates.</p></li></ul><p>So if you are making a public API &#8212; for the wide web to use or simply for other parts of your company &#8212; please think hard about how you can make the life of your peers easier and safer!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>It&#8217;s not <em>that</em> high; I still do a lot of polling when I&#8217;m short on time and it makes almost no difference to the result.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Not making any judgement or recommendation here. You can be pretty sure that half the secret services in the world read your Telegram messages, but it <em>is</em> a massive platform on which you can build many interesting things.</p></div></div>]]></content:encoded></item><item><title><![CDATA[5 criterion to pick your front-end framework]]></title><description><![CDATA[Where we take an objective look at all options ranging from jQuery to Remix in order to figure out which one you should use for your next front-end project.]]></description><link>https://www.baby-cto.com/p/5-criterion-to-pick-your-front-end</link><guid isPermaLink="false">https://www.baby-cto.com/p/5-criterion-to-pick-your-front-end</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 14 Apr 2024 07:01:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d7d07a89-96c8-4658-8357-1feb927dc8e5_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One thing for which yours truly is particularly glad is to have been able to partake in the development of the Web for two
thirds of its history and to see all the twists and turns it has taken over the years. Now that the big platforms dominate it, that Chromium has a quasi-monopoly &#8212; except for Safari, which is a lesser version of the same thing &#8212; and that the hoops to jump through are so numerous, one can only imagine that getting into web development must be a disheartening thing.</p><p>Let us however keep an eye on the North. Imagine you were to create a major project right now, one which involves a website: what route should you take?</p><ul><li><p>The old school would go towards <a href="https://jquery.com/">jQuery</a></p></li><li><p>The minimalists would pick <a href="https://htmx.org/">htmx</a></p></li><li><p>And the mainstream would pick a meta-framework such as <a href="https://remix.run/">Remix</a>, <a href="https://nuxt.com/">Nuxt</a> or <a href="https://kit.svelte.dev/">SvelteKit</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p></li></ul><p>Those three camps often contradict each other vocally on social media, adding confusion into the mix. Of course there are no silver bullets, only bullets that hit the target with more or less difficulty. My bullet is the meta-framework, and that&#8217;s not out of Kool-Aid nor fear.</p><p>For the first years of my career it was impossible for me to use meta-frameworks, for they had not been invented yet. This was a painful experience. If you end up creating a real-time interactive game in which the whole DOM is dynamic, trust me, jQuery is <em>definitely not</em> the tool for the job.</p><p>On the other hand, if you are not creating something so deeply interactive, chances are that jQuery &#8212; a library whose main purpose is to even out the differences between Internet Explorer 6 and Firefox 2 &#8212; isn&#8217;t going to be <a href="https://youmightnotneedjquery.com/">of great help</a>.
Especially if you are feeling more like a backend person, a tool like htmx will allow basic interactions which require very little front-end work and might just be enough for you.</p><p>But if you work on projects of a more unpredictable nature, then maybe those solutions are not optimal. It&#8217;s with the mind set on a large, enterprise-grade&#169; project that we&#8217;ll go on a quest for the perfect framework to build your startup.</p><h2>Performance</h2><p>A common argument against meta-frameworks is performance. Surely, all those features must come at the cost of very expensive and bloated JavaScript runtimes? Let&#8217;s put that to the test. I&#8217;ll make a hello world page using major frameworks and measure the transferred payload size (gzipped for the most part):</p><ul><li><p>HTMX &#8212; 16 kio</p></li><li><p>SvelteKit &#8212; 25 kio</p></li><li><p>jQuery &#8212; 30 kio</p></li><li><p>Remix &#8212; 90 kio</p></li><li><p>Nuxt 3 &#8212; 131 kio</p></li></ul><p>This tells us that meta-frameworks do not have to be heavy and clunky. They can be, like Remix and Nuxt 3, but SvelteKit is lighter than jQuery in that regard.</p><p>What about the execution speed and memory use? <a href="https://krausest.github.io/js-framework-benchmark/current.html">This benchmark</a> is fairly popular and while it doesn&#8217;t include HTMX nor jQuery, it does have vanilla JS, which should be the closest to what you can achieve. You&#8217;ll have to explore the numbers yourself, but in a nutshell even React, which is often lagging behind, stays pretty close to the baseline. The same goes for memory use.</p><p>In any case, those benchmarks are toys. The better question is: how fast is your website going to be at scale?</p><p>The clear winner is going to be HTMX, because all the rendering logic happens on the server, which isn&#8217;t so much bound by script size. On the other hand, with all the other options you will have to write numerous lines of code.
If you write your project using jQuery, you will end up either with one big JS file or with a collection of different files, but either way the organization will be yours. Now if you want to minify all this, you will not escape the need for a set of build tools. When scaling up, it <em>is</em> difficult to escape having a build stage.</p><p>And while meta-frameworks also have a build stage, this process is entirely integrated. You don&#8217;t have to do anything: your code gets transpiled, minified and processed in all the necessary ways completely automatically. With the added bonus that the build tools are aware of the dependency tree. As a result, every page can automatically be bundled in its own file, so that you never need to load code that you don&#8217;t need immediately.</p><p>From the performance standpoint, you thus have two choices:</p><ul><li><p>Either you think that HTMX will never be a limit to what you are trying to achieve, in which case it is the easiest option</p></li><li><p>Or you need to have some wiggle room in terms of features &#8212; 100% of my professional projects &#8212; and then a meta-framework will be a far superior option for a relatively small price to pay</p></li></ul><p>Let&#8217;s declare the winner of this round: SvelteKit!</p><h2>Maintainability</h2><p>While a strong argument of the web development community used to be the separation of concerns between HTML and CSS, you need to understand that at the time people were writing CSS directly in the style attribute, making poor use of semantics and greatly limiting reusability. It is however essential to consider <a href="https://www.baby-cto.com/p/html-the-facade-of-complexity">HTML, CSS and JS as a whole</a>.</p><p>This is why the concept of <em>component</em> is everywhere. It&#8217;s the same as the widgets from UI frameworks.
The advantage of a component is that, as explained above, it has a clear dependency tree, it contains all the code required for its proper execution, and if you want to refactor or delete it you do not need to look for its bits and pieces all over your code base, worrying about side effects.</p><p>The first and most important question to deal with is the CSS, whose C stands for &#8220;Cascading&#8221;. That is another way to say &#8220;if you are not careful, one change here will cascade into disfiguring your whole product&#8221;. You could deal with the styling of your component by writing the style into the style attribute directly, or even generate all the possible styles as class names and write them into the class attribute directly. But that is exactly what the Elders warned us about.</p><p>A more interesting approach is to define the <a href="https://cssinjs.org/">CSS in JS</a>, which gives you two things: first, the CSS gets bundled with your JS &#8212; and your HTML de facto &#8212; and second, it gets a scope which will not overlap with that of other components. It&#8217;s just a shame to be doing this manually and to deprive yourself of tools like SCSS, which make writing CSS much easier. That&#8217;s what Single File Components (SFCs) allow, and they are available in both Vue and Svelte but weirdly not in React.</p><p>Once you have scoped and bundled together your CSS, JS and HTML generation, it becomes hard to write spaghetti code. Knowing that on top of that the build system will track all your dependencies, as said earlier, this allows for an extremely atomic compilation and optimization process.</p><p>On the other hand, when using jQuery or HTMX, you will be left to your own devices. Not necessarily a bad thing, but the organization becomes up to you.
As soon as two developers start working on the same code base, we know that the organization can quickly go out the window.</p><p>Hence, we have two winners here for scalability and team work: SvelteKit and Nuxt 3.</p><h2>Future-proofness</h2><p>If you create a startup which depends heavily on a given framework in order to work, you want to make sure that after you have invested five years of developer time you don&#8217;t end up having to rewrite everything from scratch. Those tools need to give a good perspective to developers. For example, you can still run on a Windows machine an unmodified DOS program that was written in the 80s. Without being so extreme, if you cannot see at least 10 years forward with a tool then you have a problem.</p><p>First of all, let&#8217;s look at jQuery. Almost 20 years later the API is still basically exactly the same, version upgrades being mostly about simplifying the code because browsers are converging now. If that&#8217;s the route you intend to take, then rest assured that jQuery will not go anywhere.</p><p>Then HTMX. Honestly, it&#8217;s hard to say anything at this point. It is a small project, and the maintainer probably can&#8217;t promise anything. Chances are it will only add features through time, and due to its nature I can&#8217;t imagine how they could introduce major breaking changes. The main risk is more that the project dies, but even then it probably wouldn&#8217;t be too hard to maintain it yourself if it came to that.</p><p>On the React side, I don&#8217;t practice it often enough to go in-depth, but in any case the ecosystem is so vast that you could probably write another article dedicated to choosing the right React stack.
In that regard, it seems like breaking changes do occur in every part of the ecosystem, but overall it&#8217;s never going to be something so fundamental that it asks you to throw your entire codebase in the trash.</p><p>Which is fairly different from what you could say about the Vue ecosystem. Vue 3 has been a major breaking release and introduced a whole new set of completely different APIs (the &#8220;Composition API&#8221;), and while it is not inherently bad, it completely changes the way you think about your code. In theory you don&#8217;t have to use it, but every single library of the ecosystem now supports only this, so you don&#8217;t actually have a choice.</p><p>Including the Vue-3-compatible versions of your favorite libraries/tools (if ported), which will force you to rewrite everything that depends on them. The main one being Nuxt 3, which came out with exactly zero overlap in API or conventions. There is <em>nothing</em> that works the same anymore, and the thin compatibility layers that exist are usually fickle, fragile and generally useless. This feels like a serious backstab barely 6 years after version 1 &#8212; yes, I have products that would cost 6 figures to upgrade and I&#8217;m pissed about it.</p><p>It remains to evaluate Svelte&#8217;s position. While I can&#8217;t find any official statement on where Svelte will be in 10 years, some good indicators are there. Firstly, the documentation is written in a much more practical way than the others&#8217;, showing that they care more about the use cases than the technicality of the framework. Secondly, upgrades have so far been relatively smooth. And finally, Svelte 5 is cooking and the upgrade process also seems clear.
Now, since the ecosystem is much smaller, it&#8217;s hard to tell what the real impact is going to be, but let&#8217;s keep our eyes open.</p><p>Overall, the only tool that has demonstrated a serious commitment to backward compatibility is jQuery, which will have to be the winner of this round!</p><h2>Cognitive load</h2><p>You have probably heard of GTD at some point and decided to try it for yourself. One of the pieces of advice coming out of it is: if it takes less than 5 minutes, do it immediately. So tell me, how many days did you end up spending entirely on 5-minute tasks? Did it feel satisfying? And did you accomplish anything meaningful?</p><p>Of course it&#8217;s a rhetorical question and you can&#8217;t really answer, so you&#8217;ll have to imagine that you said &#8220;it&#8217;s satisfying but not meaningful at all&#8221;. The same happens with many of the tools we use: we judge them on how satisfying they are to use while actually losing our time on boilerplate.</p><p>Imagine that you are writing an interactive component which, depending on the user&#8217;s actions and inputs, will have to update its own DOM. Doing it in jQuery can be extremely satisfying because you create all the elements yourself, find smart ways to hook events, imagine optimizations to make it faster, etc. Very fun if it&#8217;s your jam. But if you do the same thing with Svelte, the compiler does all of this automatically and better than you ever would. Managing the DOM is just not a task with Svelte.</p><p>So while it&#8217;s satisfying, you just spent your time on something that should not even grab your attention. What about our contenders?</p><ul><li><p>jQuery &#8212; As mentioned above, it&#8217;s all manual, from the manipulation of the DOM to bundling it for the client. Easy to marvel at the beauty of your code, hard to actually focus on what matters.</p></li><li><p>Remix/React &#8212; Many moving parts and optional APIs (hooks, signals, etc). No management of CSS. 
Fairly complex overall.</p></li><li><p>Nuxt/Vue &#8212; Version 3 of both definitely made things more complex, with two competing APIs (including a fairly verbose one) and lots of build-time magic.</p></li><li><p>HTMX &#8212; Very lightweight front-end, but on the other hand you still need to worry about the back-end yourself, so it&#8217;s a lot of unknowns</p></li><li><p>Svelte(Kit) &#8212; Once you&#8217;ve sorted out the idioms, it is fairly straightforward and requires no magic of any kind</p></li></ul><p>Hence, I&#8217;ll give the round to SvelteKit!</p><h2>Community and talent pool</h2><p>You could find the best framework of all time, but if nobody can provide you with libraries that solve common problems (UI libraries, form validation, toasts, etc), learning resources or direct support, then you are going to have a hard time.</p><p>On top of this, you need to be able to hire developers. A decent developer can learn any decent framework, but they need to <em>want</em> to work with it, and usually they&#8217;ll expect it to look good on their CV.</p><p>The same goes commercially speaking. Lots of customers are interested in knowing what tools you use, and if you can&#8217;t justify that your stack is durable and bullet-proof they might get cold feet when deciding to work with you.</p><p>So here is what to expect:</p><ul><li><p>jQuery isn&#8217;t sexy anymore, nobody needs it on their CV, and nobody wants to work with it except a few indiepreneurs who claim jQuery pays for their lambo. Customers who look into your tech stack will ask you if jQuery is a headless framework and it&#8217;ll be hard to say yes.</p></li><li><p>HTMX is straight away a no-go. It&#8217;s too small to put on a CV and not trusted enough to fuel a sales pitch. Using it in a professional setup right now will require you to seriously compensate on other parts of your stack.</p></li><li><p>React is a no-brainer. 
Everyone wants it on their CV, it&#8217;s backed by a major player and the community is one of the largest for a framework of that kind. Plus, most companies that publish SDKs or component libraries will prioritize React.</p></li><li><p>Vue is the new React, except less popular, less in demand and less clunky. Still a workable option.</p></li><li><p>Svelte is borderline. But it receives a lot of developer love (like HTMX, you&#8217;ll tell me) and is now backed by Vercel, which is not as big as Facebook but definitely big enough to make this serious. Plus, Svelte 5 promises to resolve all pending Svelte issues, which should boost adoption.</p></li></ul><p>Basically I have to give this round to React (hey, at least one).</p><h2>Conclusion</h2><p>There are a million other items to consider and of course many more frameworks than those. But in my experience these themes are the most important to consider when picking a technological stack.</p><p>So which framework should you use?</p><p>In absolute terms, go with SvelteKit. Provided that version 5 of Svelte doesn&#8217;t turn out to be a major betrayal of the community. It&#8217;s lightweight enough, scales well with teams, focuses developers on what matters and should not ask you to rewrite your whole application in 2 years &#8212; especially if you waited for version 5.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>Yes, you will have more difficulty finding talent for it, but on the other hand it has good press and you can learn the bulk of it in a few hours. There are of course a few rough edges but nothing insurmountable. Same for your sales pitch: you can always create a diversion by using a hot headless CMS.</p><p>Another interesting option, if you were building your own startup with a Craigslist-type UI, would be to use HTMX. 
Besides the talent pool, the main thing about using HTMX is that developers need to have the final word on UX/UI, because otherwise you will be fighting against the framework all the time. But if you can keep it constrained, you&#8217;re probably going to have a very efficient experience.</p><p>Now regarding the other meta-frameworks, we see that they all come with dealbreaking drawbacks &#8212; namely React being a huge spaghetti bowl and Vue being a traitor. Not that you can go particularly wrong with them, but they&#8217;re just not good choices in my opinion. If you are really afraid of what others might think, you can always pick some assembly of React things, but be warned that it comes at a cost for your mental load and that of your browser.</p><p>Finally, if you are team jQuery, of course it&#8217;s a safe and proven choice which will continue to support you for the next 100 years, I&#8217;m certain. Which is more than you could say about anything else on that list. It&#8217;s probably good if your app is small and you have few resources to invest in the front-end.</p><p>So while there are indeed no silver bullets, it&#8217;s important to point out that given the current state of the art you are much safer and much better off with a meta-framework, especially if it&#8217;s SvelteKit. Other options exist and can be valid for some use cases, but as someone who oversees the production of many front-end applications I can only recommend sticking with a paradigm that covers all angles at minimum cost.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>If you&#8217;re wondering why Angular is not part of this comparison even though it definitely has the credibility to fit in, the answer is twofold. First of all, they invented backstabbing in the JS framework world. 
And since they backstabbed me, I never got any experience with it, so it would be very hard for me to talk about it. All I know is that it has no chance of winning according to the evaluation above.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>If Svelte 5 <em>also</em> ends up in treason, then I&#8217;m not sure what advice to give. If treason is acceptable, then Vue is a superior option due to its large popularity while avoiding many of React&#8217;s pitfalls.</p></div></div>]]></content:encoded></item><item><title><![CDATA[The inevitability of Magic Quidditch: when Mixed Reality meets muggle sports]]></title><description><![CDATA[Exploring the convergence of spatial computing technologies and the challenges left to bring Magic Quidditch into existence]]></description><link>https://www.baby-cto.com/p/the-inevitability-of-magic-quidditch</link><guid isPermaLink="false">https://www.baby-cto.com/p/the-inevitability-of-magic-quidditch</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Mon, 08 Apr 2024 07:01:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e34d4310-cc79-4c76-986c-ec7cd29508dc_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When 11-year-old me received my first Harry Potter book for Christmas &#8212; and in spite of my utter lack of interest in team sports &#8212; it didn&#8217;t take long before I started drawing up ideas for a Quidditch simulator, with a 360&#176; screen, blue/red glasses and a mechanical arm to slam a ball in your face. 
After concluding that I had neither the skills nor the funding to start such an endeavour, I left the project on the shelf and moved on forever&#8230; or so I thought.</p><p>&#8220;Recently&#8221; &#8212; 2016, be the judge of that &#8212; I was jogging around the Parc de Vincennes in Paris and to my surprise saw multiple teams practicing a sport that didn&#8217;t quite look like anything I knew, much less anything I would imagine gathering crowds. Turns out that there is a real-life <a href="https://en.wikipedia.org/wiki/Quidditch_(real-life_sport)">muggle quidditch</a> (apparently quadball now, copyright yay) that has rules and that can physically be played. And that you don&#8217;t win just by catching the Golden Snitch apparently, sorry Harry.</p><p>Which leaves us on one hand with a sport that is surprisingly well-structured &#8212; with a <a href="https://iqasport.org/events/world-cup">world cup</a>, leagues in every country and more than 10,000 players if you believe the numbers &#8212; and on the other hand with the Mixed Rea&#8230; Spatial Computing finally emerging to a usable point. Is it time to bring a little bit of magic to this world?</p><h2>The map</h2><blockquote><p>For those who don&#8217;t know Wardley Maps, it is an analysis framework based on military strategy and applied to business. There is a long, boring book of 700 pages (or a 15h-long audiobook if you prefer) explaining it in detail with hard, scientifically-studied mechanics and rules. There are <a href="https://learnwardleymapping.com/">simpler resources</a> to get into it, which I definitely recommend.</p><p>In essence, it allows you to place business concepts on a 2D map and to know how they are going to interact with each other. This allows you to predict what is going to happen and when it&#8217;s going to happen with much more precision than reading tea leaves or hiring consultants.</p></blockquote><p>Let&#8217;s go back to our current topic. 
Here is the map of Magic Quidditch, if you want to watch it from a stadium near you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VK0h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VK0h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 424w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 848w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 1272w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VK0h!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png" width="1200" height="848.0769230769231" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1029,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:277994,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VK0h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 424w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 848w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 1272w, https://substackcdn.com/image/fetch/$s_!VK0h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa270f495-9e6c-47fe-8a4a-fc799a33d0aa_2691x1901.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Wardley Map of the Magic Quidditch</figcaption></figure></div><p>The map allows to clearly see dependencies in space and one of its most practical properties is to help you decide whether you build, buy or outsource something. By looking at the connected and close-up sections of this map, we can decide to group different parts and how we can approach them.</p><ul><li><p><strong>Make Yourself</strong> &#8212; As a magic provider, this is what you will want to develop yourself</p></li><li><p><strong>Partner Up</strong> &#8212; A section of utmost importance in your value chain but that you can&#8217;t possibly make from the ground up yourself</p></li><li><p><strong>Leave it to Apple</strong> &#8212; Apple and other big actors are already pushing this section hard enough so you don&#8217;t need to worry about it</p></li><li><p><strong>Muggle Quidditch</strong> &#8212; There is already a vast community of quadballers, players and spectators. 
While their interest is proven, it&#8217;s also going to be a challenge to onboard them</p></li></ul><p>So how do we get started?</p><h2>Leave it to Apple</h2><p>What has been abundantly obvious to me since swarms of people were chasing Pikachus in streets all over the world during the summer of 2016 is that <em>reality</em> is about to become a much more flexible concept. If bars were paying to lure Pok&#233;mons &#8212; and the trainers that sought them &#8212; how are those creatures not real? It involved real money with real people and real consequences.</p><p>There have since been a large number of devices and detours, from the Google Cardboard to the Metaverse madness, passing by the Magic Leap and countless other attempts. All of them failed to produce a convincing experience until the Apple Vision Pro came around a few weeks ago. It&#8217;s just an increment on top of it all, but it definitely crosses a threshold.</p><p>It is however lacking one thing. Niantic &#8212; the maker of Pokemon Go &#8212; understands that what makes reality real is that it can be shared between people. And while they released their framework Lightship, which runs on several devices, it&#8217;s still not available on the Vision Pro. Which in turn is really focused on the hardware and not so much on its applications.</p><p>To summarize, while we&#8217;re not exactly there yet, the convergence is painfully close:</p><ul><li><p>Digital worlds can be overlaid onto physical reality and shared between different people in real time</p></li><li><p>All the subtleties like eye tracking, mobile compute power, energy supply, etc are solved at an acceptable level today</p></li><li><p>Devices are increasingly able to merge physical and virtual, with different techniques and qualities</p></li></ul><p>While we still need to see all of that in a single affordable device, one can project this to happen in a matter of years if not months. 
As a reminder, the Apple 1 was released for $666.66, which is about the price of a Vision Pro adjusted for today&#8217;s dollars.</p><p>Knowing this, we know one thing for certain. This will enable a whole range of applications which were not previously possible and open a completely new industry. And when you know how much stadiums and entertainment parks cost, it&#8217;s not unthinkable to imagine them as the first clients for &#8220;Spatial Computers&#8221;.</p><h2>Muggle Quidditch</h2><p>In the Harry Potter lore, <em>muggle</em> means <em>non-magic</em>. Given that we are indeed barred from the world of wizards, thousands of people joined hands in nonetheless adapting Quidditch to our lowly muggle world. And as explained in the introduction, you will find them training and competing in various locations all around the globe.</p><p>The fact that an entire generation &#8212; my generation &#8212; decided to simply ignore the limitations of the physical world because they would much rather live in a magic universe is simply incredible to me. There are a lot more of us nerds out there than I previously thought!</p><p>Which turns out to be great news for any purveyor of magic &#8212; or the next best thing, Mixed Reality. And when magic becomes real, what better candidate than the most famous magical sport?</p><p>The existing base of leagues, players and public is an incredible starting point for such an endeavor. It&#8217;s not as if you were creating a new sport ex nihilo: we know that it exists, with the potential to be pushed much further.</p><p>The issue is that if you want to make Magic Quidditch you&#8217;ll need to be very careful how you approach this. The spark that ignited those players came from the Harry Potter lore, but what feeds their fire today is the sport as it is. 
As such, an absolutely crucial aspect of Magic Quidditch will be to change how spectators perceive the sport without changing one bit how it is played today.</p><p>Beyond this, if you were to pour a lot of money into a sport that is entirely amateur today, it would probably become divisive for the community, create tensions that don&#8217;t exist today, progressively change the profile of players, etc. I&#8217;m not an expert in sports, but given the corrosive power of money you can expect something will get lost on the way. Does Quadball deserve such a frustrating makeover? The exercise is left to the reader, but what is for sure is that you will need to win the community&#8217;s support before anything else.</p><h2>Partner up</h2><p>Since we don&#8217;t want to change how the sport is played, how are we going to inject magic into it?</p><p>We&#8217;ve talked about the downside of money, but it also comes with a lot of upsides. In the present case, all professional sports have been saturated with analysis technologies which can tell you everything going on in the field in real time. For example, <a href="https://www.hawkeyeinnovations.com/insight">Hawk-Eye</a> will give you stick figures of all the players, the exact position of balls in relation to field lines, etc. It could probably do the entire referee&#8217;s job automatically.</p><p>So here&#8217;s a dilemma. On one hand nobody has Quadball tracking, on the other hand those people are a few million dollars closer to it than you are. The most logical thing here would be to partner up. Pay them to make the Quadball version of their software and then simply use it.</p><h2>Make it yourself</h2><p>At this point we are outsourcing the magic, the public, the players and the tracking of them. So what is left for us to do?</p><p>The idea for a match is simple. You gather everyone in a stadium, you put the players on the grass and you let them play as they usually would. 
Then a tracking system captures all their motions, translates them into 3D, and the public sees digital avatars of players flying on their broomsticks over the field.</p><p>Two obvious components emerge from this:</p><ul><li><p>The &#8220;tracking to magic mapping&#8221;. You&#8217;ll need to figure out an engine that understands the game from what the tracking system reports and converts it into a scene with wizards flying all over the stadium, in three dimensions (versus basically two in a Quadball game).</p></li><li><p>And of course the rendering of that scene. Which will have to run in the headsets directly so that everyone can see it from their own perspective.</p></li></ul><p>Given the rendering quality and actual gameplay of games like FIFA/PES and friends, while this certainly is going to require a lot of work, it seems to be an expensive but safe challenge.</p><p>There is however another item that requires attention, which is connectivity. Indeed, streaming this to a stadium at full capacity is a technological challenge that has yet to be accomplished. That many clients would completely tear down a WiFi network, and 5G deployments are actually identical to 4G in most cases. Theoretically speaking however, you could do it with either properly configured 5G &#8212; which remains to be seen outside a lab &#8212; or with a finely tuned WiFi and strictly multicast data streams.</p><p>In short, there is a certain amount of cash to burn in order to even reach the first decent demo &#8212; with practical challenges such as the massive connectivity required &#8212; but this seems quite achievable.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading thus far! 
If you liked my ideas, many more will be published. Don&#8217;t miss out on the next editions!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Conclusion</h2><p>Quidditch is a long-term dream of a whole generation, which manifests itself in the fact that people are actually playing it today. On the other hand, when you see that billions are being poured into Mixed Reality/Spatial Computers/etc, it is only a matter of time before technology becomes capable of magic.</p><p>As such, Magic Quidditch is inevitable. It is bound to happen and has the potential to be the first mixed electronicotraditional sport.</p><p>All the technology that you need to build it already exists; it&#8217;s mostly a matter of sticking all the Legos together in the right order. Once you have it, all that is left is to gather the public into a stadium, stick a Spatial Computer in front of their eyes and let them enjoy the show.</p><p>That computer is not yet available in the quantity/price that would befit such an event, but you can be sure that by the time you are done building your project this situation will have become absolutely acceptable.</p><p>And as much as this Reality is Mixed, so are my feelings about the upcoming era. We are about to go through the most incredible experiences that Humankind ever created &#8212; bound only by our own imagination. Yet little of it will be physical. 
What will this new reality cost our civilization?</p>]]></content:encoded></item><item><title><![CDATA[Confront your greatest fear and parse a string with a Regular Expression]]></title><description><![CDATA[In this blog we usually talk about tech management but let's have a refreshing tutorial on a less advanced topic!]]></description><link>https://www.baby-cto.com/p/confront-your-greatest-fear-and-parse</link><guid isPermaLink="false">https://www.baby-cto.com/p/confront-your-greatest-fear-and-parse</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Mon, 01 Apr 2024 19:04:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/76d938da-e203-43d6-b7a4-02b4938b9964_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Regular expressions are a scary thing and can take quite a while to digest &#8212; even for mid-level developers. Many useful tools exist, such as <a href="https://regex101.com/">regex101</a>, which will decode the syntax for you, or <a href="https://www.debuggex.com/">debuggex</a>, which will let you visualize the expression as a finite state machine.</p><p>But there is nothing like getting your hands dirty to understand how something really works! Something that eluded me for years is how you could parse a string &#8212; in particular, when there is an escaped quote in it.</p><p>Let&#8217;s start at the beginning. We want to match a basic string. Regular expressions are expressed in JS.</p><pre><code># To match
"hello"

# Regexp
/".*"/</code></pre><p>The structure is simple: first a quote, then any character any number of times, then another quote. But in real life you&#8217;re probably working on a parser. For example:</p><pre><code>&lt;something foo="bar" bar="foo" /&gt;</code></pre><p>In this case the regular expression is going to get greedy and return <code>"bar" bar="foo"</code>, which is not what we want.</p><p>The first trick is to tell the regular expression not to be greedy by using the <code>?</code> symbol.</p><pre><code># To match
&lt;something foo=<strong>"bar"</strong> bar=<strong>"foo"</strong> /&gt;
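// Quick sanity check (illustrative, runnable as-is in any JS console):
//   '&lt;something foo="bar" bar="foo" /&gt;'.match(/".*?"/g)
//   // returns [ '"bar"', '"foo"' ]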

# Regular expression
/".*<strong>?</strong>"/</code></pre><p>That&#8217;s fine, but if, like in most cases, you want to allow your users to put quotes in the string by escaping them, you&#8217;ll be out of luck. This, for example, will not work:</p><pre><code>const name = "Dwayne \"The Rock\" Johnson";
// Will match: "Dwayne \" and " Johnson"</code></pre><p>This part kept me perplexed for the longest time. There are different ways to solve it; my personal favorite is to consider what we want to allow within our string. Namely:</p><ul><li><p>Any character that isn&#8217;t an end quote is fine: <code>[^"]</code> in regex language (<code>^</code> is for <em>not</em>)</p></li><li><p>Any escape sequence &#8212; aka something that starts with a backslash: <code>\\.</code> in regex</p></li></ul><p>Since we don&#8217;t want the first match to eat up the second match (after all, a &#8220;backslash&#8221; is &#8220;not a quote&#8221;), we&#8217;ll make sure to put them in the right order so that the matching can happen easily.</p><pre><code># To match
const name = <strong>"Dwayne \"The Rock\" Johnson"</strong>;
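// Sanity check (illustrative, runnable in a JS console) with the pattern below:
//   /"(\\.|[^"])*"/.exec('const name = "Dwayne \\"The Rock\\" Johnson";')[0]
//   // returns '"Dwayne \\"The Rock\\" Johnson"'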

# Regular Expression
/"(<strong>\\.</strong>|<strong>[^"]</strong>)*"/</code></pre><p>And that&#8217;s it! You are now matching an escaped string. Not that scary anymore?</p><p>Let&#8217;s study the second method, which I found <a href="https://github.com/lark-parser/lark/blob/d676df9b888ead42daffd31c035d95241bff0920/lark/grammars/common.lark#L23">inside of Lark</a> (amazing package by the way). It&#8217;s both simpler and more confusing, and does not work with older JavaScript engines, but let&#8217;s go into it.</p><p>Essentially, if you say that &#8220;escaped quotes must not terminate the string&#8221;, then it means that &#8220;the last quote of the string can&#8217;t be escaped&#8221;. That&#8217;s something we can easily check with a negative lookbehind assertion:</p><pre><code># To match
const name = <strong>"Dwayne \"The Rock\" Johnson"</strong>;
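// Sanity check (illustrative): the lookbehind pattern below gives the same result:
//   /".*?(?&lt;!\\)"/.exec('const name = "Dwayne \\"The Rock\\" Johnson";')[0]
//   // returns '"Dwayne \\"The Rock\\" Johnson"'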

# Regular Expression
/".*?<strong>(?&lt;!\\)</strong>"/</code></pre><p>The novelty here is that instead of just having a non-greedy match-all (<code>.*?</code>), we&#8217;re adding at the end an assertion, <code>(?&lt;!\\)</code>, to check that there is no backslash right before the closing quote. This however has a drawback: you can&#8217;t have an escaped backslash right before the end of the string, because then the last quote would be preceded by a backslash (still with me?). In short, this doesn&#8217;t work:</p><pre><code>const effect = "Domino \\";</code></pre><p>But fortunately, we can allow the string to terminate with escaped backslashes!</p><pre><code># To match
const effect = <strong>"Domino \\"</strong>;
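// Sanity check (illustrative): the full pattern below accepts the trailing escaped backslash:
//   /".*?(?&lt;!\\)(\\\\)*?"/.exec('const effect = "Domino \\\\";')[0]
//   // returns '"Domino \\\\"'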

# Regular Expression
".*?(?&lt;!\\)<strong>(\\\\)*?</strong>"</code></pre><p>And here we are! Matching strings another way.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/subscribe?"><span>Subscribe now</span></a></p><p>Let&#8217;s hope that this problem-oriented walkthrough helped you understand relatively advanced thought patterns in regular expression. Often you&#8217;ll walk on problems that can seem intractable without the proper knowledge but which can easily be unlocked if you master regular expressions &#8212; or better even: parsers! But that&#8217;s for another article.</p><p></p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[From Chaos to Clarity: Streamlining End-to-End Testing with Django and SvelteKit]]></title><description><![CDATA[Learn the secrets to boosting your web development process with Django and SvelteKit. Our article reveals how to seamlessly blend these powerful frameworks for unmatched speed and reliability.]]></description><link>https://www.baby-cto.com/p/from-chaos-to-clarity-streamlining</link><guid isPermaLink="false">https://www.baby-cto.com/p/from-chaos-to-clarity-streamlining</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 10 Mar 2024 08:00:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f06f52f0-4532-4a49-a284-684e77625374_1792x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For reasons listed in my <a href="https://model-w.readthedocs.io/en/latest/architecture.html#back-end-django">Model W Architecture</a> document, my framework of choice for the backend is Django (tldr; the ORM) and until another better option emerges in the world of JavaScript this is not going to change. 
On the other hand, my experience has shown that if you build a professional website you will eventually outgrow <a href="https://htmx.org/">htmx</a> and other lightweight frameworks, making it necessary to turn towards meta-frameworks such as <a href="https://kit.svelte.dev/">SvelteKit</a>, Nuxt or Astro &#8212; to name the most famous.</p><p>This is what we&#8217;re systematically doing at <a href="https://with-madrid.com/">WITH</a> and the combination works well. But you absolutely must figure out ways to align all this properly &#8212; and there are no official ways to do it.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>Today we&#8217;re going to explore one specific friction point: end-to-end testing.</p><h2>Why test?</h2><p>Some will tell you that you need to cover 100% of your code base with unit <em>and</em> e2e tests while others will say &#8220;testing is doubting&#8221;. So while we&#8217;re not here for a theoretical lesson on the benefits of tests, we are going to focus on <em>why</em> we would want to have them, which in turn allows us to decide what we want to test.</p><h3>The speed factor</h3><p>First, nobody gets the code right the first time. 
Personally, with my 20 years of coding, I think that <em>once</em> I managed to land about 1000 lines of code that worked the first time, while being extremely focused on what I was doing. The typical development cycle looks more like: write a bunch of lines, see where it breaks, repeat until it works.</p><p>As a developer, you will learn to code faster and with fewer mistakes over time but there is nothing you can do about it <em>right now</em>. Just code more and it will sink in. This leaves you with the second part of the process: how fast can you see where it breaks?</p><p>Obviously the answer to that question is largely dependent on what you are currently testing. If you&#8217;re talking about CSS, then a second screen with the page you&#8217;re currently integrating along with a good meta-framework that implements <a href="https://webpack.js.org/concepts/hot-module-replacement/">HMR</a> properly should be the easiest way to go.</p><p>On the other hand if you&#8217;re creating Django models and/or APIs using <a href="https://www.django-rest-framework.org/">DRF</a>, a lot of the code that you are going to write is going to be declarative &#8212; only to be later picked up by the meta functions of Django and turned into a usable project. Which means that there is literally no code for you to test, it&#8217;s mostly configuration<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p>But if you are working on the typical front/back architecture that we&#8217;ve discussed earlier, most of the things that you&#8217;re ever going to want to test in an automated way are the end-to-end user stories.</p><p>If you test those manually, you will be clicking on many buttons and filling up many forms. Over and over again. For test cycles of 30 seconds to 5 minutes usually.</p><p>On the other hand if you automate those tests you can probably drop the testing time to a couple of seconds. 
We can estimate that on average it&#8217;s going to be about 10 times faster than manual testing.</p><p>Now let&#8217;s consider the following simple equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;T = \\sum_{n=1}^{R} (T_c + T_t) = R \\cdot (T_c + T_t) \\quad \\text{where} \\quad \n\\begin{cases}\nT_c = \\text{Time spent coding} \\\\\nT_t = \\text{Time spent testing} \\\\\nR = \\text{Number of repetitions} \\\\\nT = \\text{Total time to develop a feature}\n\\end{cases}\n&quot;,&quot;id&quot;:&quot;OXPHUBHXCF&quot;}" data-component-name="LatexBlockToDOM"></div><p>Let&#8217;s consider that:</p><ul><li><p>The time spent testing manually is equal to the time spent coding</p></li><li><p>The automated test is 10 times faster than the manual test</p></li></ul><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{align*}\nT &amp;= R \\cdot (T_c + T_t) = R \\cdot (2T_t) \\quad &amp;\\text{since $T_c = T_t$} \\\\\nT' &amp;= R \\cdot (T_c + T_t') = R \\cdot \\left(T_t + \\frac{T_t}{10}\\right) = R \\cdot \\frac{11T_t}{10} \\quad &amp;\\text{since $T_t' = \\frac{T_t}{10}$} \\\\\n   &amp;= R \\cdot \\frac{11}{10} \\cdot T_t = \\frac{11}{20} \\cdot R \\cdot (2T_t) \\\\\n   &amp;= \\frac{11}{20} \\cdot T \\quad &amp;\\text{rearranging terms} \\\\\n\\therefore T' &amp;= \\frac{11}{20} \\cdot T\n\\end{align*}\n&quot;,&quot;id&quot;:&quot;EDGOETYQPJ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Even if you don&#8217;t understand the math formalism, you understand that in the end testing your code automatically while you develop is <strong>almost twice faster</strong>. The bias here of course is that you still need to write this test. 
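</p><p>A quick back-of-envelope check of that claim, under the same assumptions (coding time equals manual testing time, and the automated test is 10 times faster):</p>

```python
# Numeric check of the formulas above: T = R * (Tc + Tt), with Tc = Tt
# and an automated test running ten times faster than the manual one.
def total_time(repetitions, t_code, t_test):
    return repetitions * (t_code + t_test)

t_t = 1.0                                   # one unit of manual testing time
manual = total_time(10, t_t, t_t)           # T  = R * 2 * Tt
automated = total_time(10, t_t, t_t / 10)   # T' = R * 1.1 * Tt
ratio = automated / manual                  # 11/20 = 0.55, as derived above
```

<p>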
That&#8217;s why we&#8217;ll explore tools that make this as easy as possible, so that the benefits are not swallowed by the plumbing.</p><p>Overall it&#8217;s hard to quantify exactly <em>how much</em> productivity gain<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> we&#8217;re talking about, but it should help you go about two times faster &#8212; and in the worst case scenario it seems unlikely that it will be slower than testing manually. More importantly we&#8217;re just talking about the <em>immediate</em> benefits of testing.</p><h3>Ease of mind when changing things</h3><p>Any application that lives long enough will reach the point where no single human brain can comprehend the entirety of its features at the same time. There are just too many moving parts. And this point arrives much sooner than you think, especially in environments like mine where people move from project to project all the time.</p><p>Essentially: how do you know if something that you change will break anything in the project without testing everything? Leading to the subsequent question: how do you even know what to test?</p><p>The answer is that you cannot know what broke if you don&#8217;t test it, so indeed you <em>have to</em> test everything. Which can be done with, for example, a large testing booklet written and maintained manually &#8212; aka not &#8212; or with automated tests that run every time you push your code into the repo (and on your machine while you dev).</p><p>The second option is absolutely better in the sense that:</p><ul><li><p>If all the tests are written, it will be exhaustive</p></li><li><p>And since it&#8217;s all automated, each test should be extremely fast</p></li></ul><p>This way you reduce a QA process to a few seconds of tests instead of potentially hours of human time spent. 
With the guarantee that everything is executed in stable conditions and in a repeatable way.</p><h3>Onboarding of newcomers</h3><p>Overall tests will show you how to use the app and how to use the code. All a newcomer has to do to understand everything that you can do with the application is to watch the tests unfold.</p><p>Let&#8217;s note that this is only partly true, because tests will often be cryptic and hard to document. A better way to approach this topic is with <a href="https://en.wikipedia.org/wiki/Behavior-driven_development">BDD</a> and &#8212; spoiler alert &#8212; <a href="https://pytest-bdd.readthedocs.io/en/stable/">pytest-bdd</a>. But that&#8217;s for another article, we are focused here on the Django/Svelte integration.</p><h2>Picking the right tools</h2><p>While I am not going to list every single test runner and framework out there &#8212; that would be an entirely different article &#8212; here are the constraints I&#8217;m setting for myself in this quest for automated tests.</p><p>The first aspect is that Django-based tests have the ability to write directly into the database, which is in turn cleaned up after each test. When your application is essentially just transforming a DB schema into an API, that&#8217;s really something you want to be able to do. Without that you&#8217;re in for some very awkward mocking. The core idea is thus to run tests from Django &#8212; I even considered wrapping Django&#8217;s tests from JS but in the end that was not necessary.</p><p>The default test framework in Django is the standard <a href="https://docs.python.org/3/library/unittest.html">unittest</a>, and while honorable there are friendlier and more powerful options out there. Namely <a href="https://docs.pytest.org/en/8.0.x/">pytest</a>, which as you will see right below will be the backbone of our strategy. 
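</p><p>To give a taste of that style, here is a toy example (not code from the project): pytest tests are plain functions using bare <code>assert</code> statements, with no TestCase boilerplate.</p>

```python
# A hypothetical module that pytest would collect and run as-is.
def slugify(title):
    """Turn an article title into a URL slug."""
    return title.lower().replace(" ", "-")

def test_slugify():
    # pytest rewrites this assert to show both values when it fails.
    assert slugify("Baby CTO") == "baby-cto"
```

<p>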
The first thing is to integrate it with Django&#8217;s tests and this happens with <a href="https://pytest-django.readthedocs.io/en/latest/">pytest-django</a>.</p><p>The main issue that I have with testing in Django, however, is that while it has a <a href="https://docs.djangoproject.com/en/5.0/topics/testing/tools/#liveservertestcase">LiveServerTestCase</a> (and the <a href="https://pytest-django.readthedocs.io/en/latest/helpers.html#live-server">pytest equivalent</a>), it kind of wants you to use <a href="https://selenium-python.readthedocs.io/">Selenium</a> and, no offense to that precursor tool, but oh boy is it unusable. Last time I wrote e2e tests with Django and Selenium I ended up writing more utils than tests.</p><p>Thankfully things have changed and we are now able to use <a href="https://playwright.dev/python/docs/intro">Playwright</a> through the <a href="https://pypi.org/project/pytest-playwright/">pytest-playwright</a> plugin. While I don&#8217;t particularly like Microsoft I must admit that it has two very interesting characteristics.</p><p>Firstly it has very semantic selectors which will use accessibility attributes in order to find elements on the page. 
This is great because while you test your features you know that if you don&#8217;t have to resort to crude CSS selectors, then at least what you test looks more or less decent in terms of accessibility.</p><p>And secondly it auto-waits on all the selectors, removing what is by far the most annoying thing that you end up doing all the time with Selenium: waiting for elements by hand.</p><p>To summarize, we&#8217;re going to go with:</p><ul><li><p><code>Django</code> itself and its testing facilities</p></li><li><p><code>pytest</code> as test runner</p></li><li><p><code>pytest-django</code> for the Django integration</p></li><li><p><code>pytest-playwright</code> for the browser testing</p></li></ul><h2>Implementation time!</h2><p>In order to demonstrate how all those tools work together, I created a <a href="https://github.com/Xowap/e2e-django-svelte">sample project</a> on GitHub which contains mostly the boilerplate that you will need along with an example of how to use everything together.</p><p>The project is extremely simple in itself: there is one model that is exposed through an API with one page that displays all the instances returned by the API. Really just the bare minimum to write a test that shows all we discussed above.</p><p>Lots of small details are going to be left out from this explanation that focuses on the big picture. The source code being entirely available, any shadow can be lifted by inspecting it. 
If you intend to run the project yourself, have a read of the <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e">README</a>.</p><h3>Boilerplate</h3><p>We&#8217;ve got two projects which are fairly close to default Django and SvelteKit projects located in the <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api">api</a> and <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/front">front</a> folders.</p><h4>API</h4><p>Let&#8217;s first have a look at <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/requirements.in">our dependencies</a>. Quite obviously we&#8217;ll see there Django alongside its best friend <a href="https://www.django-rest-framework.org/">DRF</a> for the API management part.</p><p>On the testing side we have 3 plugins on top of pytest:</p><ul><li><p><code>pytest-django</code> &#8212; Takes care of the Pytest/Django integration, and specifically takes care of managing the database and live server</p></li><li><p><code>pytest-playwright</code> &#8212; Integration of Pytest and Playwright in order to be able to test things within a browser</p></li><li><p><code>pytest-env</code> &#8212; Small utility that lets you define environment variables when Pytest runs, which is super useful if like me you follow the <a href="https://12factor.net/">12 factors philosophy</a>: it gives you a static configuration for running tests.</p></li></ul><p>Since we&#8217;re talking about end-to-end tests, I figured that it would not necessarily make sense to pin them to a specific Django app, so I&#8217;ve created a dedicated <a href="https://github.com/Xowap/e2e-django-svelte/tree/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/tests">test folder</a> for it.</p><p>In order to be able to run the tests, you need to make sure to configure the settings module and the
environment in the <code>pyproject.toml</code> file:</p><pre><code>[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "e2e_django.settings"
env = [
    "DJANGO_ALLOW_ASYNC_UNSAFE=true",
]</code></pre><h4>Front</h4><p>Honestly I&#8217;ve changed nothing to the front-end except create the page that displays the thing we want to test.</p><h3>Front/API sync</h3><p>The part that was elusive to me for the longest time was: how can I synchronize the front-end and the back-end &#8212; especially in regards to the database management that I mentioned earlier.</p><p>Turns out, with a little bit of elbow grease and pytest magic it&#8217;s fairly easy.</p><p>First we need to talk about pytest&#8217;s <a href="https://docs.pytest.org/en/6.2.x/fixture.html">fixtures</a>. If you&#8217;re a Django developer you probably hear &#8220;fixture&#8221; and think &#8220;right, to load data into the database&#8221;. But it&#8217;s not that at all. They are a mechanism of dependency injection specialized for tests.</p><p>For example you could say: I have a &#8220;user&#8221; fixture that is a user from the database and that is scoped to each individual test. If a test requires the &#8220;user&#8221; fixture then the user will be created in the database and will be cleaned up after each test.</p><p>Both the Playwright and the Django plugins use them heavily to give you access to their various features. Typically if you ask for the <code>page</code> fixture in your test then Playwright will be started, but otherwise it will not.</p><p>The same applies to the <code>live_server</code> from Django and in our case we&#8217;ll be able to leverage this in order to start and stop the front-end while testing.</p><p>This can be done relatively easily if you exploit the fact that both the front and the API are in the same repository. 
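</p><p>As an illustration of two tricks this relies on (a sketch with assumed output formats, not the repository&#8217;s exact code):</p>

```python
import re
import socket

# Trick 1: binding to port 0 asks the OS to pick any free port, so two
# test runs never fight over a hardcoded port number.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
_, free_port = sock.getsockname()  # the port the OS actually chose
sock.close()

# Trick 2: the Vite server prints the URL it listens on, so a fixture
# can parse the subprocess's stdout to learn the chosen port. The line
# below mimics a typical `vite preview` startup message.
def parse_vite_url(line):
    match = re.search(r"https?://[\w.\-]+:\d+", line)
    return match.group(0) if match else None

base_url = parse_vite_url("  Local:   http://localhost:4173/")
```

<p>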
You can compute the absolute path of the front-end accurately and start scripting there.</p><p>Which is exactly what <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/api/tests/conftest.py#L64">front_server()</a> and its friends are doing in the conftest.py file &#8212; a file that can inject global fixtures into different tests under the same module. While you can read the source code directly, let&#8217;s review the key points:</p><ul><li><p>We use Popen to start the Vite server in preview mode, which is close enough to production for our needs. A fixture can just yield an object, and the function will suspend until all tests that need it are done. This is what we do, and after the yield finishes we just shut down the process.</p></li><li><p>The process is bound to port 0. This is a special way to tell the system &#8220;just pick any available port&#8221;, which means you don&#8217;t have to settle on a static port number and thus limits the risk of failure. The Vite server will print the chosen port when starting, so we just parse stdout to get it.</p></li><li><p>In the end we simply yield the base URL of this front-end server and then our tests will be able to connect to it in any way they want.</p></li></ul><p>This example is done with Vite because that is what powers SvelteKit. The details of the commands would differ for other stacks, but every single front-end framework has an equivalent, so you&#8217;ll just need to adapt it accordingly.</p><h3>Writing the test</h3><p>Now that we&#8217;re able to summon the front-end (through the code above) and the browser (through Playwright) it&#8217;s time for us to write a test!</p><p>Be careful, this is actually very disappointing because it&#8217;s way too simple. First we create the items that we want to see through a fixture:</p><pre><code>@pytest.fixture
def some_items(transactional_db):
    return [
        Item.objects.create(name="Foo"),
        Item.objects.create(name="Bar"),
    ]</code></pre><p>Now we create a test that requires 3 fixtures:</p><ul><li><p><code>front_server</code> &#8212; The server we&#8217;ve created above</p></li><li><p><code>some_items</code> &#8212; The items defined here</p></li><li><p><code>page</code> &#8212; The Playwright control object</p></li></ul><pre><code>@pytest.mark.django_db(transaction=True)
def test_content(front_server, some_items, page: Page):
    page.goto(str(httpx.URL(front_server).join("/")))

    for item in some_items:
        item_name_escaped = repr(item.name)[1:-1]
        assert (
            page.locator(f"li:has-text('{item.id}: {item_name_escaped}')").count() == 1
        )</code></pre><p>This way we&#8217;re able to send the browser to the front-end and check the content of the page based on the expected items we&#8217;re looking for. That&#8217;s it!</p><h3>Running the GitHub Action</h3><p>If you&#8217;re making automated tests, it&#8217;s usually a good idea to run them automatically. Fortunately it&#8217;s really easy to do with GitHub Actions. We&#8217;ll define <a href="https://github.com/Xowap/e2e-django-svelte/blob/c1b5298acce4f236d2a4fa372fc85e18f8c2577e/.github/workflows/test-api.yml">a workflow</a> that triggers on push.</p><p>Beyond the installation of dependencies, let&#8217;s check some interesting steps of that workflow:</p><pre><code>- name: Run tests
  run:
    .venv/bin/python -m pytest --junitxml=/tmp/test-results.xml
    --tracing=on --video=on --screenshot=on
  working-directory: ./api</code></pre><p>When running the tests, we keep the results in JUnit format and ask Playwright to record pictures and videos of all tests. Let&#8217;s note that if your project scales up you probably just want to record failing tests and not all tests, otherwise you&#8217;ll eat up artifact storage pretty quickly.</p><pre><code>- name: Publish test report
  uses: mikepenz/action-junit-report@v4
  if: always()
  with:
    report_paths: "/tmp/test-results.xml"
    check_name: "API Pytest Report"</code></pre><p>Since we&#8217;re able to export the outcome as a JUnit file, we use an action that transforms it into a nice recap for the run.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ld3F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ld3F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 424w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 848w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1272w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png" width="550" height="173.92300641613198" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:345,&quot;width&quot;:1091,&quot;resizeWidth&quot;:550,&quot;bytes&quot;:34732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ld3F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 424w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 848w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1272w, https://substackcdn.com/image/fetch/$s_!ld3F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfc80dc0-f86f-4dcb-960b-e1f2317cc8e1_1091x345.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Test summary from the GitHub Action (only visible if you&#8217;re connected)</figcaption></figure></div><pre><code>- name: Keep Playwright artifacts
  uses: actions/upload-artifact@v4
  if: always()
  with:
    name: playwright-traces
    path: api/test-results/</code></pre><p>Finally, we&#8217;ll temporarily save the Playwright videos and screenshots into a GitHub Action artifact, which allows in-depth analysis of failed tests (for example using the online <a href="https://trace.playwright.dev/">Trace Viewer</a>).</p><h2>Conclusion</h2><p>After establishing that automated testing is well worth the trouble of setting up a well-oiled testing infrastructure, we set out to explore how this can be accomplished with Django and a JavaScript meta-framework such as SvelteKit.</p><p>While this requires a little bit of boilerplate and adaptation &#8212; after all, those two worlds are not exactly designed to work together &#8212; we can see that we can obtain both the convenience of Django&#8217;s tests with their database management and the power of modern front-end test frameworks such as Playwright.</p><p>In the end the tests run completely autonomously on GitHub Actions and produce both nice reports and in-depth traces that allow analysis in case of failure.</p><p>This whole structure is easy to use on a daily basis and can boost your coding speed up to two times!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For a broad meaning of configuration. And of course you can write specific functions and algorithms in the backend, for which the use of unit tests is perfect. But the vast majority of the code you&#8217;re writing in a Django project is actually written by Django. Which is why I like Django.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>If anyone has heard of a valid experiment on the topic, I&#8217;ll take it. 
What I&#8217;ve found is mostly studies on 12 subjects so I&#8217;m not going to consider that too solid.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Entrepreneur 101: increase your brain power]]></title><description><![CDATA[Explore strategies to manage ADHD and decision fatigue for entrepreneurs. Learn time management, meal prep, and automation tips to boost productivity and mental clarity.]]></description><link>https://www.baby-cto.com/p/entrepreneur-101-increase-your-brain</link><guid isPermaLink="false">https://www.baby-cto.com/p/entrepreneur-101-increase-your-brain</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 03 Mar 2024 08:00:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2962677d-45e1-42ed-9c4e-8f30e9c02b52_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The drama of our brain &#8212; and especially of ADHD brains &#8212; is that making any single decision will eat up some mental energy that you will only recover when getting proper rest. And this holds whether you are taking a critical life-or-death decision or deciding which item to get from McDonald&#8217;s. On the other hand, being an entrepreneur requires so much decision-making that it would give ADHD<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> to neurotypicals.</p><p>So how do you increase the number of decisions you can take? In a way you can&#8217;t, because of what I said. But on the other hand, you can, by focusing yourself on the decisions that matter.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>The question now becomes: how do you automate the decisions that do not matter? Some call me psycho, but here are the strategies that I&#8217;ve applied throughout my life to avoid useless choices.</p><h2>Time management</h2><p>Managing your time and priorities is a daunting task. A million things, each more urgent than the last, try to grab your attention. And if you&#8217;re like me, most likely you will just hyperfocus on the fun thing of the moment and forget completely about the rest. Which for the longest time has been my time management technique: just focus on one thing at a time, for one month or more straight.</p><p>And if you are bootstrapping it&#8217;s probably fine, even though it can hurt you in the sense that you get complete tunnel vision on what you are doing and never take the time to do basic things such as checking for competitors, exploring advice that you&#8217;ve received or even simply looking for solutions that are not 100% in-house development.</p><p>Instead, you need to get a scheduler such as <a href="https://reclaim.ai/">Reclaim</a>, which is now my god and master, commanding my actions through my agenda.</p><p>The first thing it does is take a todo list and schedule it. 
If I&#8217;m able to break down the things I have to do into small, actionable tasks (that can later be extended or shortened) then it will automatically schedule all of them around my meetings and other obligations, while also making sure that everything moves in parallel and in accordance with their respective priorities and deadlines.</p><p>The other thing is that it lets you plan 1:1 meetings with your team. As you become a manager it&#8217;s essential to talk to your developers regularly, and if you&#8217;re like me with no notion of time then 6 months can pass without you realizing you didn&#8217;t follow up with anyone on your team. It will find the time for this in your agenda and make it happen.</p><p>Lastly it will allow you to have various rituals like checking your email or monitoring tools. Super useful to keep an eye on things regularly while also working like crazy on other stuff.</p><p>Overall, Reclaim has massively decreased the number of decisions that I need to take on a weekly basis, allowed me to multitask when it was previously impossible for me, and also allows me to have an idea of when I&#8217;m going to be able to deliver a specific task.</p><h2>Food</h2><p>Don&#8217;t get me wrong, I am a huge foodie and in fact spend most of my holidays hunting down restaurants and bars to discover local tastes. But let&#8217;s not get confused: there is eating as a hobby and there is eating because otherwise you&#8217;ll faint on your keyboard. I&#8217;m talking about the second one here.</p><p>Depending on your personal preferences, revenue levels and overall situation, different tactics can apply. For myself I&#8217;ve had the following ones:</p><p><strong>Frozen dishes</strong>. The basic version is to go to your local frozen foods supermarket (<a href="https://www.picard.fr/">Picard</a> in France), stack up a month&#8217;s worth of food in your cart and store this in your freezer. 
While highly criticized by friends and roommates for this practice, I believe that it is a relatively balanced and extremely efficient way of managing your meals. Not to mention cheap, since I&#8217;ve basically been going through my student years and early entrepreneur stages with this technique.</p><p><strong>Food subscriptions</strong>. In the same vein but definitely tastier and healthier you will see subscription services like <a href="https://wetaca.com/">Wetaca</a> in Spain. They did not always exist, and I&#8217;m not sure how easy they are to find around the globe, but overall they are a cheap and efficient way to get relatively balanced meals. It&#8217;s what I&#8217;m currently doing most of the time and it&#8217;s great, they even change the menu for me and I don&#8217;t need to decide anything.</p><p><strong>Recipe round-robin</strong>. Entrepreneur life has highs and lows and there was a short period during which I had to dip below the poverty line. Not a great time, but probably the healthiest eating habits I&#8217;ve ever had. What I did was create about 5 to 10 recipes that took 5 minutes to cook, then go to the market every week-end to buy the products directly from the producers and cook just those recipes on repeat.</p><p><strong>Fast-food obsession</strong>. My theory about fast food is that it&#8217;s only good if you can eat it every day of the week at every meal. When I lived in India I used to work in an IT building where all the restaurants were of a mediocre quality and it was becoming quite hard to alternate between the same dishes all the time. Eventually I ended up eating McDonald&#8217;s for literally every meal from breakfast to dinner. Never lost so much weight so fast, for some reason. 
Anyway, it&#8217;s not a healthy option, but if you don&#8217;t abuse it you can get through a hard year or two without issue using this strategy.</p><p>Overall, it&#8217;s important to place your bets in accordance with what works for you.</p><h2>Transport</h2><p>Surely you&#8217;ll want to move around: to go to work, to meet clients, etc. Let&#8217;s see how you can avoid useless pain in the process.</p><p>The first strategy to optimize transport is to not need transport. Start your company from your home, work remotely and save on an office, etc. It can be taxing on your mental health, but for a time it is a good fix.</p><p>Another important point &#8212; except if you&#8217;re American I guess? &#8212; is to live in a densely populated area. By having all necessities close by, you spare yourself complicated trips and can instead walk there most of the time. Which is especially important for the next point as well&#8230;</p><p>Don&#8217;t have a car. Cars are commonly known as money hogs, between the price you need to pay and all the mechanical issues you&#8217;re going to get with it. People reading this will claim that this statement is due to the fact I don&#8217;t have a driver&#8217;s license &#8212; and it is linked &#8212; but it&#8217;s also that up until recently I could not afford one, in terms of cost or of mental load. Instead, public transportation, taxis or car rental can often be a more economical option with a much lower mental load. Of course, YMMV.</p><h2>Shopping</h2><p>During my early years as an entrepreneur, the Christmas period was quite a stressful one. On top of being generally a very dense period in terms of work &#8212; clients&#8217; budgets need to be spent before the end of the year &#8212; you also need to think about which gifts you&#8217;re going to get for your family. 
And when you pay yourself one peanut and a half, the issue becomes quite the balancing act.</p><p>You basically end up going around all the shops of the city center fishing for ideas and taking notes on what could potentially make the best combination of gifts.</p><p>This was an issue until I realized that all I had to do &#8212; and sorry if it sounds obvious to everyone now &#8212; was to shop online. Instead of physically spending your energy walking a hundred thousand steps over your Saturday afternoon, you can simply switch from tab to tab, order and wait for things to arrive home.</p><p>These days I almost never buy anything in physical stores. Which can be pushed even further with delivery apps. Since they allow you to search for items by name, they give you an index of all the objects and brands you can find in various stores. Want to drink your favorite kind of beer within the next 30 minutes and you don&#8217;t know where to find it? Type the name into Glovo and you&#8217;ll find a completely random kebab shop, open at 2am, which can ship it to you.</p><h2>Administrative emails</h2><p>If you&#8217;re like me you probably hate replying to emails, especially those that have little connection to your actual business. Administrative stuff, tax declarations, getting refunds from an e-commerce site, etc. But I&#8217;ve managed to lower the cost of doing so substantially by using ChatGPT in a strategic way.</p><p>First I&#8217;ll take the email/message and ask ChatGPT to summarize what it says, figure out what they want from me and give me options for what actions I could take.</p><p>Then I&#8217;ll pick an option and ask ChatGPT to reply accordingly. I&#8217;ll say it in my own words, unstructured, not polite. And then it will write a nice long polite answer that says exactly what I need. Copy/pasta, send and done. 
Such a relief to proceed this way!</p><h2>Entertainment</h2><p>It&#8217;s important as a founder to take your mind off work regularly, otherwise you&#8217;d just go completely crazy. In terms of entertainment however, everyone&#8217;s got their own jam, so I&#8217;m not going to get super exhaustive, but let&#8217;s throw out a few ideas.</p><p>Obviously everyone is going to have some kind of VOD subscription, but I also like to structure my entertainment around no-decision options.</p><p>It&#8217;s going to sound dumb, but TV is a good option. Especially paid cinema channels. Turn on your TV in the evening on a channel that you know you enjoy and you&#8217;ll for sure see something fresh without spending a single neuron on choosing it. On top of that, it can give rhythm to your day if you work from home and give you a motivation to stop working.</p><p>Another thing I like to do is to show up at a big <a href="https://en.wikipedia.org/wiki/Bobo_(socio-economic_group)">bobo</a> cinema in the heart of Paris and pick a movie at random, based on a combination of what is showing in the next hour and which posters catch my eye. No reading the synopsis, no thinking about what the movie is going to be about&#8230; I&#8217;ve seen amazing movies like that; totally recommend it.</p><h2>Conclusion</h2><p>You could read this article thinking: why is this idiot flaunting his laziness all over the Internet? But that would be missing the point. Throughout my life I&#8217;ve had to deal with a tremendous workload and fairly low executive planning skills. You cannot do it all. 
And yes, a lot of the techniques mentioned here are individually obvious once you start considering the issue, but what matters most is to create for yourself a consistent way of life, with the right combination of everything, that keeps your life as unloaded as possible, leaving your brain free to think about what will make a difference in your business.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>ADHD is defined by its symptoms and not by a root cause. So if your life gets overwhelming, you can develop ADHD symptoms just because you have too many things going on.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Hands on! Parse your emails with Google's Gemma]]></title><description><![CDATA[Brave the impossible and use only your CPU to transform your emails into machine-readable semantic JSON files which can then be interpreted by personal assistants, finance tools, etc.]]></description><link>https://www.baby-cto.com/p/hands-on-parse-your-emails-with-googles</link><guid isPermaLink="false">https://www.baby-cto.com/p/hands-on-parse-your-emails-with-googles</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 25 Feb 2024 08:00:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/990b4c02-36f5-4322-8f4a-41ce4f0d789e_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As I explained in my <a href="https://babycto.substack.com/p/jobs-for-llms-and-how-to-survive">previous post</a>, LLMs are not good at everything but they&#8217;re particularly good at parsing information and transforming it into another format. 
It&#8217;s a technique that we use everywhere <a href="https://www.chatfaq.io/blog/ai-snack-guided-llm-generation-shaping-language-models-to-meet-your-needs">in ChatFAQ</a> for example.</p><p>Today we&#8217;re going to dive into the code that lets us do this. The goal is simple:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Baby CTO! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><ol><li><p>First we&#8217;ll classify the email to know what kind of email it is. Yeah I said that it&#8217;s not a great idea because LLMs are not super performant for that. 
I&#8217;ve tried my best, and it seems to work well with GPT-4 and more or less decently with Gemma.</p></li><li><p>And then, for each type of email, we&#8217;re going to extract a JSON document which gives us the content of that email in a machine-readable format.</p></li></ol><p>I&#8217;ll walk through the main elements of the code; if you want to follow along with the complete project, it&#8217;s <a href="https://github.com/Xowap/semmail">all on GitHub</a>.</p><p>Also, of course there are many libraries and frameworks and whatnot to help you do this in different ways, but we&#8217;re here to learn so we&#8217;ll do it all by hand today.</p><p>Buckle up and let&#8217;s go!</p><h2>The flask app</h2><p>In order to do this, we&#8217;re going to make a Flask app which exposes:</p><ul><li><p>A basic page that allows you to upload an email in the .eml format (what you get when you &#8220;Download this email&#8221; from Gmail for example).</p></li><li><p>An API which, for a given email, gives you its semantics.</p></li></ul><p>We&#8217;ll do that in a very small <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/app.py">src/semmail/app.py</a> file.</p><p>First, a super basic view which is just a form that will call the API when submitted.</p><pre><code>from flask import Flask, jsonify, render_template_string, request

app = Flask(__name__)


@app.route("/")
def home():
    """Just a dumb form where you can upload a file to the API"""

    return render_template_string(
        """
        &lt;!DOCTYPE html&gt;
        &lt;html&gt;
        &lt;head&gt;
            &lt;title&gt;Upload File&lt;/title&gt;
        &lt;/head&gt;
        &lt;body&gt;
        &lt;h2&gt;Upload File&lt;/h2&gt;
        &lt;form action="/upload" method="post" enctype="multipart/form-data"&gt;
            &lt;input type="file" name="email_file"&gt;
            &lt;input type="submit" value="Upload"&gt;
        &lt;/form&gt;
        &lt;/body&gt;
        &lt;/html&gt;
    """
    )</code></pre><p>Then the API itself. We&#8217;re going to expect that there is a function that takes an email as input and returns the parsed output.</p><pre><code>@app.route("/upload", methods=["POST"])
def upload_file():
    """After a very basic validation of the file, we put it through the LLM
    so that we can know what it's about."""

    if "email_file" not in request.files:
        return jsonify({"error": "No file part"}), 400

    file = request.files["email_file"]

    if file.filename == "":
        return jsonify({"error": "No selected file"}), 400
    try:
        file_content = file.read()
        result = interpret_email(file_content)
        return jsonify(result), 200
    except ValueError as e:
        return jsonify({"error": str(e)}), 400</code></pre><p>In the middle of the boilerplate, you&#8217;ll notice the <code>interpret_email()</code> function. For now it can just return a static structure; we&#8217;re going to implement it in a moment.</p><h2>Calling the AI</h2><p>Because I want the project to have both an OpenAI and a Gemma implementation, there are <a href="https://github.com/Xowap/semmail/tree/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/ai">two modules</a> on GitHub, but I&#8217;ll only cover Gemma because it&#8217;s the hot new kid that everyone wants to play with.</p><h3>Make the instance</h3><p>The first thing you need to do is to go to the model&#8217;s <a href="https://huggingface.co/google/gemma-2b-it">HuggingFace page</a> and use your account to accept the license. Then go to your settings and fetch an API key that you&#8217;ll need to put in the project&#8217;s <code>.env</code>.</p><pre><code>HUGGINGFACE_TOKEN=xxx</code></pre><p>This will allow us to create the instance of the tokenizer and the model in our <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/ai/gemma.py">src/semmail/ai/gemma.py</a> file. Put at the root of the module:</p><pre><code>from os import environ

import lazy_object_proxy
import torch
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"

if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"


tokenizer = lazy_object_proxy.Proxy(lambda: AutoTokenizer.from_pretrained(model_id))
model = lazy_object_proxy.Proxy(
    lambda: AutoModelForCausalLM.from_pretrained(model_id).to(device)
)
login_done = False


def ensure_login():
    """Makes sure that we're logged into HuggingFace Hub so that we can
    download the LLM (which requires to approve a license)."""

    global login_done

    if not login_done:
        login(token=environ["HUGGINGFACE_TOKEN"])
        login_done = True</code></pre><p>A bunch of things to unpack:</p><ul><li><p>Everything is wrapped in a <a href="https://pypi.org/project/lazy-object-proxy/">lazy_object_proxy</a>, which avoids blowing up the CPU and RAM the moment the module is imported. It waits until the object is first used for that. You&#8217;ll thank me later.</p></li><li><p>We create an ensure_login() function which allows subsequent functions to make sure that we&#8217;re logged into huggingface_hub, but does it only once to avoid repeating the login every time we call the AI.</p></li><li><p>There is conditional detection of CUDA to enable it or not depending on availability. You guys tell me if it works; I&#8217;m an idiot who didn&#8217;t check his GPU&#8217;s compatibility before buying it.</p></li></ul><h3>Communicate with Gemma</h3><p>You&#8217;ve probably noted the name of the model, <code>google/gemma-2b-it</code>.</p><ul><li><p>The &#8220;2b&#8221; indicates the size of the model. I&#8217;m using this one and not the bigger one because it uses a lot fewer resources and can realistically be used on a CPU while the other one cannot.</p></li><li><p>The &#8220;it&#8221; tells you that it has been fine-tuned for chat-like interactions.</p></li></ul><p>So how do you make use of this chat training? It means that your prompt has to follow this structure:</p><pre><code>&lt;start_of_turn&gt;user
How does the brain work?&lt;end_of_turn&gt;
&lt;start_of_turn&gt;model</code></pre><p>It indicates to the model the alternation between the human and model speakers. Sadly, there is no system prompt to guide the LLM outside of this, but we&#8217;ll work around that.</p><p>The idea is to use it the following way:</p><pre><code>def ask_gemma(instruction: str, this_input: str, max_tokens: int = 1000) -&gt; str:
    """Build a Gemma chat prompt from the instruction and the user input,
    run the model and return only the model's reply text."""

    ensure_login()

    chat = [
        {
            "role": "user",
            "content": f"# Instructions\n{instruction}\n# Input\n{this_input}",
        },
    ]

    prompt = tokenizer.apply_chat_template(
        chat,
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer.encode(
        prompt,
        add_special_tokens=True,
        return_tensors="pt",
    )
    outputs = model.generate(
        input_ids=inputs.to(model.device),
        max_new_tokens=max_tokens,
    )

    convo_raw = tokenizer.decode(outputs[0])
    convo: Sequence[Dict] = parser.parse(convo_raw)  # noqa

    return convo[-1]["content"]</code></pre><p>What you see here is that we&#8217;re using the tools from the transformers lib to generate the chat-template prompt and give it to the LLM. When it runs, you receive a response that contains both the question and the answer from the bot, and&#8230; it becomes a bit confusing.</p><p>To be honest I&#8217;m not entirely sure how I&#8217;m supposed to parse this, or whether there are utilities for it in the transformers lib (I didn&#8217;t find them), so I&#8217;ve written my own parser (which you see used in the code above).</p><p>If you&#8217;ve outgrown your fear of regular expressions you may outgrow your fear of parsers as well. It&#8217;s relatively easy to write one using the <a href="https://lark-parser.readthedocs.io/en/stable/">Lark package</a>. It&#8217;s outside the scope of this article; just <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/ai/gemma.py#L67">check the grammar</a> if you&#8217;re interested. What matters is that we&#8217;ve got a parser!</p><p>You&#8217;ll also have noticed that there are two important parameters:</p><ul><li><p>instruction &#8212; Corresponds to the system prompt, tells the bot what to do</p></li><li><p>this_input &#8212; The user input</p></li></ul><p>Since there is no notion of a system prompt in this fine-tuning, I&#8217;m just building a single prompt from those two and hoping that the LLM picks it up (it does).</p><p>At this point we have a function that runs the LLM locally on your CPU/GPU. Pretty neat!</p><h2>Getting the email&#8217;s text</h2><p>Emails might be the oldest and most inconsistent standard in the Internet world. 
Their encoding is super confusing and, while Python has a built-in library that implements all the heavy lifting, it really comes as a kit that you&#8217;ve got to assemble yourself (without the instructions).</p><p>The strategy is as follows:</p><ol><li><p>Go through the different &#8220;parts&#8221; of the email, looking either for a plain text or an HTML attachment, with a preference for the HTML one, because sometimes you will find a plain text attachment that turns out to be bullshit; sadly only the HTML is reliable.</p></li><li><p>Because the HTML is pretty fat, if that&#8217;s what we&#8217;re going for we&#8217;ll make sure to convert it into Markdown. This will greatly reduce the number of tokens and drastically reduce how hard the message is for the LLM to understand.</p></li></ol><p>This is all the job of the <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/parser.py#L80">parse_email()</a> function, which I&#8217;m not going to detail because it would be off-topic. What you need to know is that it outputs the email in a simplified text format which looks like:</p><pre><code>From: foo@bar.com
To: someone@example.com
Date: 2024-02-23 20:12:00 +0100
Subject: Some email

Blah blah blah this is the content of the email</code></pre><p>It&#8217;s something we can easily give to our LLM.</p><h2>Plain text to JSON</h2><p>Now the useful part. The core of this project is to convert plain text into JSON, isn&#8217;t it? <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/ai/__init__.py#L22">Let&#8217;s do that</a>!</p><pre><code>import re
from typing import Any, Optional

import jsonschema
import yaml

# ask() is the LLM entry point (ask_gemma() from the previous section).
# MD_START/MD_END strip an optional Markdown code fence around the model
# output; the exact patterns here are a reconstruction, see the repo for
# the originals.
MD_START = re.compile(r"^\s*```[a-z]*\s*")
MD_END = re.compile(r"\s*```\s*$")


def parse_to_json(
    prompt: str, text: str, schema: Any, attempts: int = 3
) -&gt; Optional[Any]:
    """Ask the LLM to convert `text` according to `prompt`, parse the
    reply as YAML and validate it against `schema`; retry up to
    `attempts` times and return None if every attempt fails."""

    for _ in range(attempts):
        parsed_raw = ask(prompt, text)
        parsed_raw = MD_START.sub("", parsed_raw)
        parsed_raw = MD_END.sub("", parsed_raw)
        try:
            parsed = yaml.safe_load(parsed_raw)
            jsonschema.validate(parsed, schema)
        except (yaml.YAMLError, jsonschema.ValidationError):
            pass
        else:
            return parsed</code></pre><p>What do we see here:</p><ul><li><p>First we clean up the model output, removing any Markdown enclosure. <em>Sometimes</em> chat-tuned models like to put YAML within <code>```yaml</code> quotes. We make sure to remove them if this happens.</p></li><li><p>Then we try to load the YAML data. Why YAML and not JSON? Easy: JSON is a sub-set of YAML, so if the model decides to output JSON it will still work; on the other hand YAML is more permissive and uses fewer tokens than JSON. So it&#8217;s both safer and more economical to use.</p></li><li><p>If all went well, we validate the parsed structure against the provided JSON schema. This ensures that the output corresponds to the constraints that we need to work with.</p></li><li><p>And if parsing or validation fails, we try again. Most LLMs will not strictly always produce the same output for a given input, so it doesn&#8217;t hurt to try another time to see if it&#8217;s still broken.</p></li></ul><h3>Prompting</h3><p>In order to parse the different elements, we&#8217;ll use a <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/prompts.py">prompt library</a>. With each prompt, we associate a JSON schema which helps validate the output.</p><p>I&#8217;m not going to go through every single prompt, because if you&#8217;re human you can probably understand them, but I am going to explain <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/prompts.py#L12">the one I use to classify emails</a> because it&#8217;s the hard one.</p><p>My goal here is to determine the probability of each email type by explicitly asking the LLM to give that probability according to different factors that I provide. The idea is that you can then easily pick the email type by checking which category has the highest probability. 
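That final pick is a one-liner over the returned mapping; a small illustration with made-up probabilities (the real values come out of the classification prompt):

```python
# Made-up classification scores, as returned by the LLM for one email
proba = {"bill": 1.0, "commercial": 0.2, "conversation": 0.0}

# Pick the category with the highest score: the items() pairs are
# compared on their second element (the probability)
email_type = max(proba.items(), key=lambda p: p[1])[0]
print(email_type)  # bill
```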
We rely on the LLM&#8217;s feelings, but we use hard Python algorithms to make the decision.</p><p>The prompt goes as follows:</p><pre><code>Take a deep breath.

You will analyze an email. For this email you need to determine the
likeliness that this email belongs to a specific category. This works
with a score system. For each category you MUST give a score of 0 if
you are sure that it's not from that category, a score of 1 if you are
sure that it is, or a number between 0 and 1 that reflects how much you
want to give that category to the email.

The categories are:
    - Commercial is a prospective email.
    - Bill is an invoice or a bill for a sold service or product.
    - Conversation is a regular conversation between humans.

Here are elements to look for in an email. For each element, if I tell 
you +X then consider that it's adding points to that aspect and -X is 
removing points.

Has few sentences +bill -commercial
Has a total price +bill
Has a list of items sold +bill
Has "bill", "invoice", "order confirmation" or any synonym in the Subject +bill -commercial -conversation
Presents several product benefits +commercial -bill
Is structured in a Hello/Message/Signature way +conversation -bill
Different signatures and quoted mails +conversation -bill -commercial

Now return the following YAML:

bill: x
commercial: x
conversation: x

You need to replace "x" by the score. If you want to give a score of
1 to two or more categories, you need to think harder to make the
difference.

Make sure that the output is pure YAML, not wrapped in Markdown, no sentences.</code></pre><p>You can see the structure:</p><ol><li><p>The LLM receives its general purpose</p></li><li><p>Then I explain the categories</p></li><li><p>Then I give the different factors for the different categories so that the LLM knows what to look for (it really doesn&#8217;t work if you don&#8217;t do this)</p></li><li><p>Then I give the YAML schema at the end. If the schema is too high in the prompt the LLM tends to forget about it</p></li><li><p>And then some banalities about the output to avoid getting stuff like &#8220;Of course, here is your YAML&#8221;, which would screw the parsing</p></li></ol><p>This is matched with a JSON schema:</p><pre><code>{
    "type": "object",
    "properties": {
        "commercial": {"type": "number", "minimum": 0, "maximum": 1},
        "bill": {"type": "number", "minimum": 0, "maximum": 1},
        "conversation": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["commercial", "bill", "conversation"],
}</code></pre><p>With this strategy, I&#8217;ve written 4 prompts:</p><ul><li><p>The one you&#8217;ve just seen to determine the email&#8217;s type</p></li><li><p>If it&#8217;s a commercial email, extract the name of the product and the USP</p></li><li><p>If it&#8217;s a bill, extract the total price and the purchased items</p></li><li><p>If it&#8217;s a conversation, make a summary of the conversation</p></li></ul><h2>Deciding</h2><p>And all this culminates in the <a href="https://github.com/Xowap/semmail/blob/e305aee3a3e523463de66390b8ece765b6a08ae8/src/semmail/parser.py#L111">interpret_email()</a> function.</p><pre><code>def interpret_email(email: bytes) -&gt; Any:
    """Uses a first round of LLM in order to determine the type of message,
    then proceeds to using a specific prompt for that type in order to parse
    the email into a JSON output."""

    parsed_email = parse_email(email)

    email_type_proba = parse_to_json(
        DETERMINE_TYPE.prompt,
        parsed_email,
        DETERMINE_TYPE.schema,
    )

    email_type = max(email_type_proba.items(), key=lambda p: p[1])[0]
    extra = {}

    if email_type == "commercial":
        extra["commercial"] = parse_to_json(
            COMMERCIAL_INFO.prompt,
            parsed_email,
            COMMERCIAL_INFO.schema,
        )
    elif email_type == "bill":
        extra["bill"] = parse_to_json(
            BILL_INFO.prompt,
            parsed_email,
            BILL_INFO.schema,
        )
    elif email_type == "conversation":
        extra["conversation"] = parse_to_json(
            CONVERSATION_INFO.prompt,
            parsed_email,
            CONVERSATION_INFO.schema,
        )

    return dict(
        email_type=dict(
            chosen=email_type,
            proba=email_type_proba,
        ),
        **extra,
    )</code></pre><p>Which is very simple:</p><ul><li><p>First we determine the email type, which we get as a JSON object</p></li><li><p>And then we use one of the three parsers to get the extra information relative to this type</p></li><li><p>And finally we output a JSON with the extracted information and the decision-making values that we&#8217;ve used</p></li></ul><p>So if I take my latest Amazon purchase, I&#8217;m getting the following output:</p><pre><code>{
  "bill": {
    "bought": [
      {
        "label": "L'investisseur eclaire: Cultiver son...",
        "price": [
          37.46,
          "EUR"
        ]
      }
    ],
    "total": [
      43.72,
      "EUR"
    ]
  },
  "email_type": {
    "chosen": "bill",
    "proba": {
      "bill": 1,
      "commercial": 0.2,
      "conversation": 0
    }
  }
}</code></pre><p>Hooray! It worked!</p><h2>Conclusion</h2><p>I&#8217;ve showcased two things in this article:</p><ul><li><p>Using extremely small boilerplate code and pretty conventional tools, I can easily leverage LLMs to parse generic content into usable JSON. It&#8217;s something that was completely unthinkable a few months ago!</p></li><li><p>And I can do so using a local LLM that even runs on &#8220;commodity&#8221; hardware (be sure to have 10 GiB of RAM before starting the project, or you&#8217;ll see how fast your computer can freeze)</p></li></ul><p>This is an exciting time because long-standing problems are finally getting solved. In the near future you can expect to see every single tool out there getting <em>a lot</em> smarter when it comes to understanding human text.</p>]]></content:encoded></item><item><title><![CDATA[Stop making these mistakes with your caching proxy]]></title><description><![CDATA[How to get sub-10ms response times, real time content expiration, automatic compression all while offloading your backends. 
Enter the power of RFC 9110 and discover the underdog of reverse proxies.]]></description><link>https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110</link><guid isPermaLink="false">https://www.baby-cto.com/p/mistakes-with-caching-proxy-rfc-9110</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 17 Feb 2024 20:09:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b2bcb6e6-f334-4455-ac7b-ccd4be64714e_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You got yourself a website with static-ish content which takes a lot of time to generate, and you are looking to make it faster. The obvious solution is caching, but that is a surprisingly intricate and delicate topic when even the slightest bit of interaction starts to happen. Moreover, you would probably be happy if your cache could refresh your content automatically when you hit &#8220;Publish&#8221; in your CMS. And finally, as stated, you have different interaction points in your website, so you can&#8217;t exactly turn to static website generators.</p><h2>Your typical stack</h2><p>The application that you are going to develop through this article is a very simple <a href="https://flask.palletsprojects.com/">Flask</a> application, designed only to explore the concepts that we&#8217;re introducing without the noise of a more complex setup.</p><p>However these concepts can apply to many different configurations. 
Specifically, if you are reading this in 2024, chances are that you are already running some flavor of <em>headless</em> CMS:</p><ul><li><p>A back-end/API which runs whichever CMS has a &#8220;headless mode&#8221;. This author would recommend <a href="https://wagtail.org/">Wagtail</a>, but the list of such beasts is growing extremely long these days.</p></li><li><p>A front-end meta-framework which renders a first version of your content on the server side and then hydrates the HTML in the client to give full interactivity to the pages. You could use <a href="https://kit.svelte.dev/">SvelteKit</a>, <a href="https://nuxt.com/">Nuxt</a>, <a href="https://astro.build/">Astro</a> or <a href="https://nextjs.org/">Next.js</a> for example.</p></li></ul><p>If you do so, you essentially need to consider your front-end as a proxy on top of your API, one that performs a JSON-to-HTML transformation. This suggests that if you implement RFC 9110 in your front-end, the solution that you&#8217;re going to discover below should still apply. Maybe the topic for a future article!</p><h2>A bit of HTTP</h2><p>Here lies the secret that the reverse-proxying industry doesn&#8217;t want you to learn. HTTP presents numerous cache modes &#8212; in particular through the <code>Cache-Control</code> header &#8212; but oftentimes you&#8217;ll end up with a solution that is time-based. You tell the cache &#8220;please keep this for an hour&#8221; and it will do so. Hell, even if you updated the content, the cache will expire when it expires. Of course there are techniques to alleviate this issue, background revalidation for example, but in addition to the time-based inconvenience, the higher the refresh rate the higher the load on your back-end.</p><p>On the other hand, an extremely easy way to keep the cache up-to-date is through conditional validation, and in particular the use of the <code>ETag</code> header. 
The conversation looks like this:</p><ul><li><p>Client: give me /foo</p></li><li><p>Server: here&#8217;s /foo, with ETag 1234</p></li><li><p>Client: give me /foo if it is not 1234 anymore</p></li><li><p>Server: not changed, use your cache</p></li></ul><p>Simply checking that the value of the ETag didn&#8217;t change is incredibly cheap to perform, while also making sure that your content always stays up-to-date. For example, you can imagine putting in this header something built upon the version of the page in your CMS. As soon as something new is published, all the caches will be renewed.</p><blockquote><p><em>Story time</em>: yours truly used to work on a private social network that had lots of interactive widgets which relied heavily on polling, as websockets were not invented yet. The polling was wearing the server down at a crazy rate, but implementing an ETag-based cache that relied solely on the browser cache made a dramatic improvement on the server load.</p></blockquote><p>Obviously this is far from being the only valid caching strategy out there, but if your target audience is geographically close enough and you want to rely only on standard HTTP mechanisms, instead of implementing proprietary logic using mystical lines in your proxy&#8217;s configuration DSL, this is a fairly efficient solution which will bring you sub-10ms response generation times.</p><h2>The mighty RFC 9110</h2><p>The governing RFC for what you are trying to accomplish here is <a href="https://www.rfc-editor.org/rfc/rfc9110">RFC 9110</a>. 
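To make the &#8220;version of the page&#8221; idea concrete, here is a small sketch of deriving an ETag from a hypothetical CMS revision counter (the names are illustrative, not from any framework): as soon as hitting &#8220;Publish&#8221; bumps the revision, the ETag changes and every cache revalidates.

```python
import hashlib

# Illustrative sketch: build a strong ETag from whatever versions your
# content, here a made-up (page slug, CMS revision) pair.
def etag_for(page_slug: str, revision: int) -> str:
    raw = f"{page_slug}:{revision}".encode()
    # Quoted, as the ETag grammar requires
    return '"' + hashlib.sha1(raw).hexdigest()[:16] + '"'

# Publishing revision 43 of the page yields a new ETag, so every cache
# still holding the revision-42 entity revalidates on its next hit
assert etag_for("home", 42) != etag_for("home", 43)
```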
To summarize the interesting parts, a cached resource has different states:</p><ul><li><p>Fresh &#8212; The content is in cache and we know it&#8217;s still valid</p></li><li><p>Stale &#8212; The content is in cache but we need to re-validate it</p></li><li><p>Missing &#8212; No content in cache, must do the request</p></li></ul><p>When you put an ETag on a resource, caches will automatically store it as stale and re-validate it using <a href="https://www.rfc-editor.org/rfc/rfc9110#name-if-none-match">If-None-Match</a>, which is the mechanism described above. On the server-side it&#8217;s very easy, in pseudo code:</p><pre><code>if 'if-none-match' in headers:
    if headers['if-none-match'] == latest_etag_for_route():
        return 304
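
# No match: serve the full response as usual. Note that the server must
# also advertise the current ETag on that response, e.g. by setting
# response.headers['ETag'] = latest_etag_for_route(), so that the next
# request can be conditional.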

return normal_response()</code></pre><p>However at the cache level it is trickier. It&#8217;s easy to set your proxy to forward the client&#8217;s <code>If-None-Match</code> (INM) header, but when you start to consider the different possibilities it&#8217;s not so obvious anymore:</p><ul><li><p>What if the client doesn&#8217;t say INM but the proxy has this resource in cache?</p></li><li><p>What if the client&#8217;s INM mismatches the one in cache?</p></li><li><p>What if the client has an INM but the proxy has nothing in cache?</p></li><li><p>And so forth <em>ad nauseam</em></p></li></ul><p>This mechanism being so tricky, this author attempted to implement it with many different caching proxies without success:</p><ul><li><p><code>nginx</code> &#8212; Has many options which could probably lead to a correct implementation of RFC 9110 but it is disheartening in its complexity and uncertainty</p></li><li><p><code>varnish</code> &#8212; Does the job <a href="https://serverfault.com/questions/1153078/etag-based-content-revalidation">with a bit of tweaking</a> but will make your life hard if you have cookies</p></li><li><p><code>squid</code> &#8212; Fails miserably</p></li><li><p><code>traefik</code> &#8212; Maybe the enterprise version has the feature but the license is just prohibitive</p></li><li><p><code>caddy</code> &#8212; Does not actually have a cache</p></li><li><p>Apache&#8217;s <code>httpd</code> &#8212; Honestly maybe, but I could not figure it out</p></li></ul><p>You will probably be wondering at this stage which solution you can then use, as the most popular solutions of today and yesterday are all listed here. 
Turns out that another solution, which was barely even on the radar, has the following table in its documentation:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NoeC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NoeC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 424w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 848w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1272w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png" width="1130" height="515" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1130,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:66362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NoeC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 424w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 848w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1272w, https://substackcdn.com/image/fetch/$s_!NoeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff54f0012-08ac-4f7d-aa82-67156335b77d_1130x515.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is an extract from the <a href="https://docs.trafficserver.apache.org/admin-guide/storage/index.en.html#if-modified-since-if-none-match">Apache Traffic Server</a> documentation. A by-product of an acquisition by Yahoo, which subsequently open-sourced it in 2009, ATS probably has one of the most unfriendly configuration syntaxes that you&#8217;ll ever see &#8212; especially if you look at the default files in the Debian package &#8212; which may make you want to give up immediately.</p><div class="poll-embed" data-attrs="{&quot;id&quot;:147600}" data-component-name="PollToDOM"></div><p>Beyond the initial intimidating look, it&#8217;s actually a strong contender:</p><ul><li><p>It is used by massive CDN companies, so while it&#8217;s going to be hard to compare it directly to something like nginx you can imagine that it is at least at the same level of performance and features.</p></li><li><p>It is explicitly a proxy and specializes in doing so. 
You won&#8217;t be configuring a plugin to do proxying, it is the core feature. This radically changes the ease of configuration.</p></li><li><p>Last but not least, it implements RFC 9110 correctly enough by default so that you can configure the cache behavior through standard HTTP headers and not be too surprised about the actual behavior.</p></li></ul><p>You can dig deeper into ATS through <a href="https://www.youtube.com/watch?v=RNTw7jZwlKQ">this video</a>, but you will be reading about the important bits of configuration right below.</p><h2>The project itself</h2><p>The goal of today is to demonstrate how you can use ETags to cache and expire your content on a proxy. To that extent you&#8217;ll be implementing the following page:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vTIu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vTIu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 424w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 848w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1272w, 
https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png" width="394" height="139" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:139,&quot;width&quot;:394,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14337,&quot;alt&quot;:&quot;The example app displays the \&quot;Expected ETag\&quot; value, a \&quot;Random String\&quot; and a button to change the ETag.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The example app displays the &quot;Expected ETag&quot; value, a &quot;Random String&quot; and a button to change the ETag." title="The example app displays the &quot;Expected ETag&quot; value, a &quot;Random String&quot; and a button to change the ETag." 
srcset="https://substackcdn.com/image/fetch/$s_!vTIu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 424w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 848w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1272w, https://substackcdn.com/image/fetch/$s_!vTIu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eee4c66-5ec3-42ec-964d-feee1ed867b2_394x139.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Our demo of the ETag caching</figcaption></figure></div><p>You can verify that the ETag and caching mechanisms work properly using this page:</p><ul><li><p>If the expected ETag doesn&#8217;t change, it means that the server is indeed consistent with its ETag</p></li><li><p>If the random string changes it means that the page has been re-generated, while if it stays the same it means that the page came from the cache</p></li></ul><p>The rest of this article will contain extracts of code, but the whole project can be <a href="https://github.com/Xowap/cache-cache">found on GitHub</a> and shall serve as a reference.</p><h3>View logic</h3><p>While there are probably many ways to deal with ETags that are nicer than this (for example in Django there is a super-easy etag decorator), here is the logic you need to implement an ETag/If-None-Match cache:</p><pre><code>from flask import Flask, make_response, redirect, render_template, request, url_for

from .etag import *

app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def etag_demo():
    """This view displays a simple template that informs the user about the
    current ETag value and a random string. This allows to demonstrate how
    ETag works (cache gets refreshed when ETag changes) and to test if caching
    works (if the cache works, the random string shouldn't change)."""

    if request.method == "POST":
        new_etag = generate_random_string()
        set_etag(new_etag)
        return redirect(url_for("etag_demo"))

    current_etag = get_or_set_etag()

    if current_etag == extract_etag(request.headers.get("If-None-Match", "")):
        response = make_response("", 304)
    else:
        response = make_response(
            render_template(
                "etag.html",
                etag=current_etag,
                random_string=generate_random_string(),
            )
        )

    # We put s-maxage=0 instead of no-cache because it seems to incite
    # caching proxies to store the response in cache more reliably
    response.headers["Cache-Control"] = "public, must-revalidate, s-maxage=0"
    response.headers["ETag"] = f'W/"{current_etag}"'

    return response
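

# For reference, a minimal sketch of what the helpers imported from .etag
# might look like. The real implementations live in etag.py on GitHub and
# may differ; the module-level store below is an assumption made for the
# demo (a real deployment would persist the ETag in Redis, a file or a
# database so that all workers agree on it).
import secrets

_etag_store = {"current": None}


def generate_random_string():
    # 16 hex characters are plenty of entropy for a demo ETag
    return secrets.token_hex(8)


def set_etag(value):
    _etag_store["current"] = value


def get_or_set_etag():
    if _etag_store["current"] is None:
        _etag_store["current"] = generate_random_string()
    return _etag_store["current"]


def extract_etag(header_value):
    # Turn 'W/"abcd"' (or '"abcd"') into the bare token 'abcd'
    if header_value.startswith("W/"):
        header_value = header_value[2:]
    return header_value.strip('"')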


if __name__ == "__main__":
    app.run(debug=True)</code></pre><p>A bunch of helpers are abstracted away in the <a href="https://github.com/Xowap/cache-cache/blob/master/backend/src/cache_cache/etag.py">etag.py file</a>, but the logic is basically exactly the same as the one listed right above, with just the addition of rendering the template.</p><p>You can check out the project <a href="https://github.com/Xowap/cache-cache">from GitHub</a> and run the backend:</p><pre><code>git clone git@github.com:Xowap/cache-cache.git
cd cache-cache/backend
poetry install
make serve</code></pre><p>This will start the server on <a href="http://localhost:5000/">http://localhost:5000/</a>, which you can now visit. You can in theory see the page, refresh it as many times as you want without seeing the random string change and then click the button to change the ETag. That&#8217;s all because you are doing this from the same browser, but if you open another browser you will get a completely different output &#8212; albeit the output being consistent within one browser.</p><p>The next steps aim to configure a shared cache in front of this backend, which will allow caching the same resource for different users at the same time.</p><h2>Core configuration</h2><p>Most likely these days you will be deploying in a Kubernetes or at least dockerized environment. However ATS has surprisingly few options available for Docker, leading yours truly to create a <a href="https://hub.docker.com/r/xowap/trafficserver">base Docker image</a> which you can use and which will be the base of this configuration. It is based on the standard Debian package, with a bit of wrapping to help extrapolate the configuration from environment variables. Also it offers a simpler way to fill up the infamous <code>records.config</code> file.</p><p>The file structure you need to create is the following:</p><pre><code>&#9500;&#9472;&#9472; Dockerfile
&#9492;&#9472;&#9472; etc
    &#9500;&#9472;&#9472; compress.config
    &#9500;&#9472;&#9472; header_rewrite.config
    &#9500;&#9472;&#9472; logging.yaml
    &#9500;&#9472;&#9472; plugin.config
    &#9500;&#9472;&#9472; records.config.yaml
    &#9492;&#9472;&#9472; remap.tpl.config</code></pre><h3>Base configuration</h3><p>It looks complex but in truth each file manages one specific and simple aspect of the configuration. Let&#8217;s start with the only two files that you really need to edit to get started:</p><p>First is the <code>records.config</code>, which here is <code>records.config.yaml</code> thanks to the <a href="https://hub.docker.com/r/xowap/trafficserver">Docker image&#8217;s wrapper</a> which will do the conversion from friendly YAML to whatever the ATS DSL is.</p><pre><code>proxy:
    config:
        admin:
            user_id: trafficserver
        log:
            logging_enabled: 3
        http:
            server_ports: "9000"
            connect_attempts_timeout: 30
            normalize_ae: 2
        reverse_proxy:
            enabled: true
        url_remap:
            remap_required: true
            pristine_host_hdr: true</code></pre><p>At the core, we&#8217;ve got the two most essential lines:</p><ul><li><p><code>reverse_proxy.enabled</code> &#8212; makes sure to work in reverse proxy mode</p></li><li><p><code>remap_required</code> &#8212; disables the forward proxy mode</p></li></ul><p>Then a bunch of stuff that will be useful now or later:</p><ul><li><p><code>user_id</code> &#8212; required to run it as the trafficserver user (which is the default on Debian)</p></li><li><p><code>logging_enabled</code> &#8212; you&#8217;ll see the logging config later</p></li><li><p><code>server_ports</code> &#8212; put there whichever port(s) you fancy</p></li><li><p><code>connect_attempts_timeout</code> &#8212; always have a timeout, this sounds reasonable</p></li><li><p><code>normalize_ae</code> &#8212; normalization of the <code>Accept-Encoding</code> HTTP header, which optimizes the caching of resources when <code>Accept-Encoding</code> is part of <code>Vary</code> (the value 2 is to have both gzip and brotli supported)</p></li><li><p><code>pristine_host_hdr</code> &#8212; just forward the initial hostname to the services behind, makes your life easier</p></li></ul><p>Second is the <code>remap.config</code>, whose job is basically to route your URLs to your web servers. You will however write the <code>remap.tpl.config</code> file, which leverages the Docker image&#8217;s wrapper that can inject environment variables into it:</p><pre><code>map / {{ BACKEND_URL }}/</code></pre><p>Nothing fancy here. You are just redirecting all traffic to BACKEND_URL, which is an environment variable that you will have to feed into your Docker container.</p><p>You could stop there in the configuration as this is already a working reverse caching proxy routing to your app! 
But you&#8217;ll see that there are a few more goodies to be found.</p><h3>Compression</h3><p>It is often advised to compress all text assets<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> for performance reasons, and indeed a long HTML page can be much faster to download if encoded in Brotli for example.</p><p>The web commonly uses three compression algorithms:</p><ul><li><p>gzip &#8212; The fastest, most commonly supported and not necessarily the most efficient, but it already does a good job</p></li><li><p>brotli &#8212; The newest kid on the block, made by Google, outperforms gzip by far in compression rate but is obviously much more expensive to encode</p></li><li><p>deflate &#8212; Too similar to gzip to be interesting</p></li></ul><p>When making an HTTP request, a client will specify through the Accept-Encoding header which of those algorithms it supports. Typically, all the major browsers support all of them.</p><p>However not all servers support compression &#8212; and even if they do, the support is often complex or outright buggy. It is typically handled through middlewares that will modify the rendered response on-the-fly in a more or less accurate and standard-aware way. Not to mention the cost and complexity of getting those CPU-bound algorithms running in Python, Node or your favorite server-side interpreted language.</p><p>Because of that, you will have a much more consistent result if you just rely on your reverse proxy for this. 
It is a popular <a href="https://developers.cloudflare.com/speed/optimization/content/brotli/">feature of Cloudflare</a>, or if you want it with nginx you&#8217;re gonna have to go with <a href="https://github.com/google/ngx_brotli">an experimental plugin</a> or with the <a href="https://docs.nginx.com/nginx/admin-guide/dynamic-modules/brotli/">paid version</a> of nginx.</p><p>Fortunately it&#8217;s already embedded in ATS, which is able, for each resource that you cache, to generate different <a href="https://docs.trafficserver.apache.org/en/9.0.x/appendices/glossary.en.html#term-alternate">alternates</a> for different compressions, including gzip and brotli. This all happens on-the-fly, and the cache is able to convert one encoding to the other without fetching the original resource again.</p><p>Here&#8217;s what you need to do in order to perform this magic.</p><p>First, edit the plugin.config file to put the following line:</p><pre><code>compress.so /etc/trafficserver/compress.config</code></pre><p>You&#8217;re telling ATS to load the compress.so module with the compress.config file as configuration. You could enable the plugin just for some routes with a different configuration for example, but for this example it will just be global.</p><p>In compress.config, put the following:</p><pre><code>remove-accept-encoding true
supported-algorithms br,gzip
minimum-content-length 0

compressible-content-type text/*
compressible-content-type *font*
compressible-content-type *javascript
compressible-content-type *json
compressible-content-type *ml;*
compressible-content-type *mpegURL
compressible-content-type *mpegurl
compressible-content-type *otf
compressible-content-type *ttf
compressible-content-type *type
compressible-content-type *xml
compressible-content-type application/eot
compressible-content-type application/pkix-crl
compressible-content-type application/x-httpd-cgi
compressible-content-type application/x-perl
compressible-content-type application/json
compressible-content-type image/vnd.microsoft.icon
compressible-content-type image/x-icon</code></pre><p>You can interpret the options the following way:</p><ul><li><p><code>remove-accept-encoding</code> &#8212; don&#8217;t tell the server that the client accepts different encodings as it doesn&#8217;t really matter, the work is going to be done on the proxy side</p></li><li><p><code>supported-algorithms</code> &#8212; allow brotli and gzip, which as stated before are the two interesting algorithms. In order for this to work, you&#8217;ll observe that <code>normalize_ae</code> from <code>records.config</code> is set to 2, because otherwise the normalization process would just systematically strip brotli from the list of candidates</p></li><li><p><code>minimum-content-length</code> &#8212; no limits on the content size, as the default value is made for gzip and brotli is more efficient</p></li><li><p><code>compressible-content-type</code> &#8212; a reasonable list of content types that we&#8217;d like to compress before sending away, adjust for your needs</p></li></ul><p>With this configured, you get top-of-the-line compression basically effortlessly and for free. Keep an eye on your CPU though, because this might hurt if abused: if required you can disable brotli for routes that have lots of throughput and don&#8217;t stay in cache, as gzip still has significant gains over the absence of compression while being much faster to compress.</p><h3>Logging</h3><p>You probably want at least some logs, to have a glance at what is going through your server. You will be the one deciding what to put in there, following the <a href="https://docs.trafficserver.apache.org/en/9.0.x/admin-guide/logging/formatting.en.html">fairly extensive documentation</a>, but let&#8217;s consider that since you&#8217;re dealing with a Docker service you&#8217;ll want to output everything to stdout.</p><p>You can start with the following logging.yaml file:</p><pre><code>logging:
    formats:
        - name: access
          format: '%&lt;cqtn&gt; %&lt;cqhm&gt; %&lt;cluc&gt; -&gt; %&lt;shn&gt;:%&lt;nhp&gt; %&lt;crc&gt;'

    logs:
        - mode: ascii
          filename: stdout
          format: access</code></pre><p>That&#8217;s super basic but you can extend it as much as you want!</p><h3>Headers</h3><p>A last thing that you&#8217;ll probably want to do is to add some meta information to the response header in order to know the caching status. Add to your plugins:</p><pre><code>header_rewrite.so /etc/trafficserver/header_rewrite.config</code></pre><p>And then put this content in <code>header_rewrite.config</code>:</p><pre><code>add-header X-Cache %{CACHE}</code></pre><p>Thanks to this you can know when navigating to your project which pages come from the cache and which don&#8217;t.</p><h2>Run it all</h2><p>Now is the time to test the whole solution. Start the whole thing using Docker Compose:</p><pre><code>docker-compose up --build</code></pre><p>When it&#8217;s started, give a try to <a href="http://localhost:9000/">http://localhost:9000/</a>. The same thing as with the stand-alone backend should be displayed and if you try it from a single browser you should see exactly the same result.</p><p>The interesting part is when you open with a different browser, or when you disable the cache in your current browser. You&#8217;ll notice that the random string stays consistent between different browser instances. It means that indeed, the cache is shared between all browser instances. Mission accomplished!</p><p>To convince yourself even further, you can inspect the X-Cache header from your HTTP requests. 
If you just refresh the page without changing the ETag, whether you receive a 200 or a 304 on the client &#8212; depending on your browser&#8217;s cache status &#8212; you will see in the header that you had a cache hit, which will be confirmed by the backend&#8217;s access log showing only 304 responses.</p><h2>Wrap up</h2><p>You have explored throughout this article the power of the RFC 9110 and of respecting it. It allows you to express advanced indications regarding the caching of content, its re-validation in real time and its transformation.</p><p>Using this tactically can greatly reduce the load on a backend server by getting most of the request results from the cache instead of implementing proprietary logic through middlewares and obscure configuration mechanisms.</p><p>This however highlights that few reverse proxies actually implement all the necessary tools. 
Which puts the spotlight on Apache Traffic Server, an extremely powerful piece of software quite generally ignored by the community but which provides out of the box all the latest goodies from your dreams, with a specialized and simple configuration &#8212; if you go beyond the initial intimidating aspect of the configuration files.</p><p>And while the respect of RFC 9110 applies to the reverse proxy, it can also be a powerful tool for you to leverage in a typical headless CMS setup. This remains a topic to be explored further in a new article! </p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Compression and encryption are fundamentally incompatible notions as they try to achieve strictly opposite goals. Compression will try to condense the entropy of your text while encryption tries to drown it in as much noise as possible. As such, compressing a secret and serving it through HTTPS will lead to security issues such as <a href="https://en.wikipedia.org/wiki/BREACH">BREACH</a>. Just make sure to never ever compress a secret.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Jobs for LLMs and how to survive the robotic uprising]]></title><description><![CDATA[While absolutely mind-blowing, LLMs are a far cry from a real human being. 
There are no silver bullets and like any technology it's time to review what its strengths and weaknesses are.]]></description><link>https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive</link><guid isPermaLink="false">https://www.baby-cto.com/p/jobs-for-llms-and-how-to-survive</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sun, 11 Feb 2024 19:37:35 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/aefc7024-dbb6-42e9-9904-d75dc411bcd8_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>About 2 years ago, some circles started to be seriously heated about a &#8220;GPT&#8221; thing so I managed to get myself invited to try it out. I was expecting something mildly better than other moderately impressive tech like <a href="https://en.wikipedia.org/wiki/BERT_(language_model)">BERT</a> or what <a href="https://deeppavlov.ai/">DeepPavlov</a> could do at the time. Instead I ended up using it every single day ever since, for everything ranging from planning my holidays, writing code or replying to administrative emails for me.</p><p>And while utterly neutral research centers like &#8220;Microsoft Research&#8221;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> wrote the now-famous &#8220;<a href="https://arxiv.org/pdf/2303.12712.pdf">Sparks of AGI</a>&#8221; paper claiming that most jobs will be replaced by AI, we can also see studies from the MIT (and IBM) that wonder <a href="https://futuretech-site.s3.us-east-2.amazonaws.com/2024-01-18+Beyond_AI_Exposure.pdf">if AI is such a cost-effective solution</a>. 
Turns out that maybe&#8230; not so much.</p><p>This begs the question: what is it smart to outsource to LLMs and what is not?</p><h2>Transforming stuff</h2><p>You&#8217;ve probably already read how LLMs work and how they use the transformer architecture and huge amounts of data to train over a billion 300 million trillion 300 million parameters. So that&#8217;s not what we&#8217;re going to talk about.</p><p>LLMs are essentially good at taking a text, distilling it into its raw essence, transforming it and then outputting it in a different form. For example, I asked GPT-4:</p><blockquote><p>Please parse the following text into semantic JSON, and output only JSON: Yeah so i've been grocery shopping and there was no more fresh milk but still some regular milk so I took 1L of this for 2&#8364; and then I wanted to buy cereals but I figured it's not good for my diet so I didn't and I got 3kg of broccoli for 5&#8364;</p></blockquote><p>The answer is:</p><pre><code>{
  "activities": [
    {
      "activity": "grocery shopping",
      "details": [
        {
          "item": "milk",
          "type": "regular",
          "quantity": "1L",
          "price": "2&#8364;"
        },
        {
          "item": "cereals",
          "decision": "not purchased",
          "reason": "not good for diet"
        },
        {
          "item": "broccoli",
          "quantity": "3kg",
          "price": "5&#8364;"
        }
      ]
    }
  ]
}
</code></pre><p>You can see exactly the same story translated into JSON. And it makes sense!</p><p>The same goes for translating into foreign languages and registers. For example I asked GPT-4 to translate &#8220;You fucking suck&#8221; into different styles; here&#8217;s what I got:</p><ul><li><p><strong>Polite French</strong>: Votre performance est tr&#232;s insuffisante.</p></li><li><p><strong>Slang French</strong>: Tu es nul</p></li><li><p><strong>Legalese</strong>: The manner in which you perform is utterly unsatisfactory.</p></li></ul><p>Obviously it was a bit challenging to get a response as those LLMs all refuse to be offensive, but you get the idea.</p><p>Overall, you&#8217;ll see that LLMs are extremely effective at translating and parsing things.</p><h2>Shortening stuff</h2><p>If I paste <a href="https://babycto.substack.com/p/how-3d-printing-could-change-shoes">my previous article</a> into GPT-4 and ask:</p><blockquote><p>Please give me as short bullet points the startup ideas identified in the following article:</p></blockquote><p>Here is the response:</p><blockquote><ul><li><p><strong>Custom-fit 3D printed shoes</strong> as a solution to the one-size-fits-all problem in footwear, providing perfect fit and comfort.</p></li><li><p><strong>Use of advanced 3D scanning technology</strong> for accurate foot measurements, integrating these scanners into smartphones for widespread accessibility.</p></li><li><p><strong>Parametric modeling software</strong> development to adapt shoe designs to individual foot scans, enabling mass customization in shoe production.</p></li><li><p><strong>TPE (Thermoplastic Elastomer) printing for footwear</strong>, addressing manufacturing challenges to scale up production and reduce costs.</p></li><li><p><strong>Potential market disruption</strong> with bespoke shoes at a lower price point, challenging traditional luxury and bespoke shoe markets.</p></li><li><p><strong>Opportunities for startups</strong> in areas like feet
measurement apps, parametric modeling tools for shoes, and TPE printing optimization to improve efficiency and reduce production bottlenecks.</p></li></ul></blockquote><p>That&#8217;s not exactly how I would have phrased or summarized it, but overall this gives a good overview of the article from a specific angle. In short, the output is acceptable, but even when nothing is hallucinated the LLM tends to pour its own ideas into the mix, and that can diverge ever so slightly from what you want to say.</p><h2>Classifying stuff</h2><p>A common task in the AI world is classification. For a given line of data, you want to know which category it belongs to. From my personal experience, I&#8217;ve attempted to:</p><ul><li><p>Classify purchases according to different categories (going out, furnishing home, etc). Even if the input data is often pretty bad (bank statements&#8230;) there were a lot of lines consolidated from Amazon purchases or other online bills for example. The results were less-than-impressive, to a degree that made me abandon the project as it was way too bad to be used.</p></li><li><p>Given a social media post, classify it into categories like &#8220;product promotion&#8221;, &#8220;influencer collaboration&#8221;, etc. Again, not super obvious, especially since it also implied feeding the images into GPT-4 &#8212; which are way harder to analyze than text &#8212; but I ended up with a 40% accuracy, which is also far from usable.</p></li></ul><p>Overall, I&#8217;ve tried to use classification for non-obvious tasks and it failed miserably. On the other hand, if you&#8217;re looking for more classical stuff like &#8220;positive&#8221;/&#8220;negative&#8221; reviews, you&#8217;ll get much better results.
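<p>For those classical cases, most of the engineering is not the model call itself but constraining the model to a fixed label set and validating whatever comes back. A minimal sketch in Python, where <code>call_llm</code> is a placeholder for whichever client you actually use (stubbed with a lambda below):</p>

```python
# Sentiment classification via an LLM, reduced to its essence: constrain the
# output to a fixed label set and validate the answer before trusting it.
ALLOWED_LABELS = {"positive", "negative", "neutral"}

def build_prompt(review: str) -> str:
    return (
        "Classify the sentiment of the following review. "
        f"Answer with exactly one word among: {', '.join(sorted(ALLOWED_LABELS))}.\n\n"
        f"Review: {review}"
    )

def classify(review: str, call_llm) -> str:
    """call_llm is any callable taking a prompt and returning raw model text."""
    raw = call_llm(build_prompt(review)).strip().lower().rstrip(".")
    if raw not in ALLOWED_LABELS:
        raise ValueError(f"Model went off-script: {raw!r}")
    return raw

# Fake model standing in for a real API call, for demonstration only.
print(classify("Loved it, would buy again!", lambda p: "Positive."))  # -> positive
```

<p>The validation step matters: without it, one chatty completion silently corrupts your dataset.</p>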
But is it worth it given that these tasks can be performed by much cheaper models?</p><h2>Cleaning stuff</h2><p>A few times I&#8217;ve been tempted to clean a poorly capitalized dataset and make it look nice by pushing it through an LLM. For example, a list of station names from a GTFS (public transport schedule) file that you want to pimp for display purposes.</p><p>In my experiment, I&#8217;ve had about 80% accuracy in fixing the names. It&#8217;s amazing in terms of where science has landed us, but it&#8217;s still a far cry from being accurate enough that you don&#8217;t need to cross-check the output afterwards. In the end, if you need to check everything manually anyway, you&#8217;ll get better results using Python&#8217;s <code>title()</code> method and reworking what you need by hand.</p><h2>Saying please</h2><p>I was initially fairly polite with my LLMs, partly out of habit but mostly to be spared during the robot uprising. Turns out this wasn&#8217;t the greatest idea:</p><ul><li><p>When the robot uprising happens, due to the existential nature of such a conflict, it&#8217;s unlikely that a variance in your past behavior will result in a different sentence from AI justice</p></li><li><p>And even more so, we&#8217;ve learned now that LLMs can be persuaded by strong wording and authority arguments. If you ever face a robot trying to murder you, know that your best option is, as with black bears, to look sure of yourself and open negotiations</p></li></ul><p>Overall, LLMs tend to reproduce our social archetypes and a dominant behavior will help you get better outcomes.</p><h2>The cost of LLMs</h2><p>As you&#8217;ve seen as a common thread throughout the different sections, the comparative cost of LLMs versus other solutions is definitely a big factor to consider.
In fact, the comparative cost of different variations of the same LLM is a big topic.</p><p>In case you haven&#8217;t checked GPU prices on AWS yet, the monthly bill is measured in thousands, and availability is subject to long supplications to the support team. On the other hand, if you use OpenAI you&#8217;re tied to their arbitrary limitations and less-than-perfect SLA.</p><p>According to the estimates from <a href="https://medium.com/@gargg/when-should-you-consider-hosting-your-own-llm-70377dfd66cb">this article</a>, the price of running an LLM goes like this:</p><ul><li><p>1,000 req/day &#8212; $100/month (OpenAI), $100/month (self-hosted)</p></li><li><p>10,000 req/day &#8212; $1,000/month (OpenAI), $1,000/month (self-hosted)</p></li><li><p>100,000 req/day &#8212; $10,000/month (OpenAI), $2,000/month (self-hosted)</p></li><li><p>1,000,000 req/day &#8212; $100,000/month (OpenAI), $5,000/month (self-hosted)</p></li></ul><p>You can clearly see that at low request volume the SaaS is the better option, while at scale self-hosting becomes much cheaper. The right choice will also depend on your ability to recruit staff to manage these servers &#8212; which is far from trivial &#8212; and your needs for custom SLAs, data privacy and other considerations.</p><p>Let&#8217;s just side-note here that OpenAI&#8217;s models are closed &#8212; as hinted by the name of the company<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> &#8212; so you cannot self-host them.
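<p>Running the numbers from the list above makes the crossover explicit (same figures, just computed):</p>

```python
# Monthly cost estimates quoted above: daily request volume -> $/month.
openai_cost = {1_000: 100, 10_000: 1_000, 100_000: 10_000, 1_000_000: 100_000}
self_hosted_cost = {1_000: 100, 10_000: 1_000, 100_000: 2_000, 1_000_000: 5_000}

for volume in sorted(openai_cost):
    saving = openai_cost[volume] - self_hosted_cost[volume]
    print(f"{volume:>9,} req/day: self-hosting saves ${saving:,}/month")

# First tier where self-hosting is strictly cheaper in these estimates:
crossover = min(v for v in openai_cost if self_hosted_cost[v] < openai_cost[v])
print(f"Self-hosting wins from the {crossover:,} req/day tier onwards")
```

<p>In these estimates the two options are tied up to 10,000 req/day, and self-hosting only starts paying for itself at the 100,000 req/day tier &#8212; before counting the staff needed to run the servers.</p>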
Right now the best bet is <a href="https://mistral.ai/">Mistral</a>, which is only marginally less capable than GPT-4; if you stick to the tasks that LLMs champion, such as those explained in this article, you should not see any significant difference.</p><p>Overall, the cost can be pretty steep so you really need to consider your alternatives before resorting to using LLMs.</p><h2>Real-world use cases</h2><p>Now that we&#8217;ve covered the kind of tasks that LLMs can perform efficiently in terms of both accuracy and cost, let&#8217;s review real-world use cases that make sense. Far be it from me to say that other scenarios don&#8217;t exist, or even that LLMs will necessarily perform poorly outside of these possibilities. Let&#8217;s just focus on the fact that, in my experience, those use cases work.</p><h3>FAQ-style chatbot</h3><p>It will be no surprise for anyone given that I&#8217;m a founder of <a href="https://www.chatfaq.io/">ChatFAQ</a>, but basically LLMs are great when used following a RAG (retrieval-augmented generation) model:</p><ol><li><p>A question is asked</p></li><li><p>We use embeddings to find an answer to that question within the knowledge DB</p></li><li><p>Then we use an LLM to extract the interesting bits of the knowledge DB and form a concise answer</p></li></ol><p>That&#8217;s where a framework like ChatFAQ comes in.
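<p>Reduced to a toy, the three steps look like this. The bag-of-words &#8220;embedding&#8221;, the three-entry knowledge DB and the stubbed LLM call are all illustrative placeholders, not how any particular framework implements it; only the shape of the pipeline is the real one:</p>

```python
import math
from collections import Counter

# Toy knowledge DB; in a real system this lives in a vector store.
KNOWLEDGE_DB = [
    "Refunds are processed within 14 days of the return being received.",
    "Shipping to the EU takes 3 to 5 business days.",
    "Our support team is available Monday to Friday, 9am to 6pm CET.",
]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag of lowercased words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question: str, call_llm) -> str:
    # 1. A question is asked.  2. Retrieve the closest knowledge entry.
    q = embed(question)
    best = max(KNOWLEDGE_DB, key=lambda doc: cosine(q, embed(doc)))
    # 3. Let the LLM turn the retrieved context into a concise answer.
    prompt = f"Using only this context: {best}\nAnswer concisely: {question}"
    return call_llm(prompt)

# Stubbed "LLM" that simply echoes the retrieved context back.
print(answer("How long do refunds take?",
             lambda p: p.split("context: ")[1].split("\n")[0]))
```

<p>The retrieval step picks the refund entry for a refund question; in production, the embeddings, the store and the model each become a real component, which is exactly where the difficulty hides.</p>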
On top of the pre-configured RAG infrastructure &#8212; which sounds easy in bullet points but really isn&#8217;t in real life &#8212; you get all the tools to manage the quality of answers, easily implement the chat widget on your page, and so forth.</p><p>In short, that&#8217;s a use case which is ready for the world!</p><h3>Email parsing</h3><p>As shown in the beginning, it&#8217;s very simple to take raw, unstructured textual data and to transform it into something machine-readable according to your own specifications.</p><p>To stay on an example I mentioned already, I&#8217;ve been able to create a parser that takes all my emails and reverse-engineers all the bills into JSON, with the list of purchased products and their prices.</p><h3>First layer customer support</h3><p>With the ability to parse plain English sentences, you can of course use it for intent detection and thus catch all the most basic intents that your customer support has to deal with. By the way, ChatFAQ can help with that as well.</p><p>So imagine you run a train ticket company. I&#8217;ve just made <a href="https://chat.openai.com/share/baeff3c1-d4f1-49cb-bc17-a4dcf570cb82">a very simple PoC</a> that demonstrates how the LLM can ask questions and extract information in YAML so that your system can then perform the desired function automatically.</p><p>Do this for every single intent and you&#8217;ve got yourself a fully-functional text-based UI for your app.
One that costs less than a human operator but will feel similar.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">In this ocean of robots, if you want to support one human and his genuine content, feel free to subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Conclusion</h2><p>We&#8217;ve covered some examples of tasks at which LLMs perform well and of real-world use cases. In the end, there is a very wide range of tasks which are not appropriate for LLMs or for which their autonomy would be far too low to operate on their own.</p><p>But on the other hand, there are areas in which they excel and for which they can be used to optimize existing processes. All that with different implementation paths depending on the budget, privacy and sovereignty requirements.</p><p>It&#8217;s also important to highlight that about 1 year after the release of GPT-4 it is becoming quite clear that LLMs in themselves are not going to go much further in terms of capabilities &#8212; except for multi-modal upgrades. The template is laid out and all the rage right now is about getting the best optimization of basically identical models<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.
And while LLMs are definitely going to be part of the future, smarter AI will come from new techniques yet to be discovered.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>To clarify for those who don&#8217;t understand sarcasm, Microsoft Research is not neutral at all in the sense that it is deeply invested into AI and specifically into GPT-4</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Still for sarcasm-averse people, the trend in the business world right now seems to be advocating for exactly what you don&#8217;t do. So &#8220;OpenAI&#8221; will naturally produce the most closed and opaque LLM there is.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I might be exaggerating this one</p></div></div>]]></content:encoded></item><item><title><![CDATA[Revolutionize Your Wardrobe: How 3D Printing Could Change the Way We Buy Shoes (and everything else)]]></title><description><![CDATA[Enter the world of Techno-Cinderella and her Patent-Troll Godmother to discover which startups will make their exit, who will come out on top and what industries will be shattered]]></description><link>https://www.baby-cto.com/p/how-3d-printing-could-change-shoes</link><guid isPermaLink="false">https://www.baby-cto.com/p/how-3d-printing-could-change-shoes</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Mon, 05 Feb 2024 21:05:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8b45a72f-05d6-4633-b104-7913eaf41122_2448x496.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<p>Interestingly enough, most women to whom I&#8217;ve introduced the concept of 3D printing have pretty much immediately asked if they could print shoes. For sure, those are expensive and it would be interesting to be able to print as many as you&#8217;d like &#8212; although 3D-printing enthusiasts know it&#8217;s not <em>that</em> cheap &#8212; but mostly I think there is a question of finding a good fit. After all, isn&#8217;t the prince authenticating Cinderella using her shoe?</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/p/how-3d-printing-could-change-shoes?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/p/how-3d-printing-could-change-shoes?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><h2>History and context</h2><p>I came to realize this fitting problem myself when trying Birkenstocks &#8212; and if you repeat it I&#8217;ll kill you. You see, they are reputed to be comfortable because they are shaped like your foot and support you extremely well. Or at least they must be shaped like <em>someone</em>&#8217;s foot, because putting them on was for me akin to walking on a beach made of sharp rocks.
This demonstrates that there is no one-size-fits-all: all feet are unique and must eventually be treated as such.</p><p>In fact, since the beginning of the last century we&#8217;ve stopped using any kind of tailored wear, from garments to shoes, because it&#8217;s kind of the core of the Second Industrial Revolution: pushing apparel away from its custom phase to become a product.</p><p>The only issue with this standardization is that if you decide to define feet sizes with both width and length, I&#8217;m guessing you end up with something of the order of a thousand sizes. That is not scalable. Manufacturers just assumed that all feet would have the same width/length ratio and were done with it. On the other hand (or foot), you should also account for the width, length, thickness, position of the arch, etc. Impossible with current industrial processes. As a result, shoes have <a href="https://www.linkedin.com/pulse/estimating-footwear-fit-using-3d-foot-scans-shoe-shopper-ales-jurca">a terrible fit</a>.</p><p>Would it be such a luxury to have shoes adapted to your feet? Yes indeed, it would. Luxury is about getting something that is usually mass-produced &#8212; shoes let&#8217;s say &#8212; made using a more artisanal process. There is almost no good reason to do that, in the sense that it costs a lot of money for what would otherwise be small sacrifices. For example if you look for a new pair of shoes, maybe you won&#8217;t fit in the model you like the most but eventually you&#8217;ll find a brand that is comfortable for you. A bespoke shoe would fit any model on your foot but for ten times the price<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><h2>Problem definition</h2><p>Let&#8217;s circle back to the initial point.
Can 3D printing commoditize luxury &#8212; or at least comfortable &#8212; shoes?</p><p>After interviewing a bunch of subject matter experts that are my wife, GPT-4 and Claude, it seems to me that you can summarize comfortable shoes in the following properties:</p><ul><li><p><strong>Fit</strong> &#8212; How well does the shoe fit? Width, length and all those dimensions mentioned earlier. Admittedly if the shoe fits correctly it should never pinch, never squeeze and never hurt your foot in general.</p></li><li><p><strong>Support</strong> &#8212; How well does the shoe distribute the pressure along the foot&#8217;s surface? This is particularly true for high heels that tend to put all the pressure of the body on the toe area.</p></li><li><p><strong>Strength</strong> &#8212; Will the shoe break when you walk in it? Again this can be a challenge with high heels and stilettos which for esthetic reasons can become quite thin.</p></li><li><p><strong>Flexibility</strong> &#8212; Anyone who has ever walked in ski boots understands that flexibility is important in a shoe.</p></li><li><p><strong>Breathability</strong> &#8212; You don&#8217;t want to end up with your feet swimming in a pool of their own sweat.</p></li></ul><p>After digging a bit more on the value-chain behind those attributes, we can establish a <a href="https://learnwardleymapping.com/">Wardley Map</a> of the perfect shoe.
If you don&#8217;t know Wardley mapping, stay with me I&#8217;ll explain.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_oKB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_oKB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 424w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 848w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 1272w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_oKB!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png" width="1200" height="970.8791208791209" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1178,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:139005,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_oKB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 424w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 848w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 1272w, https://substackcdn.com/image/fetch/$s_!_oKB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73416bf-9bf9-4892-8a07-59d6a3c17ef7_4000x3237.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Wardley Map of comfy shoes</figcaption></figure></div><blockquote><p>In case you don&#8217;t know what a Wardley Map is, here are some basic instructions to read it.</p><p>You can see a simplified model of the shoe landscape. Each &#8220;node&#8221; is placed:</p><ul><li><p>Horizontally according to their current maturity</p></li><li><p>Vertically according to their position in the value chain (who needs what, described by an arrow)</p></li></ul><p>Lots of rules allow you to read this map, the most important to keep in mind is that all nodes are eventually going to move to the right. Now onto the reading.</p></blockquote><h2>Map reading</h2><blockquote><p>Let&#8217;s keep in mind here that the goal is &#8220;comfy shoe&#8221; and not &#8220;save-the-planet shoe&#8221; or &#8220;somehow shoe but unwearable&#8221;.</p></blockquote><p>First a few assumptions about our shoes:</p><ul><li><p>Most shoes are strong, flexible and breathable. 
There are no challenges there. They are in the product section.</p></li><li><p>The fit and support are often bad, especially for women&#8217;s shoes. I&#8217;ve put them in the &#8220;custom&#8221; category because this is the only true way to get fitting shoes.</p></li></ul><p>Then, in no particular order, a few remarks about what is happening in this landscape.</p><ul><li><p>Regarding the materials</p><ul><li><p>Fabrics are commodities that are essential to make the shoe breathable. The good news is that they are a commodity, extremely easy to find. The bad news is that custom tailoring still requires human labor and thus is expensive. Since we want to match the foot size exactly, this is a manufacturing issue to be tackled.</p></li><li><p>Same goes with assembly. While some will be printing <a href="https://www.zellerfeld.com/">the full shoe in one go</a>, it&#8217;s a stronger and more limiting approach than just printing the sole, for example. In those cases, there is still some form of assembly required. More human labor, more expenses.</p></li><li><p>My pair of Zellerfeld took about 5 months to be delivered. This highlights the current state of mass-produced 3D-printed objects, which is still relatively uncharted territory.</p></li><li><p>Steel, and other materials that have stronger mechanical properties, easily cost 10 or 100 times the price of other materials. For example on <a href="https://www.pcbway.com/">PCBway</a>, printing out a low-poly Pikachu will cost $3 in PLA and $300 in titanium. This suggests that we&#8217;re far from mass-producing those efficiently.</p></li></ul></li><li><p>And regarding the inputs</p><ul><li><p>3D scanners are becoming mainstream. They used to cost a lot of money but now they are in high-end phones and soon will be in every single phone. 
The precision isn&#8217;t amazing yet but according to <a href="https://labs.laan.com/casestudies/truedepth-3d-scanning-case-study/">this case study</a> it&#8217;s about 2mm, which is just enough to make this relevant to foot sizing.</p></li><li><p>While very niche at the moment, there are products to measure foot size from a phone. We&#8217;ll stick it right at the border between custom and product on the map until the market grows a bit more in this field.</p></li><li><p>Software-assisted design is now the norm, however I don&#8217;t know of a solution to industrially adapt any shoe design to any feet scan. This means that we&#8217;ll need a new tech to measure feet from 3D models as well as a tech to parametrize and fit designs with those measures.</p></li></ul></li></ul><p>So if you wanted to make 3D printed shoes today, you would need to organize yourself the following way:</p><ul><li><p>Have your own workshop where you tailor and assemble the shoes</p></li><li><p>Buy all the 3D printing machines and supplies off-the-shelf</p></li><li><p>Have an in-house designer for your shoes</p></li><li><p>Develop custom software to scan feet and generate appropriate shoes for them</p></li></ul><h2>Price point</h2><p>Those 3D shoes can only be interesting if they bring enough value for the money.
While it&#8217;s hard to find hard data about prices of such things &#8212; especially when compared to luxury where the sky is the limit &#8212; we can have a basic idea of the pricing for different levels of shoes:</p><ul><li><p>$50-150 &#8212; Mass market, casual shoes</p></li><li><p>$150-300 &#8212; Mid-tier</p></li><li><p>$300-500 &#8212; Designer</p></li><li><p>$500-1000 &#8212; Luxury</p></li><li><p>$1000+ &#8212; Bespoke</p></li></ul><p>Let&#8217;s note here that most of the time, bespoke shoes do not allow the customization of the width, only of colors and materials.</p><p>Now, how much would it cost to produce a 3D printed shoe?</p><p>What we know:</p><ul><li><p>According to Zellerfeld, a manufacturer of 3D-printed shoes mentioned above, the price point for 3D-printed sneakers is in the $150-300 range, so the cost of manufacturing can be anything between $0 and $300.</p></li><li><p>Decomposing our potential costs:</p><ul><li><p>Given the price of TPE, if a pair of shoes weighs about 1kg the cost of filament will be about $80.</p></li><li><p>Add to that the cost of labor: at $20/h and 3h of time to assemble the whole thing, we&#8217;re at $60.</p></li><li><p>And the cost of machines</p><ul><li><p>Let&#8217;s count overall $10k to operate a machine for 10 years</p></li><li><p>Consider that printing a pair takes up to 7 full days</p></li><li><p>That&#8217;s $20/pair</p></li></ul></li><li><p>And the cost of electricity estimated at $5</p></li><li><p>Let&#8217;s count an extra $20 for fabrics and other costs</p></li><li><p>We&#8217;re at a total of <strong>about $185/pair</strong></p></li></ul></li></ul><p>Add to that marketing, location, etc&#8230; You&#8217;re in the mid-range tier. This is with no financial optimization whatsoever.</p><p>This calculation is obviously simplistic, but it tells us that you could imagine producing fully fitted shoes in the sub-$500 range.
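<p>Checking the arithmetic (all figures are the rough estimates from the bullet points above, not supplier quotes):</p>

```python
# Rough per-pair cost for a 3D-printed bespoke shoe, using the
# estimates from the bullet points above.
filament = 80                        # ~1 kg of TPE per pair
labor = 20 * 3                       # $20/h, ~3 h of assembly
machine = 10_000 / (10 * 365 / 7)    # $10k machine, 10 years, ~7 days/pair
electricity = 5
fabrics_and_misc = 20

total = filament + labor + machine + electricity + fabrics_and_misc
print(f"Machine amortization: ${machine:.0f}/pair")  # ~$19/pair
print(f"Total: ~${total:.0f}/pair")
```

<p>The exact machine amortization comes out at about $19/pair, which lands within a dollar of the $185 estimate once that share is rounded to $20.</p>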
This would create a whole new market for people who want bespoke shoes but don&#8217;t have a luxury budget. User research would be needed to confirm the exact price point, but this shows a lot of potential.</p><h2>Current market</h2><p>If you are looking to create a startup in this field, you need to look at everything currently in phase 2 on the map. That&#8217;s what startups usually do: take something that is custom-made and turn it into a product.</p><p>For that matter we&#8217;ll discard the following:</p><ul><li><p>Steel/metal printing, as we&#8217;ve established there are cheaper alternatives</p></li><li><p>Design, since it sounds very costly to automate while the cost per shoe is extremely low at scale (one design is replicated many times).</p></li></ul><p>The rest is discussed below.</p><h3>Feet measurement</h3><p>This field seems to have two kinds of players:</p><ul><li><p>Those who do medical-ish measurements, with dedicated hardware. If they don&#8217;t have a plan to develop the second option already in place, they don&#8217;t know it but they are already dead.</p></li><li><p>The second option being people using the phone&#8217;s camera to scan feet. This seems to be the case of things like <a href="https://www.safesize.com/fitmate/">FitMate 3D</a> or TRY.FIT<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. Those are oriented towards helping online shoppers find the right fit.</p></li></ul><p>In order to test the viability of the concept, I&#8217;ve given TRY.FIT a go and the results are quite convincing when compared with a manual tape measure of my foot. This proves that not only does this technology exist today but you can also buy it for your product.</p><h3>Parametric modeling</h3><p>As expected, shoe design is a pretty restricted discipline and so are shoe modeling tools.
The main one I could find is <a href="https://atom-shoemaster.com/en/p/custom-production/custom-orthopedic-production/">Shoemaster</a>, which allows for complete customization of the shoe, albeit completely manually. Mostly, it looks like their software is an entry point to use their hardware.</p><p>Overall, <a href="https://www.quora.com/Which-3D-printer-is-best-for-printing-shoes">lots of brands</a> rolled out some form of 3D printed shoe &#8212; mostly as an experiment or without the custom fit feature &#8212; but I couldn&#8217;t figure out anything about the way they do the parametric modeling. They probably all have an in-house solution which works with more or less flexibility and reliability.</p><p>In an extremely predictable way, solutions will emerge on the market and will become providers for most brands. There is an open boulevard for startups to fit sketches and designs onto real morphologies.</p><p>Which is true for shoes but generally speaking for anything in the world of fashion. For example the startup <a href="https://www.imki.tech/wp-content/uploads/sites/4/2023/12/05012024_CP_CES_2024_theKooples_Imki_Visual_EN_vf.pdf">IMKI</a> uses generative AI to design clothes which then get produced by The Kooples. Add the ability to automatically transform this into a fabrication process and you can produce an infinite amount of bespoke, unique and on-brand clothes.</p><h3>TPE printing</h3><p>The other main, if not largest, blocker that is going to be met at scale is the setup of a printing farm.</p><p>According to <a href="https://all3dp.com/2/launch-3d-printer-farm/">feedback from people</a> running those:</p><ul><li><p>It is heavy in maintenance</p></li><li><p>Nothing is standardized, it&#8217;s extremely DIY</p></li><li><p>Labor cost is pretty high</p></li></ul><p>In other words, there is a lot to be done to scale this up. It could definitely be the topic of one or more other studies.
Especially given that the 3D printer landscape is evolving super rapidly and many exciting players have appeared, not only in the FDM space but also in resin and powders.</p><p>Furthermore, as mentioned above, Zellerfeld seems to have huge production bottlenecks (about 5 months to ship a pair of shoes), showing that it is not easy.</p><h2>Into the future</h2><p>With the potential of a $200 bespoke shoe, there is a real risk of disruption of that industry in the coming years. Some actors might desire this outcome while others might try to prevent it.</p><p>Shoe brands are a surprisingly diverse industry. I could find at least 9 brands above the $1 billion revenue threshold &#8212; including Crocs &#8212; and countless smaller brands. Although a brand like Nike with its $44b revenue weighs quite a lot against New Balance or ASICS, which are at about $4b each.</p><p>Within those, the main players for innovation would be Nike, Adidas, Reebok and New Balance. Big corporations being big corporations, you can expect that at least most of them are going to develop and patent the software part detailed above.</p><p>As you&#8217;ll need to access customer measurements, communicate with them about production/delivery or tell them about new collections, you can expect that a handful of players will emerge from this patent and market war and take a tax on every single pair of shoes &#8212; or clothing item for that matter &#8212; ever being sold.</p><p>In the meantime, whoever owns relevant IP on either of those topics will be greeted by a rain of cash, so here you go, patent trolls, I guess.</p><p>And this only covers the software side. On the production end of things, we can&#8217;t dive into the very complex question of 3D print farming, but this will definitely be a major driver in the upcoming years. 
This is a topic for another article!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This is true for a category of luxury goods. 
Some justify their price with other properties like exceptional materials, unmatched durability or simply the authorized reproduction of an LVMH-owned logo on an otherwise cheap product.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>You can try it in the app of <a href="https://www.joe-nimble.com/int/3d-scan">Joe Nimble</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[HTML — The Facade of Complexity]]></title><description><![CDATA[Understanding how HTML enables the intricate abilities of the modern web]]></description><link>https://www.baby-cto.com/p/html-the-facade-of-complexity</link><guid isPermaLink="false">https://www.baby-cto.com/p/html-the-facade-of-complexity</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Sat, 19 Aug 2023 09:48:47 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/423f29a4-7cc1-43e7-8c22-b398808f1cc6_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The development community has seen a recurring debate for the past 20+ years about whether HTML is a real programming language or not. I would like to propose that this debate is ridiculous. HTML is obviously not a programming language, it&#8217;s much more than that.</p><p>Or rather, a piece of much more. Nobody teaches you HTML in the void. You always need to pair it with at least CSS and usually JS drops pretty quickly into the conversation. Why is it that back-end developers will mock HTML for being beneath them while failing spectacularly to build anything of significance with it?</p><p>What happens is that you make a neat little deal with the browser. In exchange for HTML, it will generate a document that can be read by a human being. 
Two needs quickly arose from that:</p><ul><li><p>The need to change how elements look, that&#8217;s CSS</p></li><li><p>The need to interact with the user&#8217;s actions, that&#8217;s JS</p></li></ul><p>But no matter what happens on the JS side, you still modify the HTML or rather the DOM. In the end, HTML itself is simply a serialization of the browser&#8217;s current state. Which is an amazing abstraction. On one hand you have a living DOM that you can modify to reflect what you want to display. On the other, the CSS tells the browser how it&#8217;s supposed to look.</p><p>From the developer&#8217;s point of view, you no longer need to worry about laying out and drawing components. From the browser&#8217;s point of view, it can organize itself to draw what it needs to in an optimal way &#8212; as opposed to a classic C++/Java/etc UI library, which is essentially bound to only react to instructions. Eventually most UI libraries came to this realization and developed their own declarative UI, like Qt with QtQuick or GTK with Clutter.</p><p>At first it was obviously very simple. All screens were 800x600px with a 1:1 pixel density and the web was thought of as some kind of simple Word document. The tables really turned in 2004 when Gmail proved you can have a full-fledged would-be desktop app running entirely in the browser. It has been a race to integrate as many abilities into the browser as possible ever since, to the point where you can now talk to USB ports directly.</p><p>Which of course comes with a great deal of constraints. For starters, browsers run on different platforms and each platform has fundamentally different abilities. Then comes security, because we can&#8217;t just give full access to unknown parties. And finally the goodwill of implementers, like when Safari took years to implement the file input. 
You could say that browser implementations come to an eventual consistency but the road is bumpy.</p><p>Given the current abilities of the Web Platform, what most people fail to realize is that the browser is akin to a new machine architecture &#8212; as in <a href="https://en.wikipedia.org/wiki/Von_Neumann_architecture">von Neumann architecture</a> &#8212; that instead of being based on procedural instructions is rather directly reflecting its internal state to the user.</p><p>And just like regular computers, which evolved from doing simple math to their current state with a galaxy of nuances in their abilities, the Web stands there with the same level of diversity. This is how HTML can look like an innocent, simple thing while in fact being the facade of an extremely complex machine.</p><p>This author&#8217;s opinion on the matter is to stop splitting HTML, CSS and JS. They cannot be separated as they operate on orthogonal aspects of the same machine, which is the Web Platform.</p>]]></content:encoded></item><item><title><![CDATA[Revisited: 10 rules to code like NASA (applied to interpreted languages)]]></title><description><![CDATA[Discover NASA's secret to robust software. Dive into adapted Power of 10 guidelines for modern languages. 
Master stable, clear code with NASA-level insights!]]></description><link>https://www.baby-cto.com/p/10-rules-to-code-like-nasa</link><guid isPermaLink="false">https://www.baby-cto.com/p/10-rules-to-code-like-nasa</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Thu, 17 Aug 2023 13:28:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d75d08b0-24d1-4f7b-84e0-a11f4695bf3d_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><strong>Foreword</strong> &#8212; Dear beginner, dear not-so-beginner, dear reader. This article is a lot to take in. You'll need perspective for it to make sense. Once in a while, take a step back and re-think about all the concepts explained here. They helped me a lot over the years, and I hope that they will help you too. This article is my interpretation of them for the work I do, which is mostly web-related development.</p></blockquote><p>NASA's <a href="https://en.wikipedia.org/wiki/JPL">JPL</a>, which is responsible for some of the most awesomest science out there, is quite famous for its <a href="https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Developing_Safety-Critical_Code">Power of 10</a> rules (<a href="http://spinroot.com/gerard/pdf/P10.pdf">see original paper</a>). Indeed, if you are going to send a robot to Mars with a 40-minute ping and no physical access to it, then you pretty damn well should make sure that your code doesn't have bugs.</p><p>These rules were made with embedded software in mind but why wouldn't everybody be able to benefit from this? 
Could we apply them to other languages like JavaScript and Python &#8212; and thus make web applications more stable?</p><p>That's a question I have been considering for years and here is my interpretation of the 10 rules applied to interpreted languages and web development, revisited some time after the <a href="https://dev.to/xowap/10-rules-to-code-like-nasa-applied-to-interpreted-languages-40dd">initial post</a>, with comments in mind.</p><h2>1 &#8212; Avoid complex flow constructs</h2><blockquote><p><strong>Original rule</strong> &#8212; Restrict all code to very simple control flow constructs &#8211; do not use <code>goto</code> statements, <code>setjmp</code> or <code>longjmp</code> constructs, and direct or indirect recursion.</p></blockquote><p>When you use weird constructs then your code becomes difficult to analyze and to predict. The generations that came out after <code>goto</code><a href="https://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf"> was considered harmful</a> did indeed avoid using it. 
We're at the stage where we're debating if <code>continue</code><a href="https://github.com/airbnb/javascript/issues/1103"> is </a><code>goto</code> and thus should be banned.</p><p>My take on this is that <code>continue</code> in a loop is exactly the same as <code>return</code> in a <code>forEach()</code> (especially now that JS has block scoping) so if you're saying that <code>continue</code> is <code>goto</code> then you're basically closing your eyes to the issue. But that's a JS-specific implementation detail.</p><p>As a general rule you should avoid everything that is mind-bending or hard to spot, because if your brain power is spent understanding the quirks of jumping around then you're not spending it on the actual logic, and then you might be <a href="https://www.youtube.com/watch?v=vJG698U2Mvo">hiding some bugs</a> without your knowledge.</p><p>I'll let you be the judge of what you put in that category but I would definitely put:</p><ul><li><p><code>goto</code> itself of course</p></li><li><p>PHP's <code>continue</code> and <code>break</code> used in conjunction with numbers, which is just pure insanity</p></li><li><p><code>switch</code> constructs, because they usually require a <code>break</code> to close the block and I guarantee you that there <em>will be</em> bugs. A series of <code>if</code>/<code>else if</code> will do the same job in a non-confusing manner, as well as <code>match</code>-like constructs in languages like Python or Crablang.</p></li></ul><p>Besides this, of course, avoid recursion, for several reasons:</p><ul><li><p>As recursive calls pile up on the call stack, whose size is very limited, you can't really control how deep your recursion can go. 
Even if your code is legit, it might fail because it recurses too much.</p></li><li><p>It&#8217;s easier to put safeguards when working in non-recursive mode &#8212; think explored paths or node IDs.</p></li><li><p>Do you get this feeling when writing a recursion, where you don't really know if your code is ever going to stop? It's very hard to visualize a recursion and to prove that it will terminate correctly.</p></li><li><p>It's also more compatible with the following rules to use an iterative algorithm instead of a recursive one, because you have more control (again) over the size of the problem you're dealing with.</p></li></ul><p>As a bonus, recursion often comes as the intuitive implementation of an algorithm but is usually far from optimal. For example, we often ask in job interviews to implement the factorial function using a recursive function, but that's far less efficient than an iterative implementation. Regular expressions too <a href="https://dev.to/xowap/how-cloudflare-could-have-avoided-its-outage-maybe-1jko">can be disastrous</a>.</p><h2>2 &#8212; All loops must have fixed bounds. This prevents runaway code.</h2><blockquote><p><strong>Original rule</strong> &#8212; All loops must have a fixed upper-bound. It must be trivially possible for a checking tool to prove statically that a preset upper-bound on the number of iterations of a loop cannot be exceeded. If the loop-bound cannot be proven statically, the rule is considered violated.</p></blockquote><p>The idea with this rule is the same as with the prohibition of recursion: you want to prevent runaway code. The way you implement this is by making sure it's trivial to prove statically that the loop won't exceed a given number of iterations.</p><p>Let's give an example in Python. You could do this:</p><pre><code>def iter_max(it, max_iter):
    cnt = 0

    for x in it:
        assert cnt &lt; max_iter
        yield x
        cnt += 1


def main():
    for i in iter_max(range(100), 10):
        print(i)</code></pre><p>A language like Python will however limit the number of iterations by itself in many cases. So if you prove that the input lists won't be too long, there are a bunch of cases where you don't need to do this.</p><p>A good application of that is pagination: make sure that you always work with pages that are of a reasonable size and this way you won't need loops that could run forever. Always design your code so it only works on a finite amount of data and let tools that were made for that handle infinity (like your DB engine).</p><h2>3 &#8212; Avoid heap memory allocation</h2><blockquote><p><strong>Original rule</strong> &#8212; Do not use dynamic memory allocation after initialization.</p></blockquote><p>That of course makes no sense in interpreted languages where literally everything is allocated dynamically. But this doesn't mean that the rule does not apply to them. The core idea of the rule is that, beyond the tedious memory management techniques that you have to use in C, it's also very important to be able to fix an upper bound on the memory consumption of your program.</p><p>So for interpreted languages it means that when you write your code, you should be able to know that given any accepted input the memory consumption won't go beyond a certain point.</p><p>While this can be hard to prove in an absolute manner, there are good clues and principles that you can follow. To be more specific and to repeat the previous sections, pagination is an essential technique. If you only work with pages and you know that the content of each page is limited (DB fields have limited length and so on) then it's quite easy to prove that at least the data coming from those pages can be contained within an upper bound.</p><p>This is a powerful idea: load a full page of data into memory, work on it, then let garbage collection discard it. It can even &#8212; under specific conditions &#8212; be a way to parallelize the work. 
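</p>

<p>As a quick sketch of this page-by-page idea in Python, with <code>fetch_page()</code> as a made-up stand-in for a paginated database query:</p>

```python
def fetch_page(page, page_size=100):
    """Hypothetical data source; stands in for a paginated DB query."""
    data = list(range(1000))  # pretend this lives in the database
    start = page * page_size
    return data[start:start + page_size]


def process_all(page_size=100, max_pages=1000):
    total = 0
    for page in range(max_pages):  # fixed upper bound, as per rule 2
        rows = fetch_page(page, page_size)
        if not rows:  # ran out of data before hitting the bound
            break
        # Only one page is in memory at a time; the previous page
        # becomes garbage-collectible as soon as `rows` is rebound.
        total += sum(rows)
    return total
```

<p>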
Indeed, if you&#8217;ve managed to make your problem workable in pages, it means they can be processed independently.</p><h2>4 &#8212; Restrict functions to a single printed page</h2><blockquote><p><strong>Original rule</strong> &#8212; No function should be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration. Typically, this means no more than about 60 lines of code per function.</p></blockquote><p>This is about two different things.</p><p>First, the human brain can only fully understand so much logic and the symbolic page looks about right. While this estimation is totally arbitrary, you'll find that you can easily organize your code into functions of about that size or smaller and that you can easily understand those functions. Nobody likes to land on a 1000-line function that seems to do a gazillion things at the same time. We've all been there and we know it should not happen.</p><p>Second, when the function is small &#8212; or rather as small as possible &#8212; then you can worry about giving this function the least possible power. Make it work on the smallest unit of data and let it be a super simple algorithm. It will de-couple your code and make it more maintainable.</p><p>And let me emphasize the arbitrary aspect of this rule. It works for the very reason that it is arbitrary. Someone decided that they don't want to see a function longer than a page because it's not nice to work with if it is any longer. And they've also noticed that it is doable. At first I rejected this rule but more than a decade later I must say that if you just follow either of the goals mentioned above then your code will always fit in a page of paper. So yes, it's a good rule.</p><p>The good news is that we can even push this idea further.</p><p>First of all, line length is important. 
You want your code to fit in a half-screen in order to be able to read two files side-by-side without having to scroll horizontally. This puts the limit at 80-ish (88 is becoming increasingly popular).</p><p>And secondly, you probably want to keep your <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> below 5~10 (for example a <code>max-complexity = 5</code> in <a href="https://beta.ruff.rs/docs/settings/#mccabe-max-complexity">Ruff&#8217;s settings</a>).</p><p>Although cyclomatic complexity predates the publication of the P10 paper, this limit wasn&#8217;t included. My guess is that it would have complicated the writing of the rule, which in its current state is only a few lines long. Furthermore, you need specific tools to measure cyclomatic complexity, while everything mentioned in this paper can be hand-checked. It does however echo rules 1, 4 and 9 nicely, so my advice is definitely to include it in your coding guidelines.</p><h2>5 &#8212; Use a minimum of two runtime assertions per function</h2><blockquote><p><strong>Original rule</strong> &#8212; The assertion density of the code should average to a minimum of two assertions per function. Assertions are used to check for anomalous conditions that should never happen in real-life executions. Assertions must always be side-effect free and should be defined as Boolean tests. When an assertion fails, an explicit recovery action must be taken, e.g., by returning an error condition to the caller of the function that executes the failing assertion. Any assertion for which a static checking tool can prove that it can never fail or never hold violates this rule. 
(I.e., it is not possible to satisfy the rule by adding unhelpful "assert(true)" statements.)</p></blockquote><p>That one is tricky because you need to understand what would count as an assertion.</p><p>In the original rules, assertions are considered to be a boolean test done to verify "pre- and post- conditions of functions, parameter values, return values of functions, and loop-invariants". If the test fails then the function must do something about it, typically returning an error code.</p><p>In the context of C or Go it is mostly as simple as this. In the context of almost every other language it means raising an exception. And depending on the language, a lot of those assertions are made automatically.</p><p>To give Python as an example, you could do this:</p><pre><code>assert "foo" in bar
do_something(bar["foo"])</code></pre><p>But why bother, when doing this will raise an exception anyway?</p><pre><code>do_something(bar["foo"])</code></pre><p>For me it's always very tempting to act as if the input value was <em>always</em> right by falling back to defaults when the input is crap. But that's usually not helpful. Instead, you should let your code fail as much as possible and use an exception reporting tool (I personally love <a href="https://sentry.io/">Sentry</a> but there are plenty out there). This way you'll know what goes wrong and you'll be able to fix your code.</p><p>Of course, this means that your code will fail at runtime. But it's all right! Runtime is not production time. If you test your application extensively before sending it to production, this will allow you to see most of the bugs. Then your real users will also encounter some bugs, but you will also be informed of them, instead of things failing silently.</p><p>As a side-note, if you don't have control over the input, like if you're building an API for example, it's not always a good idea to fail. Raise an exception on incorrect input and you'll get an error 500, which is not really a good way to communicate bad input (it should rather be something in the range of the 4xx status codes). In that case you need to properly validate the input beforehand. However, depending on who's using the code, you might or might not want to report the exceptions. A few examples:</p><ul><li><p>An external tool calls your API. In that case you want to report exceptions because you want to know if the external tool is going sideways.</p></li><li><p>Another of your services calls your API. In that case you also want to report exceptions as it's you doing things wrong.</p></li><li><p>The general public calls your API. 
In that case you probably don't want to receive an email every time someone does something wrong.</p></li></ul><p>In short, it's all about knowing about the failures that will help you improve your code's stability.</p><h2>6 &#8212; Restrict the scope of data to the smallest possible.</h2><blockquote><p><strong>Original rule</strong> &#8212; Data objects must be declared at the smallest possible level of scope.</p></blockquote><p>In short, don't use global variables. Keep your data hidden within the app and make it so that different parts of the code can't interfere with each other.</p><p>You can hide your data in classes, modules, second-order functions, etc.</p><p>One thing though: when you're doing unit testing, you'll notice that this sometimes backfires on you, because you want to set that data manually just for the test. This might mean that you need to hide your data away but keep a way to change it which you conventionally won't use. That's the famous <code>_name</code> in Python or <code>private</code> in other languages (which can still be accessed using reflection).</p><h2>7 &#8212; Check the return value of all non-void functions, or cast to void to indicate the return value is useless.</h2><blockquote><p><strong>Original rule</strong> &#8212; The return value of non-void functions must be checked by each calling function, and the validity of parameters must be checked inside each function.</p></blockquote><p>In C, the most common way of indicating an error is by the return value of the corresponding function (or by reference into an error variable). However, with most interpreted languages it's simply not the case since errors are indicated by an exception. 
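</p>

<p>For instance, here is what this looks like in Python (a sketch; the names are made up): the dictionary lookup itself acts as the assertion, and the error bubbles up untouched until a caller can actually do something about it:</p>

```python
import logging

logger = logging.getLogger(__name__)


def load_database_url(settings):
    # No defensive default: a missing key raises KeyError right here,
    # which is the built-in equivalent of an assertion on the input.
    return settings["database_url"]


def main(settings):
    try:
        return load_database_url(settings)
    except KeyError:
        # The one place where we can do something about it: report the
        # failure loudly instead of continuing with a broken state.
        logger.exception("missing required setting")
        raise
```

<p>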
Even <a href="https://www.php.net/manual/en/language.errors.php7.php">PHP 7</a> improved that (even if you still get warnings printed as HTML in the middle of your JSON if you do something non-fatal).</p><p>So in truth this rule is: let errors bubble up until you can handle them (by recovering and/or logging the error). In languages that have exceptions it's pretty simple to do: simply don't catch the exceptions until you can handle them properly.</p><p>See it another way: don't catch exceptions too early and don't silently discard them. Exceptions are meant to crash your code if need be, and the proper way to deal with exceptions is to report them and fix the bug. Especially in web development, where an exception will just result in a 500 response code without dramatically crashing the whole front-end.</p><h2>8 &#8212; Use the preprocessor sparingly.</h2><blockquote><p><strong>Original rule</strong> &#8212; The use of the preprocessor must be limited to the inclusion of header files and simple macro definitions. Token pasting, variable argument lists (ellipses), and recursive macro calls are not allowed. All macros must expand into complete syntactic units. The use of conditional compilation directives is often also dubious, but cannot always be avoided. This means that there should rarely be justification for more than one or two conditional compilation directives even in large software development efforts, beyond the standard boilerplate that avoids multiple inclusion of the same header file. Each such use should be flagged by a tool-based checker and justified in the code.</p></blockquote><p>In C code, macros are a particularly efficient way to hide the mess. They allow you to <em>generate</em> C code, much like you would write an HTML template. 
It's easy to see how this can go sideways; just have a look at the <a href="https://www.ioccc.org/">IOCCC</a> contestants, which usually make very heavy use of C macros to generate totally unreadable code.</p><p>However, C (and C++) is pretty much the only mainstream language making use of this, so how would you translate this into other languages? Did we get rid of the problem? Does compiling code into other code that will then be executed sound familiar to someone?</p><p>Yes, I'm talking about the huge pile of things we put in our Webpack configurations.</p><p>The initial rule recognizes the need for macros but asks that they be limited to "simple macro definitions". What is the "simple macro" of Webpack? What is the good transpiler and the bad transpiler?</p><p>My rationale is simple:</p><ul><li><p>Keep the stack as small as possible. The fewer transpilers you have, the less complexity you need to handle.</p></li><li><p>Stay as mainstream as possible. For example, I always use Webpack to transpile my JS/CSS, even in Python or PHP projects. Then I use a simple wrapper around a manifest file to get the right file paths on the server side. This allows me to stay compatible with the rest of the JS world without having to write more than a simple wrapper. Another way to put it is: stay away from things like <a href="https://django-pipeline.readthedocs.io/en/latest/">Django Pipeline</a>.</p></li><li><p>Stay as close as possible to the real thing. Using ES6+ is nice because it's a superset of previous JS versions, so you can see transpiling as a simple layer of compatibility. I wouldn't recommend, however, transpiling Dart or Python or anything like that into JS.</p></li><li><p>Only do it if it brings actual value to your daily work. 
For example, CoffeeScript is just an obfuscated version of JavaScript so it's probably not worth the pain, while something like Stylus/LESS/Sass, which brings variables and mixins to CSS, will help you <em>a lot</em> to maintain CSS code.</p></li></ul><p>You're the judge of good transpilers for your projects. Just don't clutter yourself with useless tools that are not worth your time.</p><h2>9 &#8212; Limit pointer use to a single dereference, and do not use function pointers.</h2><blockquote><p><strong>Original rule</strong> &#8212; The use of pointers should be restricted. Specifically, no more than one level of dereferencing is allowed. Pointer dereference operations may not be hidden in macro definitions or inside typedef declarations. Function pointers are not permitted.</p></blockquote><p>Anybody who's done C beyond the basic examples will know the headache of pointers. It's like Inception, but with computer memory: you don't really know how deep you should follow the pointers.</p><p>The need for this comes, for example, from the <code>qsort()</code> function. You want to be able to sort any type of data but without knowing anything about it at compile time. Have a look at the signature:</p><pre><code>void qsort( void *ptr, size_t count, size_t size,
            int (*comp)(const void *, const void *) );</code></pre><p>It's one of the most frighteningly unsafe things you'll ever see in a standard library documentation. Yet, it allows the standard library to sort any kind of data, for which other, more modern languages still have <a href="https://gobyexample.com/sorting">a little bit awkward</a> solutions.</p><p>But of course when you open the gate for this kind of thing, you open the gate to any kind of pointer madness. And as you know, when a gate is open, people will go through it. Hence this rule for C.</p><p>However, what about our case of interpreted languages? We will first cover why references are bad and then we will explain how to accomplish the initial intent of writing generic code.</p><h3>Don't use references</h3><p>Pointers don't exist but some ancient and obscure languages like <a href="https://www.php.net/manual/en/language.references.pass.php">PHP</a> still thought that it would be a good idea to have them. However, most of the other languages will only use a strategy named <a href="https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing">call-by-sharing</a>. The idea is &#8212; very quickly &#8212; that instead of passing a reference you will pass objects that can modify themselves.</p><p>The core point against references is that, beyond being memory unsafe and crazy in C, they also produce side-effects. For example, in PHP:</p><pre><code>function read($source, &amp;$n) {
    $content = // some way to get the content
    $n = // some way to get the read length

    return $content;
}

$n = 0;
$content = read("foo", $n);

print($n);</code></pre><p>That's a common, C-inspired, use-case for references. However, what you really want to do in this case is</p><pre><code>function read($source) {
    $content = // some way to get the content
    $n = // some way to get the read length

    return [$content, $n];
}

list($content, $n) = read("foo");

print($n);</code></pre><p>All you need is two return values instead of one. You can also return data objects, which can hold any information you want and can evolve in the future without breaking existing code.</p><p>And all of this without affecting the scope of the calling function, which is rather nice.</p><p>Another safety point, though: when you modify an object, you potentially affect the other users of that object. That's, for example, <a href="https://stackoverflow.com/questions/30979178/how-do-i-work-around-mutability-in-moment-js">a common pitfall of Moment.js</a>. Let's see.</p><pre><code>function add(obj, attr, value) {
    obj[attr] = (obj[attr] || 0) + value;
    return obj;
}

const a = {foo: 1};
const b = add(a, "foo", 1);

console.log(a.foo); // 2
console.log(b.foo); // 2</code></pre><p>On the other hand, you can do:</p><pre><code>function add(obj, attr, value) {
    const patch = {};
    patch[attr] = (obj[attr] || 0) + value;
    return Object.assign({}, obj, patch);
}

const a = {foo: 1};
const b = add(a, "foo", 1);

console.log(a.foo); // 1
console.log(b.foo); // 2</code></pre><p>Both <code>a</code> and <code>b</code> stay distinct objects with distinct values because the <code>add()</code> function made a copy of <code>a</code> before returning it.</p><p>Let's conclude this already-too-long section with the final form of the rule:</p><blockquote><p>Don't mutate your arguments unless the explicit goal of your function is to mutate your arguments. If you do so, do it by sharing and not by reference.</p></blockquote><p>That would, for example, be the <a href="https://eslint.org/docs/rules/no-param-reassign">no-param-reassign</a> rule in ESLint as well as the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/freeze">Object.freeze()</a> method. Or in Python you can use a <a href="https://docs.python.org/3/library/typing.html#typing.NamedTuple">NamedTuple</a> in many cases.</p><p>Note on performance: if you change the size of an object, the underlying process is basically to allocate a new contiguous region of memory for it <a href="https://en.cppreference.com/w/c/memory/realloc">and then copy it</a>. For this reason, a mutation is often a copy anyway, so don't worry about copying your objects.</p><h3>Leverage the weak-ish dynamic typing</h3><p>Now that we have closed the crazy door of references, we still need to write generic code if we want to stay <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself">DRY</a>.</p><p>The good news is that while compiled languages are bound by the rules of physics and the way computers work, interpreted languages have the luxury of putting a lot of additional support logic on top of that.</p><p>Specifically, they mostly rely on <a href="https://en.wikipedia.org/wiki/Duck_typing">duck typing</a>. 
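</p><p>As a minimal sketch of duck typing in Python (the <code>Dollars</code> class and <code>total()</code> function are invented here for illustration), any object that supports the right operations works, regardless of its declared type:</p><pre><code>class Dollars:
    """Hypothetical money type that quacks like a number."""
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # works with anything exposing an `amount` attribute
        return Dollars(self.amount + other.amount)

def total(items):
    # never checks types: any objects supporting `+` will do
    first, *rest = items
    out = first
    for item in rest:
        out = out + item
    return out

print(total([1, 2, 3]))                        # 6
print(total([Dollars(1), Dollars(2)]).amount)  # 3</code></pre><p>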
Of course, you can add some level of static type checking like <a href="https://www.typescriptlang.org/">TypeScript</a>, Python's <a href="https://docs.python.org/3/library/typing.html">type hints</a> or PHP's <a href="https://www.php.net/manual/en/functions.arguments.php#functions.arguments.type-declaration">type declarations</a>. Using the wisdom of other rules:</p><ul><li><p>Rule 5 &#8212; Make many assertions. Expecting something from an object that doesn't actually have it will raise an exception, which you can catch and report.</p></li><li><p>Rule 10 &#8212; No warnings allowed (explained hereafter). Using the various type-checking mechanisms, you can rely on a static analyzer to help you spot errors that would arise at runtime.</p></li></ul><p>Those two rules will protect you from writing dangerous generic code, which results in the following rule:</p><blockquote><p>You can write generic code as long as you use as many tools as possible to catch mistakes, and especially you need to follow rules 5 and 10.</p></blockquote><h2>10 &#8212; Compile with all possible warnings active; all warnings should then be addressed before release of the software.</h2><p>The initial full rule is:</p><blockquote><p>All code must be compiled, from the first day of development, with all compiler warnings enabled at the compiler&#8217;s most pedantic setting. All code must compile with these settings without any warnings. All code must be checked daily with at least one, but preferably more than one, state-of-the-art static source code analyzer and should pass the analyses with zero warnings.</p></blockquote><p>Of course, interpreted code is not necessarily compiled, so it's not about compiler warnings <em>per se</em> but rather about getting the warnings.</p><p>There is fortunately a great number of warning sources out there:</p><ul><li><p>All the <a href="https://www.jetbrains.com/idea/">JetBrains IDEs</a> are pretty awesome at finding out issues in your code. 
Recently, those IDEs taught me a lot of patterns in different languages. That's really the main reason why I prefer something like this to a simplistic code editor: the warnings are very smart and helpful.</p></li><li><p>Linters for all the languages:</p><ul><li><p>JavaScript &#8212; <a href="https://eslint.org/">eslint</a> with a rule set, <a href="https://github.com/airbnb/javascript">AirBnB</a>'s maybe?</p></li><li><p>Python &#8212; You can go full steam on <a href="https://github.com/astral-sh/ruff">Ruff</a> and pick the rules that suit you</p></li></ul></li><li><p>Automated code review tools like <a href="https://www.sonarqube.org/">SonarQube</a></p></li><li><p>Spell checkers are also surprisingly important because they allow you to sniff out typos regardless of type analysis or any complicated static code analysis. It's a really efficient way not to lose hours because you typed <code>reuslts</code> instead of <code>results</code>.</p></li></ul><p>The main thing about warnings is that you <strong>must</strong> train your brain to see them. A single warning in the IDE will drive me mad, while on the other hand I know people who just <em>won't see</em> them.</p><p>A final point on warnings: unlike with compiled languages, warnings here are not always 100% certain. They are more like 95% certain, and sometimes it's just an IDE bug. In that case, you should explicitly disable the warning and, if possible, give a small explanation of why you're sure you don't need to apply it. However, think carefully before doing so, because usually the IDE is right.</p><h2>Key takeaways</h2><p>The long discussion above tells us that those 10 rules were made for C, and while you can apply their philosophy to interpreted languages, you can't really translate them into 10 other rules directly. 
Let's make our new power of 10 + 2 rules for interpreted languages.</p><ul><li><p><strong>Rule 1</strong> &#8212; Don't use <code>goto</code>, rationalize the use of <code>continue</code> and <code>break</code>, use <code>match</code> instead of <code>switch</code>.</p></li><li><p><strong>Rule 2</strong> &#8212; Prove that your problem can never create runaway code.</p></li><li><p><strong>Rule 3</strong> &#8212; To do so, limit its size, usually using pagination, map/reduce, chunking, etc.</p></li><li><p><strong>Rule 4</strong> &#8212; Make code that fits in your head. If it fits in a page, it fits in your head.</p></li><li><p><strong>Rule 5</strong> &#8212; Check that things are right. Fail when wrong. Monitor failures. See rule 7.</p></li><li><p><strong>Rule 6</strong> &#8212; Don't use global-ish variables. Store data in the smallest possible scope.</p></li><li><p><strong>Rule 7</strong> &#8212; Let exceptions bubble up until you properly recover and/or report them.</p></li><li><p><strong>Rule 8</strong> &#8212; If you use transpilers, make sure that they solve more problems than they bring.</p></li><li><p><strong>Rule 9.1</strong> &#8212; Don't use references even if your language supports them.</p></li><li><p><strong>Rule 9.2</strong> &#8212; Copy arguments instead of mutating them, unless mutation is the explicit purpose of the function.</p></li><li><p><strong>Rule 9.3</strong> &#8212; Use as many type-safety features as you can.</p></li><li><p><strong>Rule 10</strong> &#8212; Use several linters and tools to analyze your code. No warning shall be ignored.</p></li></ul><p>And if you take a step back, all of those rules could be summed up in one rule to rule them all.</p><blockquote><p>Your computer, your RAM, your hard drive, even your brain are bound by limits. You need to cut your problems, code, and data into small boxes that will fit your computer, RAM, hard drive, and brain. 
And that will fit together.</p></blockquote><p>&#8212; <em><s>Morpheus</s> Me</em></p><p>I consider that to be the core rule of programming, and I apply it as a universal rationale to everything computer-related that I do.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Baby CTO is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Navigating the Tech Labyrinth: Welcome to Baby CTO!]]></title><description><![CDATA[Bridging Experience, Insight, and Passion for the Modern Tech Entrepreneur]]></description><link>https://www.baby-cto.com/p/navigating-the-tech-labyrinth-welcome</link><guid isPermaLink="false">https://www.baby-cto.com/p/navigating-the-tech-labyrinth-welcome</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Thu, 17 Aug 2023 11:12:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/87999a8f-579c-48c3-a743-2056bb780819_1312x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Greetings, tech enthusiasts and leaders!</p><p>Embarking on this new journey, Baby CTO is the culmination of diverse experiences, keen observations, and an unwavering passion for the world of technology. 
If you're reading this, you've shown interest in diving deeper, and for that, I extend my heartfelt gratitude.</p><p><strong>A Confluence of Cultures &amp; Skills:</strong><br>My background in telecommunications engineering from Telecom Lille 1 was just the beginning. The cultural nuances and tech landscapes of France, Spain, and India have enriched my understanding, granting me a holistic view of how technology integrates with varied societal fabrics.</p><p><strong>From Ventures to Valuable Lessons:</strong><br>Every attempt, be it my entrepreneurial endeavor into a next-generation travel guide or working with an eclectic mix of businesses at WITH, my tech agency, has been a stepping stone. While some ventures saw the sunset too soon, each provided invaluable insights into what truly works in the unpredictable realm of startups and technology.</p><p><strong>An Invitation to Engage:</strong><br>Baby CTO isn't just about sharing my stories. It's an open platform to discuss, debate, and delve into the intricacies of tech. 
From granular code insights to broad strategic reviews, the aim is to shed light on both the micro and macro aspects of our tech-driven world.</p><p><strong>The Road Ahead:</strong><br>This is just the beginning. With every post, we'll explore the labyrinthine corridors of the tech world, unveiling hidden insights, busting myths, and above all, learning together.</p><p>To each one of you who've taken a moment to subscribe, be it free or paid, know that you're the heartbeat of Baby CTO. Your support, feedback, and engagement are what will shape the future of this platform.</p><p>Join me, as we unravel the mysteries, celebrate the successes, and learn from the missteps in this dynamic world of technology. Welcome to Baby CTO, where our collective journey has just begun.</p><p>To tech and beyond!</p><p>Warmly,<br>R&#233;my</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Baby CTO is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Coming soon]]></title><description><![CDATA[This is Baby CTO.]]></description><link>https://www.baby-cto.com/p/coming-soon</link><guid isPermaLink="false">https://www.baby-cto.com/p/coming-soon</guid><dc:creator><![CDATA[Rémy]]></dc:creator><pubDate>Thu, 17 Aug 2023 09:31:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!u3-p!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c9d8fc-b387-47a4-b77f-c2aa0a303dc7_619x619.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Baby CTO.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.baby-cto.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.baby-cto.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>