Walking Authentik through ten major versions

An identity provider eighteen months behind, an upstream that forbids version-skipping, and the migration plan that got it current without losing a single SSO session.


My self-hosted Authentik instance was pinned to a tag from August 2024. Eighteen months of release cycles later, that’s nine major-version boundaries to cross, or ten hops once you count the patch bump that starts the chain. Authentik’s upstream guidance is explicit: do not skip versions. This is how I walked it through all ten hops in an afternoon, including a mid-chain migration of custom CSS from a bind-mount into a native database field, without anyone noticing their SSO had changed underneath them.

Why it got behind in the first place

The image tag was hard-pinned in the stored compose file. :latest is a footgun and I never use it on a critical stack — Authentik is the front door to every other service I run. Pinning is the right instinct. Not updating the pin is the wrong follow-up, and that’s what happened.

The pin sat at 2024.8.3 for eighteen months. By the time I sat down to update, the gap looked like this:

       2024.8.3 ──────────────────────────────────── 2026.2.2
       (current)                                     (target)

       eighteen months
       ten hops across nine major-version boundaries
       hundreds of database migrations
       four breaking changes
       one CSS-system refactor

Authentik’s update guidance is unambiguous: upgrade sequentially through every major. No skipping. The reason is database migrations — each major has its own set, and they’re written assuming the previous major’s schema. Skip a major and you’re applying migrations against a schema they weren’t tested against. The official-docs language is “may result in irrecoverable state.”

I read that sentence twice and went to plan the chain.

The chain

Ten hops, in order, every patch the most recent within its major:

 1.  2024.8.3   →  2024.8.6     (patch within current major)
 2.  2024.8.6   →  2024.10.5
 3.  2024.10.5  →  2024.12.5
 4.  2024.12.5  →  2025.2.4
 5.  2025.2.4   →  2025.4.4    (custom CSS migration to brand DB field)
 6.  2025.4.4   →  2025.6.4    (theming overhaul, highest visual risk)
 7.  2025.6.4   →  2025.8.6
 8.  2025.8.6   →  2025.10.4   (channels/cache moved to Postgres)
 9.  2025.10.4  →  2025.12.4
10.  2025.12.4  →  2026.2.2    (final target)
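
The no-skipping rule can be checked mechanically before anything is deployed. A minimal sketch, assuming the list of majors is typed in by hand from the release notes; `MAJORS`, `CHAIN`, `major_of`, and `check_chain` are hypothetical names for illustration, not part of Authentik or Komodo:

```python
# Hypothetical pre-flight check: every hop must either stay within its
# major or advance to the very next one. MAJORS is maintained by hand;
# nothing here talks to Authentik itself.

MAJORS = ["2024.8", "2024.10", "2024.12", "2025.2", "2025.4",
          "2025.6", "2025.8", "2025.10", "2025.12", "2026.2"]

CHAIN = ["2024.8.3", "2024.8.6", "2024.10.5", "2024.12.5", "2025.2.4",
         "2025.4.4", "2025.6.4", "2025.8.6", "2025.10.4", "2025.12.4",
         "2026.2.2"]


def major_of(version: str) -> str:
    """'2025.10.4' -> '2025.10' (year.month is the major)."""
    year, month, _patch = version.split(".")
    return f"{year}.{month}"


def check_chain(chain: list[str], majors: list[str]) -> list[str]:
    """Return a list of problems; an empty list means no major is skipped."""
    problems = []
    for src, dst in zip(chain, chain[1:]):
        step = majors.index(major_of(dst)) - majors.index(major_of(src))
        if step not in (0, 1):
            problems.append(f"{src} -> {dst} crosses {step} majors")
    return problems


print(check_chain(CHAIN, MAJORS))  # prints [] for the chain above
```

A version whose major is missing from `MAJORS` raises `ValueError`, which is the desired behaviour for a typo in the plan.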

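The chain has three notable hops in it, the ones annotated above: hop 5 (custom CSS moves into the brand database field), hop 6 (the theming overhaul, highest visual risk), and hop 8 (channels and cache move to Postgres).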

The per-hop drill

Every hop followed the same loop. I made the discipline strict before I started, because the only way to get through ten hops without a mistake is to do the exact same thing ten times.

1. Update the image tag in Komodo's stored file_contents.
2. komodo_deploy_stack.
3. Wait for all four containers to report (healthy).
4. Verify (four checks):
     a. Container health: docker ps | grep authentik | grep '(healthy)'
     b. Login flow:        curl https://auth.example.com/if/flow/default-authentication-flow/ → 200
     c. OIDC discovery:    curl .../.well-known/openid-configuration → 200
     d. Log scan:          docker logs authentik_server 2>&1 | grep -iE 'error|trace' → expected boot-noise only
5. Migration audit (docker logs authentik_worker): list every migration that ran, compare
   against the release notes' "expected migrations" section.
6. Commit. The branch is `chore/upgrade-to-2026.2`; one commit per hop.
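
The verification step can be scripted so that it really is identical on every hop. A minimal sketch, assuming the text comes from `docker ps --format '{{.Names}}\t{{.Status}}'` and `docker logs`; the helper names are hypothetical, and the list of allowed boot-noise patterns has to be filled in from a known-good boot. Matching the literal `(healthy)` matters, because a bare search for `healthy` also matches `(unhealthy)`.

```python
import re
from urllib.request import urlopen


def not_healthy(ps_output: str, prefix: str = "authentik") -> list[str]:
    """Check (a): names of matching containers whose status lacks '(healthy)'."""
    bad = []
    for line in ps_output.splitlines():
        name, _, status = line.partition("\t")
        if name.startswith(prefix) and "(healthy)" not in status:
            bad.append(name)
    return bad


def returns_200(url: str, timeout: float = 10.0) -> bool:
    """Checks (b) and (c): True only if the URL answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False


def unexpected_log_lines(log_text: str, allowed: list[str]) -> list[str]:
    """Check (d): error/trace lines not covered by an allowed regex."""
    flagged = re.compile(r"error|trace", re.IGNORECASE)
    allow = [re.compile(p) for p in allowed]
    return [
        line for line in log_text.splitlines()
        if flagged.search(line) and not any(a.search(line) for a in allow)
    ]


sample_ps = ("authentik_server\tUp 2 minutes (healthy)\n"
             "authentik_worker\tUp 2 minutes (unhealthy)")
print(not_healthy(sample_ps))  # prints ['authentik_worker']
```

Keeping the parsing separate from the commands that produce the text means the same helpers can be exercised against saved output from an earlier hop.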

The migration audit step was the one that paid for itself. Reading the worker logs after each hop, against the release notes, caught two cases where a migration didn’t apply cleanly the first time — once a transient Postgres connection blip, once a genuinely flaky migration that retries handled. Without the audit, I’d have found out about both via a user error report a week later.
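
The audit itself is a set comparison. A minimal sketch, assuming the worker log contains Django's standard `Applying <app>.<migration>... OK` lines verbatim and that the expected set has been copied out of the release notes by hand; `applied_migrations` and `audit` are hypothetical helper names:

```python
import re

# Django's migrate command prints one "Applying <app>.<migration>... OK"
# line per migration that finished. Assumption: those lines reach the
# worker log unchanged, possibly surrounded by other text.
APPLIED = re.compile(r"Applying (\w+)\.(\w+)\.\.\.\s*OK")


def applied_migrations(log_text: str) -> set[str]:
    """Every migration the log reports as applied successfully."""
    return {f"{app}.{name}" for app, name in APPLIED.findall(log_text)}


def audit(log_text: str, expected: set[str]) -> dict[str, set[str]]:
    """Compare what ran against what the release notes said would run."""
    ran = applied_migrations(log_text)
    return {"missing": expected - ran, "unexpected": ran - expected}


log = "  Applying authentik_brands.0008_brand_branding_custom_css... OK\n"
print(audit(log, {"authentik_brands.0008_brand_branding_custom_css"}))
# prints {'missing': set(), 'unexpected': set()}
```

A migration that started but never reached `OK` shows up under `missing`, which is the case the audit exists to catch.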

Total wall-clock per hop: about 12 minutes. Ten hops in an afternoon, with a coffee break.

The CSS migration, mid-chain

The most interesting part of the chain wasn’t a version hop. It was the moment at hop 5 when a new field appeared in the schema and a piece of my deployment had to migrate out of one place and into another.

My custom Authentik theme — glass-morphism login container, multicolour SSO icons, hidden Authentik logo — had been delivered as a custom.css file bind-mounted into the server container at /web/dist/custom.css. Worked fine for two years. Two problems:

  1. The bind mount is brittle. Container restarts, image updates, volume changes — any of them could lose the mount, and the login page would silently render without the theme.
  2. The 2025.6 theming refactor (one hop later) replaced the static-asset loading path. A bind mount that worked at 2025.4 might not have worked at 2025.6.

At hop 5, the authentik_brands.0008_brand_branding_custom_css migration introduced a branding_custom_css TEXT column on the brand table. Now there was a first-class place for custom CSS to live: in the database, attached to the brand, durable across image updates.

Migrating to it was a single SQL UPDATE, with a dollar-quoted string to safely embed the CSS file’s contents:

UPDATE authentik_brands_brand
SET branding_custom_css = $ak_css$
/* the full 4117 bytes of custom.css go here, verbatim */
$ak_css$
WHERE domain = 'auth.example.com';

The $ak_css$ delimiter is PostgreSQL dollar-quoting with a custom tag: anything between the two $ak_css$ markers is literal, regardless of single quotes, dollar signs, or any other punctuation in the CSS itself. The only thing that could end the literal early is the string $ak_css$ itself, and that exists in zero CSS files in the world, which is exactly the property I wanted.
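
The one property that matters, that the delimiter never occurs in the payload, can be enforced instead of assumed. A minimal sketch for generating the statement to paste into psql; `dollar_quote` and `build_update` are hypothetical helpers, and the CSS in the usage line is a placeholder, not the real theme:

```python
def dollar_quote(text: str, tag: str = "ak_css") -> str:
    """Wrap text in a tagged PostgreSQL dollar-quote, refusing unsafe input."""
    delimiter = f"${tag}$"
    if delimiter in text:
        raise ValueError(f"payload contains the delimiter {delimiter}")
    return f"{delimiter}{text}{delimiter}"


def build_update(css: str, domain: str) -> str:
    """Build the UPDATE statement for the brand row matching the domain."""
    # The domain goes into an ordinary single-quoted literal, so any
    # single quote in it is doubled (standard SQL escaping).
    safe_domain = domain.replace("'", "''")
    return (
        "UPDATE authentik_brands_brand\n"
        f"SET branding_custom_css = {dollar_quote(css)}\n"
        f"WHERE domain = '{safe_domain}';"
    )


# Placeholder CSS for illustration only.
print(build_update(".login { backdrop-filter: blur(8px); }", "auth.example.com"))
```

When the update runs from a program instead of a psql prompt, a parameterized query through the database driver avoids the quoting question altogether; the sketch above only covers the paste-it-in case.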

Then: remove the bind mount from the compose file, redeploy, verify the flow page HTML still contains the theme’s distinctive strings (glass-morphism, backdrop-filter, etc.), and only then take hop 6, the theming overhaul.

What broke (briefly)

Two moments worth remembering.

Hop 3 returned a 502 for a few seconds. All four containers were healthy, the worker logs were clean, but the external URL was returning 502 from Cloudflare’s edge. The cause: cloudflared had cached a connection to the old container, which had just been recreated, and was still rebuilding its connection pool. Internal curl 127.0.0.1:9000 was already returning 200; the external 502 resolved on its own within thirty seconds. “Wait before panicking” is a discipline I had to enforce on myself for the rest of the chain.
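
“Wait before panicking” is easier to keep to when the waiting is automated. A minimal sketch of a polling loop with a deadline; the 60-second timeout and 5-second interval are assumed values, and `wait_until` is a hypothetical helper whose `probe` argument would wrap whatever external check is being retried:

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool],
               timeout: float = 60.0,
               interval: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # still failing after the grace period: now investigate
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with fakes; in normal use they keep their defaults.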

Hop 8 made the worker logs three times noisier. The channels-to-Postgres migration kicks off a stream of “channel layer reconnection” messages while it warms up. None of them is an error. All of them look like errors at a glance. I wasted ten minutes reading the logs carefully to convince myself nothing was wrong, then moved on.

The thing Komodo did that I’m still grateful for

Komodo (the container manager I use) is configured with destroy_before_deploy: true on this stack. That means every redeploy destroys and recreates all four containers: server, worker, postgres, redis. For a stateless service this would be obvious overkill. For Authentik, where every hop runs database migrations as part of container start, it was exactly the right behaviour. The fresh container on every step meant the migration runner started from a known-good state, and I never had a “stale process from the previous version is holding a connection” issue.

The flip side: every hop briefly takes the service offline. About 20 seconds, during which the login page returns 502. If you have users actively signing in during a hop, they will notice. I did all ten hops at 3pm on a Sunday, when the only “user” was me.

What this cost in real time

Ten hops × ~12 minutes per hop = ~2 hours of focused work. The CSS migration added 30 minutes. Reading release notes and pre-flighting the chain added another hour. Total: a little under 4 hours. The instance is now current.

The 18 months of deferred updates that got it to this point: probably 30 minutes of “update the pin once a quarter” attention I didn’t pay. The lesson is the standard one (incremental maintenance is enormously cheaper than catch-up maintenance), and the only reason I’m sharing it is that I knew this when I started, did it anyway, and got bitten exactly the way I expected to.

Three things I’d tell past-me

Set a calendar reminder. Once a quarter, look at the pinned versions of the four or five critical services on your stack. Decide whether the gap is acceptable. If it isn’t, schedule a Sunday afternoon. The version of past-me who skipped this for 18 months thought they were “saving” the time. They weren’t.

Read the release notes for every major between current and target. Not every patch release. Just every major. Look for the words breaking, removed, deprecated. Note any custom integrations that touch the features mentioned. Plan the hops around those notes; sometimes a custom integration needs to migrate between two specific versions, not before or after.

Commit per hop. Every hop should be a commit. The branch name should mention the target version. If something blows up at hop 7, you want a one-command revert to hop 6, not a guessing game about what state you were in. The discipline costs ten seconds per hop. Not having it costs an afternoon.

What’s actually nice about this kind of project

It’s the rare kind of self-hosting work that has a clear, satisfying definition of done. You start at one version. You end at another. Every step in between is binary — it either worked or it didn’t. The dashboard at the end says 2026.2.2 and the login page works and the brand CSS renders and you close the laptop.

If you’ve been deferring an update like this — yes, the chain looks intimidating. Yes, the documentation will make you nervous. Yes, you will read “irrecoverable state” twice and wonder if you should just nuke and reinstall. The truthful answer is: spend an afternoon, write the chain down, do every hop the same way, audit the migrations as you go, commit between each one. It works. It’s just patient work, and it doesn’t reward shortcuts.


The CSS migration via dollar-quoted SQL is one of those things you only need once and which is exactly correct for the job. Filing it under “tricks I should remember.”