JA4: fingerprinting the handshake, not the headers
Every anti-bot rule that reads the User-Agent header has the same weakness: the attacker wrote that header. Request headers are free-form text supplied by the client, which means they are exactly as trustworthy as the client itself. A curl script can claim to be Chrome 137 on Windows in one line. The interesting question for defenders is: what does a client reveal about itself that it did not consciously choose to send?
One of the best answers lives below HTTP entirely. Before a single header crosses the wire, the client and server negotiate TLS. The first message of that negotiation, the ClientHello, is a structured list of everything the client's TLS stack supports: protocol versions, cipher suites in preference order, extensions, supported groups (elliptic curves), and ALPN protocols. None of it is secret, and almost none of it is something application code normally sets. It is a byproduct of which TLS library the client was compiled against and how that library was built.
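To make "a byproduct of your TLS library" concrete, here is a minimal sketch that asks Python's standard ssl module what it would bring to a handshake. The helper name local_tls_profile is made up for illustration, and the output depends entirely on the TLS library your interpreter was linked against, which is the point.

```python
import ssl


def local_tls_profile() -> dict:
    """Summarize what this interpreter's TLS stack is configured to offer.

    Hypothetical helper for illustration; it only reads local configuration
    and does not open any network connection.
    """
    ctx = ssl.create_default_context()
    return {
        # Version string of the TLS library Python was linked against.
        "library": ssl.OPENSSL_VERSION,
        # Names of the cipher suites enabled in the default context,
        # listed in priority order.
        "ciphers": [c["name"] for c in ctx.get_ciphers()],
    }


if __name__ == "__main__":
    profile = local_tls_profile()
    print(profile["library"])
    print(len(profile["ciphers"]), "cipher suites enabled")
```

Run it on two machines with different Python builds and the lists will often differ, even though no application code changed.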
▶From JA3 to JA4
JA3, introduced by Salesforce researchers in 2017, hashed those ClientHello fields into a single MD5 string. It worked well until it didn't. Chrome started randomizing extension order, a change meant to stop servers and middleboxes from depending on a fixed layout, and because JA3 hashes extensions in wire order, a single browser build began producing a different JA3 on nearly every connection. A raw hash also gives you no partial information: change one bit and the whole fingerprint changes.
JA4, part of John Althouse's JA4+ suite, fixes both problems. It sorts cipher suites and extensions before hashing (so randomization no longer matters) and is built from three segments instead of one opaque hash: a human-readable prefix, a truncated hash of the cipher suites, and a truncated hash of the extensions and signature algorithms. A JA4 like t13d1516h2_8daaf6152771_b0da82dd1658 tells you at a glance: TCP transport (t), TLS 1.3, SNI present (d, for a domain name rather than a bare IP), 15 cipher suites, 16 extensions, ALPN h2. The segments degrade gracefully: two clients that share a TLS library but differ in ALPN will differ in the first segment and still match on the two hashed segments.
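The construction is easier to follow in code. What follows is a simplified sketch of the JA4 layout as publicly described, not the reference implementation: it assumes the ClientHello has already been parsed into integer lists, it assumes the version code ("13") has already been picked, and it skips edge cases such as non-alphanumeric ALPN values. The function names ja4_sketch and read_prefix are hypothetical.

```python
import hashlib
import random


def is_grease(value: int) -> bool:
    """GREASE placeholders have the form 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _short_hash(text: str) -> str:
    """First 12 hex characters of SHA-256, as used for the hashed segments."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def ja4_sketch(version, ciphers, extensions, sig_algs, alpn, has_sni,
               transport="t"):
    """Build a JA4-style string from already-parsed ClientHello fields.

    Simplified illustration: `alpn` is the first ALPN protocol (or ""),
    `version` is a two-character code such as "13".
    """
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # Segment a: readable prefix (counts exclude GREASE, capped at 99).
    alpn_tag = alpn[0] + alpn[-1] if alpn else "00"
    prefix = (f"{transport}{version}{'d' if has_sni else 'i'}"
              f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}"
              f"{alpn_tag}")

    # Segment b: hash of the sorted cipher suites.
    cipher_hash = (_short_hash(",".join(f"{c:04x}" for c in sorted(ciphers)))
                   if ciphers else "0" * 12)

    # Segment c: hash of the sorted extensions (SNI 0x0000 and ALPN 0x0010
    # left out), followed by signature algorithms in their original order.
    hashed = sorted(e for e in extensions if e not in (0x0000, 0x0010))
    ext_text = ",".join(f"{e:04x}" for e in hashed)
    if sig_algs:
        ext_text += "_" + ",".join(f"{s:04x}" for s in sig_algs)
    ext_hash = _short_hash(ext_text) if extensions else "0" * 12

    return f"{prefix}_{cipher_hash}_{ext_hash}"


def read_prefix(ja4: str) -> dict:
    """Decode the human-readable first segment of a JA4 string."""
    a = ja4.split("_")[0]
    return {
        "transport": a[0],
        "tls_version": a[1:3],
        "sni": a[3],
        "cipher_count": int(a[4:6]),
        "extension_count": int(a[6:8]),
        "alpn": a[8:10],
    }


if __name__ == "__main__":
    exts = [0x1A1A, 0x0000, 0x0010, 0x000A, 0x000D, 0x002B]
    shuffled = exts[:]
    random.shuffle(shuffled)
    args = dict(version="13", ciphers=[0x0A0A, 0x1301, 0x1302, 0x1303],
                sig_algs=[0x0403, 0x0804], alpn="h2", has_sni=True)
    # Same fingerprint regardless of extension order on the wire.
    print(ja4_sketch(extensions=exts, **args))
    print(ja4_sketch(extensions=shuffled, **args))
    print(read_prefix("t13d1516h2_8daaf6152771_b0da82dd1658"))
```

Shuffling the extension list leaves the output unchanged, which is exactly the property JA3 lacked. For anything beyond experimentation, check the result against a maintained JA4 implementation.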
▶Why bots fail it
The practical power of JA4 is the mismatch test. A request whose User-Agent claims Chrome-on-Windows should produce the JA4 of Chrome's BoringSSL build. If instead it produces the fingerprint of Python's ssl module, Go's crypto/tls, or OpenSSL-via-curl, the headers are lying — and that one signal is worth more than a hundred header heuristics. Most off-the-shelf scraping stacks fail exactly here, because faking the ClientHello means replacing your TLS library, not editing a string.
- requests/httpx (Python): distinctive OpenSSL-derived fingerprints, trivially separable from browsers
- Go HTTP clients: crypto/tls has its own recognizable ClientHello shape
- Headless Chrome: matches real Chrome, which is why TLS alone is never the whole answer
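The mismatch test itself is only a few lines. The sketch below assumes you already have the request's User-Agent and the JA4 computed at your edge. The fingerprint table holds obvious placeholders, not real browser fingerprints, and the User-Agent parsing is deliberately naive; both are stand-ins for whatever your own traffic data and parser provide.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"
    UNKNOWN = "unknown"


# PLACEHOLDER fingerprints for illustration only. A real table would be
# built from fingerprints observed for genuine browser builds.
KNOWN_JA4 = {
    "chrome": {"t13d1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb"},
    "firefox": {"t13d1715h2_cccccccccccc_dddddddddddd"},
}


def claimed_family(user_agent: str) -> Optional[str]:
    """Very rough guess at the browser family a User-Agent claims to be."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return None


def check_claim(user_agent: str, observed_ja4: str,
                known=KNOWN_JA4) -> Verdict:
    """Compare what the headers claim with what the handshake showed."""
    family = claimed_family(user_agent)
    if family is None or family not in known:
        # No claim we can test; stay neutral rather than guess.
        return Verdict.UNKNOWN
    if observed_ja4 in known[family]:
        return Verdict.MATCH
    return Verdict.MISMATCH


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36")
    print(check_claim(ua, "t13d1812h1_eeeeeeeeeeee_ffffffffffff"))
```

Returning UNKNOWN for unrecognized claims is a deliberate choice here: a client that never pretends to be a browser has not lied about anything.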
▶The honest limits
JA4 is one signal, not a verdict. Impersonation libraries (curl-impersonate, utls and friends) can replay a browser's ClientHello byte-for-byte, and headless browsers pass by construction because they are the real TLS stack. Fingerprints also collide by design: every Chrome 137 on every machine looks alike, so JA4 can tell you what is talking, never who. In production you treat it as one column in a wider matrix — combine it with header order, IP reputation, and behavioral signals, and weight disagreements between layers heavily. A client whose layers disagree about what it is, is almost never human.
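As a rough illustration of "one column in a wider matrix", the sketch below folds several signals into a single score and gives cross-layer disagreement the largest weight. Every field name and weight is an invented assumption for the example, not a tuned or recommended value.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-request signals; names are illustrative only."""
    ja4_matches_ua: bool           # TLS layer agrees with the User-Agent
    header_order_matches_ua: bool  # header ordering agrees with the User-Agent
    ip_reputation: float           # 0.0 (clean) to 1.0 (known bad)
    behavior_anomaly: float        # 0.0 (normal) to 1.0 (highly unusual)


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def risk_score(s: Signals) -> float:
    """Combine signals into a 0..1 score. Weights are made up."""
    score = 0.2 * _clamp(s.ip_reputation) + 0.2 * _clamp(s.behavior_anomaly)
    # Disagreement between layers is weighted most heavily.
    if not s.ja4_matches_ua:
        score += 0.4
    if not s.header_order_matches_ua:
        score += 0.2
    return _clamp(score)


if __name__ == "__main__":
    print(risk_score(Signals(True, True, 0.1, 0.0)))
    print(risk_score(Signals(False, True, 0.1, 0.0)))
```

A fixed linear score like this is only a starting point; the useful part is that no single signal, JA4 included, can push a request to the maximum on its own.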
If you want to see your own JA4, this site's bot-check page shows the digest Vercel's edge computes for your connection — the same one a defender would see.