Idempotency Is Easy Until the Second Request Is Different

(blog.dochia.dev)

85 points | by ludovicianul 3 days ago

14 comments

mmillin 1 hour ago
This is an excellent article, I’ve seen almost all of the issues it calls out in production for various APIs. I’ll be saving this to share with my team.
I’ve seen two separate engineers implement a “generic idempotent operation” library which used separate transactions to store the idempotency details without realizing the issues it had. That was in an organization of less than 100 engineers less than 5 years apart.
One other thing I would augment this with is Antithesis’ Definite vs Indefinite error definition (https://antithesis.com/docs/resources/reliability_glossary/#...). It helps to classify your failures in this way when considering replay behavior.
ordu 17 minutes ago
Well, it is "reality has a surprising amount of detail"[1] all over again. Or rather a good specific example for it.
[1] http://johnsalvatier.org/blog/2017/reality-has-a-surprising-...
DarenWatson 18 minutes ago
A couple of years ago, we experienced a silent data corruption incident in our checkout process due to this specific edge case.
A user would generate the idempotency key by loading the front-end application, adding item(s) to their cart, submitting their order but timing out. The user would then navigate back to the front-end application and add another item and submit the order again. Since the user is submitting an identical idempotency key to the same transaction, our payment gateway would look up the request/transaction by idempotency key and see in its cache that there was a successful (200 OK) response to the previous request. The user now believes they purchased three items, however, our system only charged and shipped on two of the orders.
Consequently, the lesson we take away from the aforementioned incident is idempotency keys are really composite keys (Client_Provided_Key + Hash(Request_Payload)).
If a system receives an identical idempotency key (but with a different request payload) the idempotency key should be rejected with a 409 Conflict response with a message similar to "Idempotency key already used with different request payload". Alternatively, some teams argue it should be returned with a 400 Bad Request response. Systems should never return a failed cache response or replace old entries of data.
This article explains how to unlock your flow. The final idempotent key will not be located until the first request completes, but will rather exist when the request is in progress.
To safely accomplish your goal, you have to follow the following steps:
1. Acquire a distributed lock on the idempotent key.
2. Check for the existence of a key in your persistent store.
3. If an existing key is found, verify the hash of the payload against the hash for the payload type. If the hashes do not match, return a 409 error.
4. If the hashes match, look up the status of the payload. If the status shows COMPLETED in the persistent store, return the cached response. If the status shows PENDING in the persistent store, return a 429 Too Many Requests to the user or hold the connection open until the request reaches a PENDING state.
5. After processing the request, save the response to the persistent store before releasing the lock.
While this may look simple on paper, creating a distributed locking state machine for a single API endpoint is typically how developers have their first aha moments with idempotency. Becoming idempotent is often an enormous architectural shift and not just a middleware header check.
shiandow 1 hour ago
This seems to assume retrying a command should result in the same response, but I am not sure I agree.
Idempotency is about state, not communication. Send the same payment twice and one of them should respond "payment already exists".
[-]
- Jolter 1 hour ago
  I don’t know if we’re reading the same article? The linked one states very plainly:
  ”Idempotency is about the effect
  An operation is idempotent if applying it once or many times has the same intended effect.”
  [-]
  - shiandow 46 minutes ago
    I do not disagree with their definition of idempotency, but they silently assume resending the same result is the default. They discus this later on in the article but they do not seem to question why that might not be a good idea in the first place.
    Edit: Perhaps it is my mental model that is different. I think it makes most sense to see the idempotency key as a transaction identifier, and each request as a modification of that transaction. From this perspective it is clearer that the API calls are only implying the expected state that you need to handle conflicts and make PUTs idempotent. Making it explicit clarifies things.
    The article actually ends up creating the required table to make this explicit, but the API calls do not clarify their intent. As long as the transaction remains pending you're free to say "just set the details to X" and just let the last call win, but making the state final requires knowing the state and if you are wrong it should return an error.
    If you split this in two calls there's no way to avoid an error if you set it from pending to final twice. So a call that does both at once should also crash on conflicts because one of the two calls incorrectly assumed the transaction was still pending.
- raffael_de 38 minutes ago
  > Send the same payment twice and one of them should respond "payment already exists".
  You are hiding the relevant complexity in the term "same". What is here the same? I mean, if accidentally buy only 1 instead of two items of a product and then buy afterwards again 1 item. How is this then the same or not the same payment?
  [-]
  - mrkeen 35 minutes ago
    > What is here the same?
    The idempotency key of the request
    [-]
    - raffael_de 31 minutes ago
      How and based on what is the idempotency key calculated which the clients sends with its request? In my double-purchase example above: when would the second purchase be requested with the same key or not?
      [-]
      - Xenoamorphous 17 minutes ago
        If it’s a retry of the same request it should have the same key. If it’s not a retry, a different one. I don’t see the issue.
        If the client sends the same key but a different payload that’s a 400 or 409 in my eyes.
      - shiandow 18 minutes ago
        It shouldn't, an error would be the right response.
      - mrkeen 18 minutes ago
        1) Fresh UUID
        2) Client's choice
- cocoto 1 hour ago
  In your example, idempotency means same request + same state = same response. State becomes part of the request, that’s why it is hard.
  [-]
  - shiandow 1 hour ago
    That's just deterministic behaviour.
    For idempotency you literally just want f(state) = f(f(state)). Whether you achieve this by just doing the same thing twice (no external effects) or doing the thing exactly once (if you do have side effects) is not important.
    But if you have side effects and need something to happen exactly once it seems a lot more useful to communicate this, rather than pretending you did the thing.
    [-]
    - adrianmsmith 50 minutes ago
      > But if you have side effects and need something to happen exactly once it seems a lot more useful to communicate this, rather than pretending you did the thing.
      I think it depends on whether the sender needs to know whether the thing was done during the request, or just needs to know that the thing was done at all. If the API is to make a purchase then maybe all the caller really needs to know is "the purchase has been done", no matter whether it was done this time or a previous time.
      And in terms of a caller implementing retry logic, it's easier for the caller to just retry and accept the success response the second time (no matter if it was done the second time, or actually done the first time but the response got lost triggering the retry).
    - huflungdung 1 hour ago
      [dead]
raffael_de 41 minutes ago
Idempotency means f(x) = f(f(x)).*
Here x is interpreted as state and f an action acting on the state.
State is in practice always subjected to side effects and concurrency. That's why if x is state then f can never be purely idempotent and the term has to be interpreted in a hand-wavy fashion which leads to confusions regarding attempts to handle that mismatch which again leads to rather meandering and confusing and way too long blog posts as the one we are seeing here.
*: I wonder how you can write such a lengthy text and not once even mention this. If you want to understand idempotency in a meaningful way then you have to reduce the scenario to a mathematical function. If you don't then you are left with a fuzzy concept and there isn't much point about philosophizing over just accepting how something is practically implemented; like this idempotency-key.
[-]
- Hendrikto 12 minutes ago
  > That's why if x is state then f can never be purely idempotent
  That is simply not true. f could be, for example, “set x.variable to 7”, which is definitely idempotent.
- toolslive 8 minutes ago
  > *
  I wondered about this too. Also, why was it framed in the context of JSON based RPC over HTTP ?
calmoo 14 minutes ago
I think this article (and the author's previous articles on their blog) is quite clearly AI written. It has such a frustratingly punctuated cadence and really does not serve the reader anything valuable.
mrkeen 50 minutes ago
The point of idempotency is safe retries. Systems are completely fallible, all the way down to the network cables.
The user wants something + the system might fail = the user must be able to try again.
If the system does not try again, but instead parrots the text of the previous failure, why bother? You didn't build reliability into the system, you built a deliberately stale cache.
[-]
- mrkeen 41 minutes ago
  "Idempotency" feels like "encapsulation" all over again.
  Take a good principle like 'modules should keep their inner workings secret so the caller can't use it wrong', run it through the best-practise-machine, and end up with 'I hand-write getters and setters on all my classes because encapsulation'.
zinkem 1 hour ago
Idempotency is easy if you don't use mutable state in your middleware.
Auth, logging, and atomicity are all isolated concerns that should not affect the domain specific user contract with your API.
How you handle unique keys is going to vary by domain and tolerance-- and its probably not going to be the same in every table.
It's important to design a database schema that can work independently of your middleware layer.
[-]
- ahoka 22 minutes ago
  So idempotency is easy if your service does not do anything useful?
chaz6 1 hour ago
You keep the hash of the request so that you can reject a subsequent request with a different body. This has helped me surface bugs and data issues in other systems.
syntex 29 minutes ago
yes I always thought it's an easy thing. but I changed my mind recently when I had to deal with it.
A lot little things you need to think of. For example.
Client sends a request. The database is temporarily down. The server catches the exception and records the key status as FAILED. The client retries the request (as they should for a 500 error). The server sees the key exists with status FAILED and returns the error again-forever. Effectively "burned" the key on a transient error.
others like:
- you may have Namespace Collisions for users... (data leaks) - when not using transactions only redis locking you have different set of problem - the client needs to be implmented correctly. Like client sees timout and generates a new key, and exactly once processing is broken - you may have race conditions with resource deletes - using UUID vs keys build from object attributes (different set of issues)
I mean the list can get very long with little details..
WilcoKruijer 40 minutes ago
I really hate the POST verb for RESTish APIs because it cannot be idempotent without implementing an idempotency layer. Other verbs are naturally idempotent. Has anyone tried foregoing POST routes entirely? Theoretically you can let the client generate an ID and have it request a PUT route to create new entities. This would give you a tiny amount of extra complexity on the client, but make the server simpler as a trade-off.
stavros 2 hours ago
Half of the mentioned issues are issues of atomicity, not idempotency. If I make a request, and the server crashes midway and doesn't send some crucial events, that's an issue whether or not I send a second request.
From a cursory read, only the part up to "what if the second request comes while the first is running" is an idempotency problem, in which case all subsequent responses need to wait until the first one is generated.
Everything else is an atomicity issue, which is fine, let's just call it what it is.
[-]
- behaviors 25 minutes ago
  If the atomic action is idempotent, you don't need a layer for repeating yourself. You hit the nail on the head. So much idempotency efforts are made because they never made the actions idempotent in the first place.
- adamvendel 1 hour ago
  [dead]
villgax 54 minutes ago
skill issue lol, it's not idempotent anymore, same key for different requests? Heard of a nonce?
vanviegen 34 minutes ago
> If you’re still in school, here’s a fact: you will learn as much or more every year of your professional life than you learned during an entire university degree—assuming you have a real engineering job.
This rubs me the wrong way. It's stated as fact without any trace of evidence, it is probably false, and it seems to serve no purpose but to make struggling students feel worse (and make the author feel superior).
[-]
- frwickst 15 minutes ago
  What you learn at a uni is not really about learning a trade, sure it gives you a taste of the basics in many areas, but you will never be an supeb developer (or another profession) when you get out by only attending classes. However, what uni teaches you is how to learn, how to think critically, how important sources are, what to look for to get the most knowledge out of what you read. Or at least that is what it has always been about for me, the process of learning effective learning.
- cjfd 14 minutes ago
  "assuming you have a real engineering job" does a lot of work there. You could also do a lot of work the other way by stating "assuming you are getting a real education". I studied physics when I was young and that field is a lot deeper than my current work in programming. Computer science can also be quite deep if one considers things like the halting problem, type theory and proof assistants.
- riz_ 28 minutes ago
  You're commenting on the wrong article.
- echelon 20 minutes ago
  I think it's that the things learned in school are academic (red-black trees, dynamic programming, writing toy OS and programming languages, etc.)
  In the real world you're faced with building five nines active-active systems that interface across various stakeholders, behaviour has to be eventually consistent, you've got a long list of requirements and deadlines, etc. It's practical, hands on, and people are there to build the thing with you at a scale that far exceeds the university undergraduate setting.
  It's not a bad thing, it's just different.
  Students shouldn't be afraid of it. Your job and coworkers, if it's a good workplace, are there to help you succeed as you succeed together. You learn and grow a lot.
  You also learn how to deal with people, politics, changing requirements, etc., which I would imagine is difficult or impossible to teach without just throwing yourself into the fire.