Ah yes, the financial services company that runs a travel agency, allows me to book my hotel and rental car weeks in advance, registers a hold for incidentals for both the hotel and car when I check in, then blocks the card when I try to buy dinner that night in that same hotel due to fraud detection.
Last week it required me to take pictures of my face from multiple angles to regain membership privileges. I suspect this may be part Palantir data collection and part Peter Thiel dating service.
Nobody uses Amex for payments, so the system isn't ever under high load.
Just kidding!
I find the idea quite good, and have to assume that the number of payment failures they experience due to partitions/outages isn't very high and that the post-payment reconciliation and reclamation process gives them the liberty to rank availability a bit higher than correctness.
One thing that looked a bit shaky was the interplay between the global transaction router's knowledge of which cells can handle a particular payment and the asynchronous distribution of the "failover data", which I presume it needs in order to route correctly. To me that seems to create a window where it might route to the wrong cell because its routing state is outdated.
It also doesn't go into the HA setup of the global transaction router itself.
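To make the stale-routing window concrete, here is a minimal sketch of what I mean. Everything in it is my assumption, not something from the article: the `Router` class, the versioned routing table, the prefix-based lookup, and the cell names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical global router holding a cached copy of the routing table."""

    # Account prefix -> cell name, as last seen by this router (assumed layout).
    table: dict = field(default_factory=dict)
    version: int = 0

    def apply_update(self, version: int, table: dict) -> None:
        # Updates arrive asynchronously; anything not newer than the cache is ignored.
        if version > self.version:
            self.version = version
            self.table = dict(table)

    def route(self, account: str) -> str:
        # Route on the first two digits of the account number (illustrative only).
        return self.table[account[:2]]


# Control-plane view before and after a failover (made-up data).
truth_v1 = {"37": "cell-a"}
truth_v2 = {"37": "cell-b"}  # cell-a has failed over to cell-b

router = Router()
router.apply_update(1, truth_v1)

# The failover has happened, but version 2 has not reached the router yet.
stale_choice = router.route("3712345")

# Only once the update lands does the router pick the right cell.
router.apply_update(2, truth_v2)
fresh_choice = router.route("3712345")
```

Between the failover and `apply_update(2, ...)`, `stale_choice` still points at the cell that is gone. How wide that window is in practice depends on how the failover data is distributed, which the article doesn't spell out.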
American Express tech is some of the worst in the world among big companies. All of the value in the company is just in the branding. They put some work into the mobile app and the website, but other than that, it's a facade.
I wonder how they ensure durability. Is it possible that a cell going down would roll back a payment after it has occurred? Or do they depend on a non-cell database?
I would assume nothing related to a given transaction crosses the cell boundary.
We use a cellular architecture to help constrain the blast radius of a modular monolith. Each one of our customers lives in exactly 1 cell. Any kind of cross-customer BI/reporting happens through a data warehouse.
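A rough sketch of the "exactly 1 cell per customer" idea, in case it helps. This is not our actual code: the `CellDirectory` class, the least-loaded placement rule, and the cell names are assumptions made up for the example.

```python
from collections import Counter


class CellDirectory:
    """Hypothetical directory that pins each customer to exactly one cell."""

    def __init__(self, cells):
        self._cells = list(cells)
        self._home = {}  # customer id -> cell name, set once and never changed

    def cell_for(self, customer_id: str) -> str:
        if customer_id not in self._home:
            # First sighting: place the customer in the least-loaded cell.
            load = Counter(self._home.values())
            self._home[customer_id] = min(self._cells, key=lambda c: load[c])
        return self._home[customer_id]


directory = CellDirectory(["cell-0", "cell-1"])
first = directory.cell_for("acme")
again = directory.cell_for("acme")    # same cell as `first`
other = directory.cell_for("globex")  # lands in the emptier cell
```

The point is only that the mapping is sticky: once a customer has a home cell, every request for that customer goes there, so an outage in one cell only touches the customers pinned to it.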
But still, I kind of like the design.