Design for successful digital software development and engineering architecture

Yet Another Technical UK Banking Failure – Single Point of Failure / Weakness

23 February 2025

For a very long time I have been away, in a stew, so please accept my humblest apologies. I am blogging once again because I am fed up with X/Twitter, Mastodon and LinkedIn. My name is Peter Pilgrim. I'm a Principal Engineer and also a Java Champion, and I work for a UK technical consultancy. Today I am going to talk to you about an event that happened two weeks before I recorded this on Valentine's Day 2025.

Investing in finance, the stock market – Courtesy of Nappy https://nappy.co/photo/uXoZj8W9Y7asZ1N6CWXG9

Barclays Bank UK had a technical failure. You can read the background to the event here:

FStech reports that Barclays confirmed its services were fixed after the IT outage. Barclays Bank confirmed that its systems were working again after a two-day outage that began on Friday the 31st of January, and the bank apologised for the disruption.
The bank confirmed that it had communicated with HMRC, the UK tax authority, and advised them of the issue, so that Barclays customers would not be charged a late penalty for any late tax returns. HMRC ordinarily fines £100 for tax returns that arrive after the yearly deadline of 31st January. For some customers the technical fault was severely problematic: they were locked out of their accounts. FSTech: Ref

On The Guardian newspaper's website you can read how the glitch knocked customers out for 24 hours, as reported by Daniel Laddell on Saturday the 1st of February. Many customers had by then expressed frustration and anger on social media. One person said they had been left without money and that a delivery due in the morning had now been cancelled, leaving their four kids with no food: "it's no joke, it's my money". The incident really highlights how severely we as a society depend on digital currency and systems. It's no longer easy to get hold of cash and spread it about (ATMs).
The Guardian: Ref

Watch now! Principal Technology Episode 2025.1

Let's pretend for a moment that we are that lay-person. I'm a technical person, but I know that the majority of people out there are not technical, so let's run through the facts again.

31st of January 2025

Customers report that access has failed across mobile, tablet and desktop
The last day for self-assessment tax returns to HM Revenue and Customs (HMRC) in the UK

1st February 2025

Barclays statement: suggested that affected customers use food banks and lean on family and friends
Barclays statement: working to resolve the issue

2nd February 2025

Sky, The Guardian and BBC News online reported that the Barclays outage was resolved by 11:00 a.m.

The FCA does fine banks for their outages:

That is enough background. As a solutions designer / architect, let's take the golden eagle's view of what could have caused the failure.

The first thing that everybody knows, or at least technical people know, is that we are in a transition, the so-called Digital Transformation (yuck!). It is highly likely that, in the year 2025, Barclays Bank UK still relies on technology that is decades old, an encumbrance to anything modern. The name for this system is the mainframe. Mainframes were the first generation of electronic computers available to commercial business. The term dates from the 1960s, the main computer frame, when one computer took up the space of an entire gymnasium. Try to imagine that for a minute. Mainframes were designed with one to four central processing units (CPUs). Compare that with the technology now: a general microprocessor can have hundreds of cores, and designs have been experimentally proven at 1000 cores.
When the original programmers wrote their banking systems in the 1970s, 1980s and probably the 1990s, they wrote code in the COBOL programming language. These machines were never designed for high-volume processing in a concurrent fashion. They were designed for batch operations.

Mainframe

Migrating mainframes to new cloud computing technology is one of the last architecture / infrastructure pieces that needs to be upgraded. The mainframe code is probably still written in COBOL, and what I mean is that mainframes do not scale when you think of how many users are accessing these systems electronically, sometimes called digitally. Mainframes are mostly batch processing units, and this is why, when you sent money from one account to another, it used to be that it couldn't be guaranteed for two hours: there was a transitional two-hour wait for confirmation. In the UK, the long-standing scheme for transferring money from account to account is called BACS, and transfers are a lot faster now, as far as I am aware. As soon as a customer makes a payment, it is debited and credited, and that shows up in both the source account and the target account. So I reckon there is a 35% probability that something went wrong in the batch processing.

Networking

All modern computing services are distributed, and even mainframes are surrounded by a bridge architecture, which simultaneously protects, secures and most of the time (sic) acts as a communication conduit, a buffer of data flowing from here to yonder. So a networking failure could well be the outage's root cause. A network failure could mean a hardware failure in a router. Or the single point of failure could be the main naming service: the DNS (Domain Name System) blew up. For every computer out on the internet, or on the internal private network of Barclays or any other bank in the world, there will be a system that knows the IP address of every other computer in that organisation. If that address book suddenly vanishes, for any reason, then critical communication stops. The reason could be hardware. Routers physically do fail, because no electronic component is ever 100% resilient. It could also be a software protocol. Sometimes businesses upgrade systems and there could be an accidental poison pill in the new protocol. I describe that as a 7% probability.
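To make the "address book" point concrete, here is a minimal sketch in Python. It is purely illustrative: the `NameServer` class, the host name and the IP address are all hypothetical, nothing here talks to a real DNS server, and nothing is known about how Barclays actually runs its naming services. It only shows why one name server is a single point of failure and a second one is not.

```python
class ResolverDown(Exception):
    """Raised when a name server cannot answer at all."""


class NameServer:
    """A toy in-memory 'address book' mapping host names to IP addresses."""

    def __init__(self, label, records, healthy=True):
        self.label = label
        self.records = dict(records)
        self.healthy = healthy

    def lookup(self, host):
        if not self.healthy:
            raise ResolverDown(f"{self.label} is not responding")
        return self.records[host]  # KeyError if the host is unknown


def resolve(host, servers):
    """Try each name server in order; fail only if every one is down."""
    failures = []
    for server in servers:
        try:
            return server.lookup(host)
        except ResolverDown as exc:
            failures.append(str(exc))
    raise ResolverDown("; ".join(failures))


# Hypothetical host name and address, for illustration only.
records = {"payments.bank.internal": "10.0.0.12"}
primary = NameServer("dns-primary", records, healthy=False)  # simulated outage
secondary = NameServer("dns-secondary", records)

single_point = [primary]           # one address book: its loss stops everything
redundant = [primary, secondary]   # a second copy keeps lookups working
```

With `redundant`, `resolve("payments.bank.internal", redundant)` still returns the address; with `single_point` the same call raises `ResolverDown`, which is the moment critical communication stops.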

User Authentication

Remember, this is speculation from my intuition. The other way it could fail is a user record authentication and authority update / change rollout. It might be that, because the old way of working still relies on ancient BACS mainframe processing, somebody somewhere rolled out a batch script that runs at the end of the month to do some profit and loss accountancy, and it accidentally did something the script was not meant to do. In actual time, the failure happened right at the end of the settlement day. Sometimes these batch systems take several hours to run. In an ancient banking system, such a script executes at three o'clock in the morning, when it is assumed nobody is using the system, and takes five hours to run. If it punches a hole in memory or in the transaction log, all of a sudden you've got a single point of failure, because every system in the bank relies on user authentication. If the central user authority or authentication has vanished, then millions of customers, very hard-working ordinary people, cannot get access to their salaries and wages. I'm going to describe that as a 13% probability.
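A minimal sketch of why a central authentication authority is a single point of failure, assuming a toy in-memory `AuthService` (a hypothetical name, as are the user and secret). It is not a model of any real bank's identity system; it only shows that when every channel delegates to the same authority, wiping that authority's records denies access on every device at once.

```python
class AuthService:
    """Toy central authority holding user credentials (hypothetical, in memory)."""

    def __init__(self, credentials):
        self.credentials = dict(credentials)

    def authenticate(self, user, secret):
        return self.credentials.get(user) == secret


def login(channel, auth, user, secret):
    """Every channel delegates to the same authority before granting access."""
    outcome = "granted" if auth.authenticate(user, secret) else "denied"
    return f"{channel}: {outcome}"


CHANNELS = ("mobile", "tablet", "desktop")
auth = AuthService({"customer-1": "s3cret"})

before = [login(c, auth, "customer-1", "s3cret") for c in CHANNELS]

# Simulate a batch script accidentally wiping the user records overnight.
auth.credentials.clear()
after = [login(c, auth, "customer-1", "s3cret") for c in CHANNELS]
```

Before the wipe every channel reports "granted"; afterwards every channel reports "denied", even though the mobile, tablet and desktop front ends themselves are untouched.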

Data Loss

I could believe that the user authentication figure is higher, because that would explain why people couldn't get access across devices: mobile, tablet and desktop. What's not clear is whether there was also an internal knockout, because that has not been revealed, and that is what the UK Treasury Select Committee is going to ask of UK bank CEOs.

The final one for me is data loss. Data loss sometimes happens in banking systems when a message gets stuck in a message queue. A message queue can be thought of as a very long list of instructions: credits, debits, account-to-account transfers and final records. Perhaps think of it as a million Royal Mail letters on a horizontal conveyor belt. Inside every letter is the instruction for a transaction, and if suddenly there is a fault with the conveyor belt then the bank has trouble. The mainframe cannot work, or it blocks, and the system seizes up. Modern banks do not rely on conveyor belts of letters. They leverage large databases, data stores and multiple message queues for information processing. To handle a collapse, every institution will have an architectural feature: replication of data. This is pragmatic enterprise software design to help alleviate outages. So what can possibly go wrong? I will describe this as a 45% probability.
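The conveyor belt picture can be sketched in a few lines of Python. This is a simplified illustration, not how any real banking message queue is implemented: the account names are hypothetical and the "fault" is just an instruction that refers to an account that does not exist. It shows how one stuck letter at the head of a strictly ordered belt holds up every valid letter behind it.

```python
from collections import deque


def drain(queue, balances):
    """Apply transfer instructions strictly in order; stop at the first fault."""
    processed = 0
    while queue:
        source, target, amount = queue[0]   # peek at the head of the belt
        if source not in balances or target not in balances:
            break                           # a stuck letter: everything behind it waits
        balances[source] -= amount
        balances[target] += amount
        queue.popleft()
        processed += 1
    return processed


balances = {"alice": 100, "bob": 50}        # hypothetical accounts
belt = deque([
    ("alice", "bob", 30),
    ("alice", "unknown-account", 10),       # the faulty instruction
    ("bob", "alice", 5),                    # valid, but stuck behind the fault
])
processed = drain(belt, balances)
```

Only the first transfer is applied. The faulty instruction and the perfectly valid one behind it both remain on the belt, which is the "it blocks" behaviour described above.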

The message queue replication could have failed. Databases can also fail, through table space exhaustion or corruption. The trick for availability is to have redundant systems, including databases, made resilient with replication. A FAANG company has global coverage through regions and availability zones. Amazon and Netflix can survive an earthquake, tsunami or other disaster. With replication, you press a button and should be able to play back all the transactions and recover from what went wrong, minus maybe a small percentage where records of transactions have actually been lost, but you should be able to get your people back. This is really important for commercial banking environments, because they now have to report every transaction to the UK Financial Conduct Authority (FCA) [or the equivalent USA SEC], especially if the transactions are investment trades. It is probably worth your finding out about the highly influential US legislators Dodd and Frank and their effect on commercial bank behaviour and trading. In 2008, the financial world went into a meltdown and crashed.
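Here is a minimal sketch of "press a button and play back all the transactions", assuming a deliberately simplified synchronous replication where every entry is appended to two in-memory logs. The function names `replicate` and `replay`, the accounts and the amounts are all hypothetical; real replication involves networks, acknowledgements and failure modes that this ignores.

```python
def replicate(entry, primary_log, replica_log):
    """Append the same entry to both logs (synchronous replication, simplified)."""
    primary_log.append(entry)
    replica_log.append(entry)


def replay(log, opening_balances):
    """Rebuild account balances by re-applying every logged transfer in order."""
    balances = dict(opening_balances)
    for source, target, amount in log:
        balances[source] -= amount
        balances[target] += amount
    return balances


opening = {"alice": 100, "bob": 50}          # hypothetical opening balances
primary_log, replica_log = [], []
replicate(("alice", "bob", 30), primary_log, replica_log)
replicate(("bob", "alice", 5), primary_log, replica_log)

primary_log.clear()                          # simulate losing the primary store
recovered = replay(replica_log, opening)     # the replica still has the history
lost = replay(primary_log, opening)          # without a replica, only the opening state
```

Replaying the replica restores both accounts to 75, while replaying the emptied primary only gets you back to the opening balances. If the replication step itself had silently failed, there would be nothing to replay, which is why the replication is worth suspecting.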

Summary

So, as the probable root cause related to a single point of failure, Data Loss is a 45% probability, Mainframe is 35%, User Authentication Authority is 13% and Networking is 7%. One can surmise further: beyond these four reasons, the true root cause could be intermixed. There is also the possibility of knock-on effects. A broken user access script, intended to enhance security but actually detrimental, could have caused a data loss by removing the proper and correct privileged access to, say, message queue replication.

Thank you Dear Reader for reading once again.

Peter Pilgrim,

+PP+, February 2025

There is an extra Fazit (conclusion) to this article: the Treasury Select Committee of the UK Houses of Parliament has demanded answers about bank technical failures, especially at Barclays Bank and other institutions. REF: Bank Outages 10th February 2025

References

Barclays tells customers to contact food banks as IT glitch disruption enters third day
- Conversation Top, Barclays UK Help X https://x.com/BarclaysUKHelp/status/1885295932561826086
- Barclays UK Help X https://x.com/BarclaysUKHelp/status/1885385379047514274 “We’re working to fix this as a priority”
- Olive Simmo Triggering X https://x.com/olive_simmo/status/1885392782547726601 “be aware that it can be SO triggering to ask someone if they have family or friends. No I don’t and if I did I would have obviously asked them”
- Barclays UK Help X https://x.com/BarclaysUKHelp/status/1885402505841987878 “👉 https://trusselltrust.org 👉 https://citizensadvice.org.uk”
Barclays says IT glitch that locked customers out of accounts is fixed – “Barclays says it has fixed the IT glitch that left thousands of customers locked out of their accounts on Friday and Saturday, and promised to compensate them for any losses incurred.”
FCA fines Starling Bank £29m for failings in their financial crime systems and controls – “The FCA has fined Starling Bank Limited £28,959,426 for financial crime failings related to its financial sanctions screening.”
Barclays’ IT outage: The curse of ancient systems – “Today, Barclays employs hundreds of mainframe engineers working in COBOL, a language over 60-years-old that is notoriously disliked and thus notoriously hard to hire for”
Barclays confirms services are fixed after IT outage – “Martin Greenfield, chief executive of cybersecurity company Quod Orbis, said that the incident highlighted some of the vulnerabilities in banking infrastructure.”
Barclays IT glitch locks customers out of accounts for almost 24 hours – Ruth, 39, self-employed cleaner told BBC, “We need the money to do shopping, our money is all in savings,” “I’ve got my granddaughter here who’s 11 months old, also a one-year-old, two-year-old, 12-year-old, 13-year-old, 15-year-old all at home. There could be many single mums in the same situation with no access to money.”
Most Businesses Overlook One Common Mainframe Security Vulnerability, 2019 – https://www.infosecurity-magazine.com/opinions/common-mainframe-vulnerability-1-1/ “Mainframe professionals need to recognize that application scanning alone won’t identify every system flaw, and traditional scanning tools simply aren’t capable of picking up on OS-level capabilities.”

Peter Pilgrim :: Java Champion :: Digital Developer Architect

PEAT Online Course -
How To Improve Your Personal Performance in Java work Technology
