Thursday, April 09, 2020

2020-04-09 Thursday - Failure To Perform Root Cause Analysis


There is so much that is factually wrong in these two articles - due to a material lack of understanding of the root cause of the problem:
    • "...struggling to process the large volume of unemployment claims"

This article provides a deeper examination of the root causes:
  • "Why New Jersey’s Unemployment Insurance System Uses a 60-Year-Old Programming Language"
    • https://slate.com/technology/2020/04/new-jersey-unemployment-cobol-coronavirus.html
    • "The state’s unemployment insurance application website had broken under the weight of the more than 200,000 applications it received in a single week, ..."
    • "New Jersey is hardly the only state stuck with technology from the last millennium. The New York Times reported that New York’s and Connecticut’s unemployment systems are also both more than 40 years old and haven’t been able to keep up with the flood of new applications."
    • "Washington, D.C., asks unemployed workers to file their applications using Internet Explorer, a browser that Microsoft officially retired five years ago"
    • "States have been starved of funding they need for running their unemployment insurance systems, money that under the 1935 Social Security Act is supposed to come from the federal government"
    • "But while New Jersey’s unemployment system is undoubtedly buckling under the weight of the COVID-19 jobs crisis, the COBOL programming language, or the mainframes it runs on, are probably not to blame. After all, COBOL systems process trillions of dollars of transactions daily for the world’s largest banks, which are clearly not strapped for the cash they’d need to make upgrades. COBOL might be deeply uncool, but it’s hardly a dead language."


A few points/conjectures I would like to share - from my perspective:
1) COBOL is not necessarily obsolete - the vendors that offer this language continue to make improvements and enhancements - and new versions are still being released - with new features and capabilities (1, 2, 3, 4, 5, 6).
2) A given company's installation may be of an obsolete version of COBOL - but to classify the language itself as obsolete - is a gross mischaracterization.
3) If the problem, as alluded to in the article, is one of scalability - the code itself may not be the root-cause problem - it may be the lack of scalability in the underlying infrastructure.
4) If the root-cause problem is a lack of flexibility/configurability in the design of the application code - then it is much more than just an issue of obsolete COBOL code - it speaks to a larger problem of an obsolete underlying data architecture.
5) COBOL is hardly an obscure programming language. Archaic, undoubtedly. Verbose, certainly. Unfashionable, absolutely. But, in almost any large company that has been in business for more than 30-40 years - and certainly in most government agencies - it is more correct to say it is endemic across IT organizations.

6) Unless a programmer was previously working on a specific application's code base - any system's code is going to be initially OBSCURE to them - regardless of the age of the programming language (and, it is highly likely that it will be MORE obscure - the newer the language - and certainly EVEN MORE obscure if the business domain is new/foreign to the retired programmer).
7) Consider the relative complexity of legacy monolithic architectures vs. modern distributed architectures. I know this will be a controversial statement - and it may seem counter-intuitive - but obscurity is a word perhaps best reserved for applications built on modern, highly distributed architecture patterns - with thousands of microservices.

8) It is highly unlikely that the code itself is the root cause of the issue. Typically, applications written in COBOL for such large-scale problems (e.g. unemployment claims) are designed as back-end, nightly batch processing jobs - and an ancient/legacy COBOL program would typically be written to process one record at a time. The greater likelihood is that the root cause lies in the ARCHITECTURE of the application written in COBOL - and has nothing to do with the capabilities/limitations of the programming language itself. Greater still is the likelihood that the limitations lie in the underlying computing hardware and data storage/access mechanisms - which, again, have little to do with the choice of programming language.

9) With the exception of modern cloud-native application architectures, very few legacy systems (regardless of programming language) would have had a justifiable reason to adopt an application architecture (AND the attendant investment in the necessary underlying infrastructure) that could accommodate extreme, highly elastic scalability requirements of 3x, 4x, 5x, ..., 20x, ... volume during their nightly batch processing windows. It would have been inconceivable for business stakeholders to justify such additional architecture complexity, cost, and effort when these legacy systems were originally proposed/funded/constructed.

10) Many legacy/older batch-based processing systems (regardless of their programming language) have trouble meeting their NORMAL batch processing windows (SLAs) - based on existing application architecture and underlying hardware infrastructure limitations/constraints - which are usually NOT germane to their particular choice of programming language.

11) You can't QUICKLY fix those outdated systems by just throwing retired COBOL programmers at the problem. Just as you can't make a baby in one month with 9 mothers and 9 fathers.

12) It is highly likely that MANY of those in the potential pool of aging (or retired) COBOL programmers never evolved beyond the ideas with which they designed similar original systems - and so, many of them will lack awareness of (or experience with) the significant advances that have been made in technologies and architecture patterns for high-performance distributed computing - the fundamental technical and conceptual tools and ideas necessary to fix the underlying performance/scalability design issues of those legacy COBOL systems.
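
To make points 8-10 concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (nightly record count, throughput, window length) is a hypothetical assumption chosen purely for illustration - none of them are measurements from New Jersey's system or any other real system - and the function names are invented for this sketch. It only shows the arithmetic: a sequential, one-record-at-a-time batch job that comfortably fits its window at normal volume can overrun it badly at 3x-20x volume, whatever language it is written in.

```python
# Back-of-envelope model of a sequential nightly batch job.
# ALL numbers below are hypothetical assumptions, not real measurements.

def batch_hours(records: int, records_per_second: float) -> float:
    """Hours needed to process `records` one at a time at a fixed throughput."""
    return records / records_per_second / 3600.0


def fits_window(records: int, records_per_second: float, window_hours: float) -> bool:
    """True if the whole batch completes inside the processing window."""
    return batch_hours(records, records_per_second) <= window_hours


def headroom(normal_records: int, records_per_second: float, window_hours: float) -> float:
    """Largest volume multiple the window can absorb without any redesign."""
    return window_hours / batch_hours(normal_records, records_per_second)


# Hypothetical system: 30,000 claims per night, 2 records/second, 6-hour window.
NORMAL_NIGHTLY_RECORDS = 30_000
RECORDS_PER_SECOND = 2.0
WINDOW_HOURS = 6.0

for multiple in (1, 3, 5, 20):
    records = NORMAL_NIGHTLY_RECORDS * multiple
    hours = batch_hours(records, RECORDS_PER_SECOND)
    status = "fits" if fits_window(records, RECORDS_PER_SECOND, WINDOW_HOURS) else "overruns"
    print(f"{multiple:>2}x volume: {hours:6.1f} h needed -> {status} the {WINDOW_HOURS:.0f} h window")

print(f"headroom: {headroom(NORMAL_NIGHTLY_RECORDS, RECORDS_PER_SECOND, WINDOW_HOURS):.2f}x normal volume")
```

Under these assumed numbers the only levers are throughput (hardware, storage/access mechanisms, parallelism in the application architecture) and the length of the window - which is the point being made above: the choice of programming language does not appear anywhere in the calculation.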

Architecture decisions are a complex calculus of constraints - however, business constraints/priorities afford little luxury to continually re-assess (or alter) those decisions on an ongoing basis, let alone in anything approaching near real-time - UNLESS you have already performed the refactoring and migration work to a Cloud Native Architecture.

And thus, architecture decisions - made at fixed points (for the design of these problematic legacy systems) - are bounded by constraints of available time, budget, resources - against an ever-changing landscape of shifting business requirements (and priorities) - and ever-climbing mountains of technical debt - that frequently experience seismic upheavals due to regulatory or competitive market forces.

When a Black Swan event arrives - those who cry "You should have modernized!" - are ignoring the very real business constraints under which these systems have been maintained - often by long-suffering heroic actions of unsung COBOL programmers and IT Managers.

As NJ Governor Murphy's administration tackles this complex problem - it is important that the following concerns are considered:
  • Don't over-engineer the interim solution
  • Have experienced senior level architects review any proposed changes that impact stability, security, scalability, or performance.
  • Develop a long-term modernization plan - this doesn't (and shouldn't) necessarily mean a rip-and-replace. An incremental upgrade/update in-place is likely going to be the far more cost-efficient approach.
  • Beware of any consultants that are pushing (selling) any wholesale rewrite/replacement of the entire system/application.
    • Complete rewrites often fail - spectacularly.
    • Complete rewrites usually take much longer than estimated - and likewise cost much more than estimated.
    • There are an estimated 200 billion lines of COBOL code - it is likely much of that code can be reused - and may only require some level of refactoring.


2021-04-12 Update:
  • 2021-04-12 Reflections on “Darth Misty the Mainframe Sith” by Misty Decker (Product Marketing Director at Micro Focus) - who also shared with me this recent news item:
    • 2021-03-27 COVID unemployment hearing probes state computer system, claims process
      • "The state unemployment insurance system remains dysfunctional a full year after New Jersey shut down parts of the private sector due to the COVID-19 pandemic, according to testimony at a virtual hearing held by Republican lawmakers on Friday."
      • "Computer consultant Bill Hinshaw, founder and head of Cobol Cowboys, said a thorough modernization could be done in no more than several months and probably for no more than $25 million"

Copyright

© 2001-2021 International Technology Ventures, Inc., All Rights Reserved.