The Technical Debt Virus: Attacking an IT Contagion Across Federal, DoD, Army Data Center Ecosystems

Ted McLaughlan
Nov 16, 2020
Technical Debt in DoD Army Data Centers is an ROI/TCO Virus like COVID

Examining the universe of U.S. Federal Government and DoD data center communities, including mission-linked facilities like the Army Enduring Data Center (AEDC) network, the depth and breadth of “technical debt” (TD) to be repaid appears immense. Technical debt in commercial or government data center application hosting environments is most commonly associated with the trade-offs, shortcuts, customizations and duplicative investments made during software and hardware maintenance and sustainment periods to extend the usefulness or features of technologies in ways they (or their interfacing systems) may not have originally been designed to support.

TD is in fact a contagion, an enterprise IT investment management virus attacking ROI/TCO health.

Technical debt examples beyond generally well-known SW engineering shortcuts include over-customization of ERP systems, addition of new converged infrastructure without refactoring legacy data management, and installation of duplicative monitoring tools without consolidating existing PowerShell scripts and log data movement. Separation of short-lived SW engineering improvements from enterprise data governance and management alignment is particularly insidious, exacerbating the gap between legacy data islands and future data insights.

This all results in a growing, expensive backlog of technical debt remediation and change that will be necessary to remove, update or replace not only application software components, but also underlying IT Operations Management and IT Infrastructure Management (ITOM/ITIM) software and hardware as mission needs (and the applications that support them) rapidly flex and change. The backlog can be modeled in loan repayment terms: principal (remediation costs), interest (supporting, corollary costs), liability (mission risk) and opportunity costs (deferred modernization).
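
As a rough illustration of that repayment model, the four terms can be attached to each backlog item and used to rank what to remediate first. This is only a minimal sketch under stated assumptions: the `DebtItem` class, its field names, the treatment of liability as an expected annual loss (likelihood times impact), and all figures are hypothetical and do not come from any DoD or Army cost model.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """One technical debt backlog item (hypothetical structure)."""
    name: str
    principal: float             # one-time remediation cost
    interest_per_year: float     # recurring supporting, corollary costs
    risk_likelihood: float       # assumed yearly chance (0..1) of mission impact
    risk_impact: float           # assumed cost if that mission risk is realized
    opportunity_per_year: float  # assumed yearly value of deferred modernization

    def carrying_cost(self, years: float) -> float:
        """Cost of leaving the item unpaid for the given number of years."""
        liability = self.risk_likelihood * self.risk_impact
        yearly = self.interest_per_year + liability + self.opportunity_per_year
        return years * yearly

    def total_cost(self, years_deferred: float) -> float:
        """Principal plus everything accrued while repayment was deferred."""
        return self.principal + self.carrying_cost(years_deferred)


def rank_backlog(items, years=1.0):
    """Order items so the most expensive to keep carrying comes first."""
    return sorted(items, key=lambda item: item.carrying_cost(years), reverse=True)


if __name__ == "__main__":
    # Illustrative figures only.
    backlog = [
        DebtItem("custom ERP extension", 100.0, 10.0, 0.5, 20.0, 5.0),
        DebtItem("duplicate monitoring tool", 40.0, 30.0, 0.25, 40.0, 10.0),
    ]
    for item in rank_backlog(backlog, years=2.0):
        print(item.name, item.total_cost(2.0))
```

Ranking by carrying cost rather than principal is a design choice for the sketch: it surfaces items that are cheap to fix but expensive to keep ignoring.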

Many DoD and Army data center IT infrastructure portfolios and programs are struggling to transition to “infrastructure-as-code” (IAC) solutions in their hybrid IT models (i.e. integrated on-premises and cloud-oriented IT services, supporting DevSecOps and DataSecOps engineering). IAC’s promise is that software-based infrastructure is simpler and faster to operate than hardware with embedded software, but it also introduces new, fast-growing vectors for non-standard, uncontrolled integration software and operational script development, with additional cybersecurity risk. IAC does enable “immutable infrastructure” outcomes and more control of “snowflakes” or configuration drift (in server platforms, for example), but it doesn’t usually consider the end-to-end data management lifecycle of complex applications and IT services, or the customizations engineers introduce to keep upstream and downstream components synced with quickly changing server or network configurations. In other words, the result is reactive change outside of the IAC plans or playbooks, rather than predictive or proactive change that anticipates or drives the use of common configurations.

While the vision and demand are growing quickly for SDX/SDDC, i.e. “software-defined everything” and “software-defined data centers” (for example, a cloud IT infrastructure target described recently by the Army’s ECMO in response to DoD JEDI cloud delays), TD is being introduced even more quickly by automated, fast-changing tools (SDN/NFV, for example) when they aren’t part of an integrated enterprise plan aligned to a trusted, vendor-informed enterprise roadmap. “The basic idea of TD is straightforward: quickly built, simpler systems solutions will be faster to market but often require much more sustaining operational maintenance, and therefore much higher associated costs, over their planned useful systems life.” (“Towards a taxonomy of technical debt for COTS-intensive cyber physical systems”).

This “Taxonomy of Technical Debt” conference research paper (which focuses on cyber-physical systems, but has obvious correlations to most IT systems) organizes technical debt by categories and attributes, at a much more granular level than the general model mentioned earlier. The paper acknowledges that enterprise IT leadership and systems need to keep pace with industry, using the latest technology. However, rapid, siloed, nearsighted IT procurement decisions on long-term sustainment contracts that will outlive program leadership changes only accelerate technical debt accumulation. “As vendors evolve their COTS products to keep up with customer demands, system integrators and operators are often forced to repeatedly choose between two strategic directions: upgrade the component now and deal with the integration issues, or stay with the existing component and risk future obsolescence, at some unknown or unconsidered level, that may compromise functionality, reliability, availability, maintainability, and/or cost.”

The paper also describes categories of technical debt to monitor and manage, as well as a suggested model to represent and track COTS TD items (much as we do in standard “IT Asset/Configuration Management” practices) to protect against excessive, prolonged support costs. “Preferably, the earlier this information is specified, e.g., at the start of a new design/ development/ modernization effort, the more feasible it is to appropriately assign TD monitoring and repayment responsibility within reasonable span of authority/control. This corresponds to a possible means to offset “obsolescence” cost through more informed early COTS decision making, as well as more proactive COTS TD management across all system life cycle phases.”

A very interesting attribute of this TD tracking model, the paper explains, is “Contagion”, i.e. “the degree of spreading of the TD item through COTS interfaces with other system components, if this TD is allowed to continue to exist.” There’s no “TCO herd immunity” (reflecting a facet of the current COVID crisis) developed across the IT enterprise as short-sighted IT investment decisions increase sustainment costs, nor can IT operations sustainment programs inoculate themselves against it. But they can develop ways to participate in the earliest TD epidemiology research and planning, so to speak.
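
One way to picture the “Contagion” attribute is as reachability across component interfaces: the further a TD item can spread through the interface map, the more contagious it is. The paper describes the attribute qualitatively, so the scoring below is purely an assumption made for illustration, as are the function names and the example component names. It is a minimal sketch, not the paper's method.

```python
from collections import deque


def contagion_reach(interfaces, source):
    """Return every component the TD item in `source` could spread to.

    `interfaces` maps a component name to the components it feeds
    (a hypothetical, directed interface map).
    """
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in interfaces.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(source)  # the source itself is not counted as spread
    return seen


def contagion_degree(interfaces, source):
    """Fraction of the other known components reachable from `source`."""
    components = set(interfaces)
    for targets in interfaces.values():
        components.update(targets)
    others = components - {source}
    if not others:
        return 0.0
    return len(contagion_reach(interfaces, source)) / len(others)


if __name__ == "__main__":
    # Illustrative interface map only.
    interface_map = {
        "erp": ["etl"],
        "etl": ["warehouse"],
        "monitor": [],
    }
    print(sorted(contagion_reach(interface_map, "erp")))
    print(contagion_degree(interface_map, "erp"))
```

A score like this could be recalculated whenever the interface inventory changes, which fits the idea of tracking TD items alongside standard asset and configuration records.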

What’s the answer for the DoD and its data center communities (DISA, Army, Air Force, etc.) to avoid the technical debt plague in the first place, closing the borders and using early “virus” testing and tracking methods to control whatever seeps through? How do they address the virology of short-sighted IT procurement decisions, driven by the scattershot introduction of quick-win COTS?

The Taxonomy of Technical Debt paper’s principal conclusions include the view to “concurrently consider multiple perspectives from systems design, development, procuring, and sustaining organizations, and synthesize them into value-added, insightful output for use in decision-making associated with new technology and systems introductions, throughout their planned life spans. The velocity of this need is driven by the short-life expectancy of COTS components”.

In other words, include the sustainment & operations team with their industry partners in enterprise IT portfolio planning and design, when selecting new COTS products.

Inclusion of sustainment organizations and contractors is critical in early planning and design for enterprise architecture (EA) mission change, as well as in the EA change feedback loop that emanates (as technical refresh plans, continual service improvements (CSI), etc.) from proactive, predictive IT infrastructure problem and change management. This includes not only those sustaining the current data center systems, but also commercial industry leaders and partners who sustain similar capabilities and run large data centers themselves. Extremely valuable lessons learned, best practices and actual managed capacity can be leveraged from those experts already dealing with the TD threat in the global universe of commercial, Federal and DoD data center hybrid IT infrastructure management COTS products and services.

Where this thought leadership originates

Global hybrid IT infrastructure managed services and capacity providers like NTT DATA Federal Services (NDFS) and the global NTT Ltd. family of affiliates operate more than 200 data centers around the world, for financial, healthcare, defense, civilian government and many more customer business segments. The battle to quarantine and rapidly respond to IT technical debt contagions, and the resulting threats to program investment TCO/ROI, is fought daily across this ecosystem of very modern, vendor-aligned environments to optimize performance and costs for all customers. AWS managed capacity services from NTT DATA, for example (enhanced by our recent acquisition of Flux7), regularly deliver continual operational agility improvement without increasing technical debt. This is exactly the sort of help (through a variety of experience-based attack vectors) DoD and Army CIO programs need to leverage, to balance the demand for the most current technology against the introduction and proliferation of the technical debt virus.

Ted McLaughlan

30+ years as an IT Enterprise Architect — for Commercial, Public Sector, Product Vendor, and Small Business/Startup companies, customers and communities.