Legacy Systems

SBOM strategies for systems without modern build automation

Modern software development, with automated build pipelines, dependency managers, and CI/CD integration, makes SBOM generation straightforward. But many organizations maintain legacy systems that predate these practices: applications built with manual compilation, dependencies managed through shared network drives, no version control, and custom build scripts understood by one retiring engineer. These systems serve critical business functions but lack the infrastructure that automated SBOM generation depends on.

Legacy systems can't be ignored in SBOM programs. Often they are the most business-critical applications, precisely because they have been running successfully for years. Excluding them from SBOM coverage creates dangerous visibility gaps: attackers don't care whether a vulnerable component lives in a modern microservice or a decade-old monolith. A comprehensive SBOM program needs strategies for legacy systems that acknowledge their constraints while still establishing practical transparency.

Legacy System Challenges

Missing Build Automation

Legacy systems often lack automated build processes. A developer compiles code manually, copies dependencies from shared folders, and creates deployment packages through documented (or undocumented) manual procedures. There is no build pipeline to inject SBOM generation into.

Impact: Standard SBOM tools integrate with build systems (Maven, npm, pip). Without build automation, these tools can't automatically enumerate dependencies during compilation.

Adaptation required: Manual documentation, binary analysis, or retrofitting minimal automation specifically for SBOM generation even if full build automation isn't feasible.

Dependency Management Gaps

Modern applications use dependency managers (npm, Maven, pip) that maintain machine-readable manifests of dependencies. Legacy systems often manage dependencies manually: JAR files copied into a lib/ directory, DLLs in system32, shared objects scattered across file systems.

Impact: No authoritative dependency declaration to parse. Component inventory must be reconstructed through file system analysis, source code inspection, or institutional knowledge.

Adaptation required: Archaeological discovery of what components exist, establishing manual documentation practices, implementing basic dependency tracking even without full dependency management migration.

Unclear Component Versions

Legacy systems may use components without clear version tracking: files named "database-driver.jar" with no version indication, components compiled from source whose original version tags are lost, libraries so old that version information was never systematically tracked.

Impact: An SBOM requires component versions for vulnerability correlation. "We use a PostgreSQL driver" without a version is insufficient; we need to know whether it is a vulnerable 8.x release or a patched 15.x release.

Adaptation required: Version archaeology through file analysis, release date correlation, binary inspection, or pragmatic "best effort" version documentation.
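
As a first pass at version archaeology, file names themselves sometimes carry a version. The sketch below is illustrative only: the function name `split_name_version` and the regular expression are assumptions, and the pattern deliberately requires at least one dot in the version (so "foo-2.jar" is treated as unversioned rather than guessed at).

```python
import re
from typing import Optional, Tuple

# Matches names like "commons-lang-2.6.jar": a base name, a "-" or "_"
# separator, a dotted numeric version, and an optional qualifier.
_VERSIONED = re.compile(
    r"^(?P<name>.+?)[-_]"
    r"(?P<version>\d+(?:\.\d+)+(?:[-.][A-Za-z0-9]+)*)"
    r"\.(?:jar|dll|so)$"
)

def split_name_version(filename: str) -> Tuple[str, Optional[str]]:
    """Guess (component name, version or None) from a file name."""
    match = _VERSIONED.match(filename)
    if match:
        return match.group("name"), match.group("version")
    # No recognizable version: strip the extension and leave version unknown.
    return filename.rsplit(".", 1)[0], None
```

A file name is weak evidence on its own (files get renamed), so a version recovered this way is best recorded as an estimate until another source confirms it.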

Documentation Gaps

Legacy system documentation often exists primarily in a retiring engineer's head. What dependencies exist? Why were they chosen? Where did they come from? The answers are lost as personnel change.

Impact: Creating an SBOM requires knowing what components exist. Without documentation, work starts at the discovery phase rather than the documentation phase.

Adaptation required: Reverse engineering, knowledge capture from remaining experts, systematic documentation initiatives beyond just SBOM generation.

Legacy SBOM Strategies

Strategy 1: Binary Analysis

Analyze deployed artifacts or compiled binaries to discover embedded components.

Application scenarios:

  • Compiled languages (Java JAR/WAR, .NET assemblies, C/C++ binaries)
  • Container images without build history
  • Deployed systems where source access is limited

Tools and techniques:

Java applications:

# Analyze JAR/WAR files: list the JARs bundled inside the archive
jar -tf application.war | grep '\.jar$'

# Extract version information from manifests
unzip -p application.war META-INF/MANIFEST.MF

# Use specialized tools
# (older Syft releases also accept "syft packages <target>", newer ones "syft scan <target>")
syft application.war -o cyclonedx-json

Binary analysis:

# Linux shared libraries
ldd /usr/local/bin/legacy-app

# Library versions
strings /lib/x86_64-linux-gnu/libssl.so.1.1 | grep "OpenSSL"

# Package queries
dpkg -l | grep libssl
rpm -qa | grep openssl

.NET assemblies:

# Assembly inspection
Get-ChildItem -Recurse -Filter "*.dll" | ForEach-Object {
    [Reflection.Assembly]::LoadFile($_.FullName).GetName()
}

Advantages: Works without source access, discovers actual deployed components.

Limitations: May miss build-time-only dependencies, version detection unreliable for some component types, can't detect source-level dependencies.
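
Where the `jar` and `unzip` commands are unavailable, the same manifest inspection can be scripted. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes bundled libraries live under WEB-INF/lib/ as in a conventional WAR layout. It ignores manifest continuation lines, which is normally harmless for short version headers.

```python
import io
import zipfile
from typing import Dict, List, Optional

def manifest_version(jar_bytes: bytes) -> Optional[str]:
    """Return Implementation-Version or Bundle-Version from a JAR manifest."""
    try:
        with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
            text = jar.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
    except (KeyError, zipfile.BadZipFile):
        return None  # no manifest, or not a valid archive
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        # Skip continuation lines (they start with a space).
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            headers[key.strip()] = value.strip()
    return headers.get("Implementation-Version") or headers.get("Bundle-Version")

def list_embedded_jars(war_path: str) -> List[dict]:
    """List JARs under WEB-INF/lib/ with any version their manifest declares."""
    found = []
    with zipfile.ZipFile(war_path) as war:
        for entry in war.namelist():
            if entry.startswith("WEB-INF/lib/") and entry.endswith(".jar"):
                found.append({
                    "file": entry.rsplit("/", 1)[-1],
                    "manifest_version": manifest_version(war.read(entry)),
                })
    return found
```

Entries whose `manifest_version` comes back as `None` are the ones that need version archaeology by other means.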

Strategy 2: Manual Documentation

Systematic documentation of components through expert interviews and code review.

Process:

Step 1: Component enumeration. Work with developers and operators to identify all external dependencies:

  • Third-party libraries (commercial and open source)
  • Framework components
  • Database drivers
  • Network libraries
  • Utility components

Step 2: Version identification. For each component, establish the version through:

  • Installation records or procurement history
  • Release date correlation (when the system was built versus when component versions were released)
  • File hashes matched against known version databases
  • Support contract records showing licensed versions
  • Binary inspection for embedded version strings
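
The hash-matching step above needs a list of file hashes to look up. The sketch below computes SHA-256 digests for candidate component files; the function names and the default suffix list are assumptions, and which database the hashes are then matched against is left open. Versioned shared objects such as "libssl.so.1.1" do not end in ".so" and would need an extra rule.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def hash_inventory(root: str,
                   suffixes: Tuple[str, ...] = (".jar", ".dll", ".so")) -> Dict[str, str]:
    """Map each candidate component file under root to its SHA-256 digest."""
    inventory = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            inventory[str(path.relative_to(root))] = sha256_of(path)
    return inventory
```

A hash that matches a published artifact is strong version evidence; a hash that matches nothing may indicate a locally patched or rebuilt component, which is itself worth recording.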

Step 3: Documentation creation. Create a manual SBOM document:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:...",
  "version": 1,
  "metadata": {
    "component": {
      "type": "application",
      "name": "legacy-financial-system",
      "version": "3.2.1"
    },
    "properties": [
      {
        "name": "cdx:sbom:generation-method",
        "value": "manual-documentation"
      },
      {
        "name": "cdx:sbom:confidence-level",
        "value": "medium"
      }
    ]
  },
  "components": [
    {
      "name": "oracle-jdbc-driver",
      "version": "10.2.0.4",
      "type": "library",
      "description": "Identified through file system analysis. Version determined from support contract records dated 2008-03-15.",
      "properties": [
        {
          "name": "cdx:documentation:source",
          "value": "manual-inventory"
        },
        {
          "name": "cdx:confidence",
          "value": "high"
        }
      ]
    }
  ]
}

Advantages: Can document components even without technical analysis capabilities, captures institutional knowledge before it's lost.

Limitations: Labor intensive, accuracy depends on expert knowledge availability, difficult to maintain as system evolves.
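
Hand-editing JSON is error-prone, so one option is to keep the manual inventory as simple rows and generate the document. This is a sketch under assumptions: the function name `build_manual_sbom` and the row keys (`name`, `version`, `evidence`, `confidence`) are hypothetical, and the property names simply mirror the example document above.

```python
import json
import uuid
from typing import Dict, Iterable

def build_manual_sbom(system_name: str, system_version: str,
                      rows: Iterable[Dict[str, str]]) -> dict:
    """Build a CycloneDX-style document from manually collected inventory rows."""
    components = []
    for row in rows:
        components.append({
            "type": "library",
            "name": row["name"],
            "version": row["version"],
            # Record how the entry was established, as in the example above.
            "description": row["evidence"],
            "properties": [
                {"name": "cdx:documentation:source", "value": "manual-inventory"},
                {"name": "cdx:confidence", "value": row["confidence"]},
            ],
        })
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "metadata": {
            "component": {
                "type": "application",
                "name": system_name,
                "version": system_version,
            },
            "properties": [
                {"name": "cdx:sbom:generation-method", "value": "manual-documentation"},
            ],
        },
        "components": components,
    }
```

The result can be written out with `json.dumps(sbom, indent=2)` and then checked with a schema validator before it is published.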

Strategy 3: Source Code Analysis

Analyze source code to identify dependencies through import statements, include directives, and configuration files.

Techniques:

Grep-based discovery:

# Java imports
grep -r "^import " src/ | sort -u

# Python imports (searches src/ recursively; --include is a GNU grep option)
grep -rE --include='*.py' '^(import|from) ' src/ | sort -u

# C/C++ includes
grep -r "^#include" src/ | sort -u

# Configuration files
find . -name "*.properties" -o -name "*.xml" -o -name "*.conf"

Parser-based analysis: Use language-specific parsers to extract dependency information:

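# Python AST parsing for imports
import ast
import os

def extract_imports(source_file):
    """Parse a Python file and return the module names it imports."""
    with open(source_file, encoding="utf-8", errors="replace") as f:
        tree = ast.parse(f.read(), filename=source_file)

    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports ("from . import x") point at the project's own
            # code and have node.module set to None or a local name; skip them.
            if node.module and node.level == 0:
                imports.append(node.module)

    return imports

# Scan all Python files under src/ and collect the distinct module names
all_imports = set()
for root, dirs, files in os.walk('src'):
    for file in files:
        if file.endswith('.py'):
            path = os.path.join(root, file)
            try:
                all_imports.update(extract_imports(path))
            except SyntaxError:
                # Legacy Python 2 sources may not parse under Python 3
                print(f"Could not parse {path}")

print(sorted(all_imports))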

Version determination: Source code often lacks explicit version declarations. Correlate with:

  • Lock files if they exist
  • Build timestamp and component release dates
  • Internal documentation
  • Change control records

Advantages: Identifies all code-level dependencies, works for any language with source access.

Limitations: Doesn't capture runtime-only dependencies, version identification challenging, requires source code access.

Strategy 4: Hybrid Approach

Combine multiple techniques for comprehensive coverage.

Layered analysis:

  1. Binary analysis discovers deployed components and some versions
  2. Source analysis identifies code-level dependencies and intentions
  3. Manual documentation fills gaps and adds context
  4. Expert interviews validate findings and resolve ambiguities

Confidence scoring: Document confidence level for each component:

  • High confidence: Version confirmed through multiple methods, component presence verified
  • Medium confidence: Version estimated through correlation, component presence confirmed
  • Low confidence: Component suspected but not definitively confirmed, version uncertain

{
  "component": {
    "name": "suspected-xml-parser",
    "version": "unknown",
    "properties": [
      {
        "name": "cdx:confidence",
        "value": "low"
      },
      {
        "name": "cdx:discovery:method",
        "value": "source-code-hints"
      },
      {
        "name": "cdx:discovery:notes",
        "value": "XML parsing functionality present but specific library unclear. Could be built-in language features or external library."
      }
    ]
  }
}

Advantages: Maximizes accuracy by corroborating multiple sources, documents uncertainty honestly.

Limitations: Most labor-intensive approach, requires multiple skillsets.
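
Applying the confidence levels consistently is easier when the rule is written down. The following sketch encodes one possible reading of the three levels; it is an assumption, not a standard. In particular, it treats a version supported by a single line of evidence as "estimated" (medium), and it treats any component whose presence is unverified as low regardless of version evidence.

```python
from typing import Set

def confidence_level(presence_verified: bool, version_evidence: Set[str]) -> str:
    """Map evidence about a component to 'high', 'medium', or 'low'.

    version_evidence holds the names of independent methods that agree on
    the version, e.g. {"manifest", "support-contract"}.
    """
    if not presence_verified:
        return "low"
    if len(version_evidence) >= 2:
        return "high"    # version corroborated by multiple methods
    if len(version_evidence) == 1:
        return "medium"  # single source: treat the version as an estimate
    return "low"         # component present, version unknown
```

For example, `confidence_level(True, {"manifest", "support-contract"})` yields "high", while a component suspected only from source-code hints stays "low" until its presence is confirmed.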

Incremental Improvement Strategy

A legacy SBOM doesn't need to be perfect immediately. Establish a baseline and improve incrementally.

Phase 1: Initial Inventory

Goal: Document known components to the best of current knowledge.

Acceptance criteria:

  • Major components identified (frameworks, databases, middleware)
  • Best-effort version information
  • Documented uncertainties and gaps
  • SBOM exists even if incomplete

Deliverable: "V1.0" SBOM with known limitations documented in metadata.

Phase 2: Gap Filling

Goal: Resolve unknowns through targeted investigation.

Activities:

  • Binary analysis of suspect components
  • Version archaeology for unversioned components
  • Documentation review for historical context
  • Expert interviews with original developers (if available)

Deliverable: "V2.0" SBOM with reduced uncertainty and higher completeness.

Phase 3: Verification

Goal: Validate SBOM accuracy through testing and operational correlation.

Activities:

  • Runtime testing: does the system actually use the documented components?
  • Vulnerability scanning: do scan results align with the SBOM contents?
  • Licensing review: do license findings match the SBOM declarations?
  • Comparison with similar systems

Deliverable: "V3.0" SBOM with verified accuracy, suitable for operational use.
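
Much of the verification work comes down to comparing what the SBOM declares with what another source observed (a scanner report, a list of loaded libraries, a package query). A minimal sketch of that comparison follows; the function name is hypothetical, and the observed data is assumed to have already been reduced to (name, version) pairs using the same naming as the SBOM.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]

def compare_inventories(sbom: dict, observed: Iterable[Pair]) -> Dict[str, List[Pair]]:
    """Report components that appear on only one side of the comparison."""
    documented = {
        (component["name"], component.get("version", ""))
        for component in sbom.get("components", [])
    }
    seen = set(observed)
    return {
        # Declared in the SBOM but not found: stale entry or missed by the scan.
        "documented_not_observed": sorted(documented - seen),
        # Found but not declared: a gap in the SBOM.
        "observed_not_documented": sorted(seen - documented),
    }
```

Both lists deserve follow-up: the first may point at entries that are out of date, the second at components the inventory missed.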

Phase 4: Maintenance

Goal: Keep SBOM current as legacy system changes.

Activities:

  • Change control integration: any system modification triggers an SBOM review
  • Periodic re-validation (annually)
  • Documentation of any component additions, removals, and updates
  • Version updates when security patches are applied

Deliverable: Living SBOM that tracks legacy system evolution.

Minimal Build Automation for SBOM

Even without full CI/CD transformation, minimal automation specifically for SBOM generation provides value.

Script-based generation:

#!/bin/bash
# legacy-sbom-generator.sh - Manual execution when changes occur

# Stop at the first failing step so a broken SBOM is never validated or signed
set -euo pipefail

echo "Generating SBOM for legacy-system..."

# Binary analysis of deployment
echo "Analyzing deployed artifacts..."
# (older Syft releases also accept "syft packages <target>", newer ones "syft scan <target>")
syft /opt/legacy-system -o cyclonedx-json > sbom-auto.json

# Manual component additions (things binary analysis misses)
echo "Merging manual component documentation..."
python merge-sbom-components.py \
  --auto sbom-auto.json \
  --manual manual-components.json \
  --output sbom-complete.json

# Validate
echo "Validating SBOM..."
cyclonedx-cli validate --input-file sbom-complete.json

# Sign
echo "Signing SBOM..."
gpg --armor --detach-sign sbom-complete.json

echo "SBOM generation complete: sbom-complete.json"
echo "Signature: sbom-complete.json.asc"
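
The script above calls merge-sbom-components.py without showing it. One way such a helper might look is sketched below. This is a hypothetical implementation, not the original: it assumes manual-components.json is a JSON document with a top-level "components" list, and it treats two entries as the same component when name and version both match.

```python
import argparse
import json
from typing import List

def merge_components(auto_sbom: dict, manual_sbom: dict) -> dict:
    """Return a copy of auto_sbom with manual components appended if not already present."""
    merged = dict(auto_sbom)
    components: List[dict] = list(auto_sbom.get("components", []))
    known = {(c.get("name"), c.get("version")) for c in components}
    for component in manual_sbom.get("components", []):
        key = (component.get("name"), component.get("version"))
        if key not in known:
            components.append(component)
            known.add(key)
    merged["components"] = components
    return merged

def main(argv=None) -> int:
    """Command-line entry point accepting --auto, --manual and --output."""
    parser = argparse.ArgumentParser(
        description="Merge manually documented components into a generated SBOM")
    parser.add_argument("--auto", required=True, help="SBOM produced by tooling")
    parser.add_argument("--manual", required=True, help="manually maintained components")
    parser.add_argument("--output", required=True, help="path for the merged SBOM")
    args = parser.parse_args(argv)
    with open(args.auto, encoding="utf-8") as handle:
        auto_sbom = json.load(handle)
    with open(args.manual, encoding="utf-8") as handle:
        manual_sbom = json.load(handle)
    with open(args.output, "w", encoding="utf-8") as handle:
        json.dump(merge_components(auto_sbom, manual_sbom), handle, indent=2)
    return 0
```

To run it as a script, call `raise SystemExit(main())` under an `if __name__ == "__main__":` guard. Keeping the merge logic in its own function makes it straightforward to test without touching the file system.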

Scheduled execution: Even if not triggered by builds, scheduled SBOM generation (quarterly, semi-annually) establishes regular refresh cycles.

Change control integration: Make the SBOM update part of the change control process. Any modification to the legacy system triggers an SBOM review, and an update if the change affects components.

Communicating Legacy SBOM Limitations

Transparency about SBOM quality builds trust more than false precision.

Documentation patterns:

{
  "metadata": {
    "properties": [
      {
        "name": "cdx:sbom:type",
        "value": "analyzed"
      },
      {
        "name": "cdx:sbom:accuracy",
        "value": "best-effort"
      },
      {
        "name": "cdx:generation:notes",
        "value": "SBOM generated through combination of binary analysis, source code review, and expert documentation. Legacy system built 2006-2012 without modern dependency management. Component versions determined through correlation with deployment dates and support records. Some uncertainties documented in component-level properties."
      },
      {
        "name": "cdx:update-frequency",
        "value": "semi-annual manual review"
      },
      {
        "name": "cdx:last-verified",
        "value": "2024-01-15"
      }
    ]
  }
}
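
Recording a last-verified date is most useful if something checks it. The sketch below reads the `cdx:last-verified` property from the example above and flags an SBOM whose review is overdue. The function names and the 183-day default (roughly semi-annual) are assumptions, and an SBOM with no recorded date is treated as overdue.

```python
from datetime import date
from typing import Optional

def last_verified(sbom: dict) -> Optional[date]:
    """Return the date stored in the cdx:last-verified metadata property, if any."""
    for prop in sbom.get("metadata", {}).get("properties", []):
        if prop.get("name") == "cdx:last-verified":
            return date.fromisoformat(prop["value"])
    return None

def review_overdue(sbom: dict, today: date, max_age_days: int = 183) -> bool:
    """True if the SBOM was never verified or was verified too long ago."""
    verified = last_verified(sbom)
    if verified is None:
        return True
    return (today - verified).days > max_age_days
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.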

Consumer communication: "Legacy-system SBOMs are generated through manual analysis and best-effort documentation due to system age and lack of modern build automation. We've invested significant effort in component inventory accuracy, but some version information is estimated rather than definitively confirmed. We recommend additional verification for critical dependency decisions. Please contact us with questions about specific components."

Justifying Investment in Legacy SBOM

Generating SBOMs for legacy systems costs more than automated generation for modern systems. Justify the investment by framing it in terms of risk, compliance, operations, and business continuity.

Risk argument: Legacy systems often serve critical business functions. Log4j affected systems regardless of age: legacy systems were just as vulnerable as modern ones, but harder to assess. An SBOM enables faster incident response even for legacy assets.

Compliance argument: Regulatory requirements (NIS2, CRA) don't exempt legacy systems. "System is old" doesn't satisfy compliance obligations. SBOM demonstrates due diligence even for legacy infrastructure.

Operational argument: Legacy system modification becomes necessary eventually (security patches, integration changes, platform migration). Understanding component composition before forced changes prevents crisis-mode scrambling.

Business continuity argument: Retiring key personnel take institutional knowledge with them. SBOM documentation captures component knowledge before it walks out the door, protecting business continuity.

Migration Planning Context

Legacy SBOM efforts often reveal modernization needs. The SBOM can serve as input to modernization planning:

Component age analysis: An SBOM showing 15-year-old components with no available patches provides concrete evidence for the modernization business case.

Licensing clarity: Discovering that the legacy system uses components with incompatible or expired licenses justifies replacement planning.

Dependency complexity: An SBOM revealing 200 components in a complex dependency web demonstrates technical debt and the value of modernization.

Vendor support: Components from vendors that no longer exist, or that no longer support the versions in use, indicate an unsustainable situation.
