
Enrich with Metadata

Adding context, provenance, and operational details beyond basic component lists

Automated SBOM generation produces the foundation: component names, versions, and relationships. This satisfies minimum requirements but misses the opportunity to provide context that makes an SBOM far more useful. Enriched SBOMs tell stories about software composition rather than just listing ingredients. Where did components come from? How were they obtained? What evidence supports their authenticity? What known issues exist? What operational context matters for consumers trying to assess risk?

The difference between basic and enriched SBOMs parallels the difference between an ingredient list and a nutritional label. An ingredient list tells you what's present. A nutritional label adds context that enables informed decisions. Enriched SBOMs let consumers make better security, operational, and business decisions based on comprehensive component context rather than identity alone.

Types of Metadata Enrichment

Component Pedigree and Provenance

Pedigree describes component history: where it came from, how it was obtained, and what transformations occurred. This is critical for supply chain security and authenticity verification.

Source repository information: Link components to source repositories where code originated. For open source: GitHub repository URL, commit hash that was built, branch name. For internal components: internal repository location, commit hash, build ID.

{
  "component": {
    "name": "express",
    "version": "4.18.2",
    "purl": "pkg:npm/express@4.18.2",
    "externalReferences": [
      {
        "type": "vcs",
        "url": "https://github.com/expressjs/express",
        "comment": "Source repository"
      },
      {
        "type": "distribution",
        "url": "https://registry.npmjs.org/express/-/express-4.18.2.tgz",
        "comment": "Package distribution location"
      }
    ],
    "properties": [
      {
        "name": "cdx:npm:package:commit",
        "value": "8368dc17af842441d2bf8d1fb8fec73a8edd7739"
      }
    ]
  }
}

Provenance enables verification: the SBOM claims this component was distributed through the official npm registry and built from a specific commit. A consumer can verify that the package hash matches the distribution and that the commit matches the expected source.

Supplier information: Distinguish between component author, distributor (recorded in the publisher field), and supplier. The author wrote the code. The distributor published the package. The supplier provided it to you (this might be the distributor, or an intermediary).

{
  "component": {
    "name": "lodash",
    "version": "4.17.21",
    "supplier": {
      "name": "npm, Inc.",
      "url": ["https://www.npmjs.com"]
    },
    "author": "John-David Dalton",
    "publisher": "npm, Inc."
  }
}

Supplier tracking matters for trust decisions and incident response. Known-trustworthy supplier carries more weight than unknown distributor. Security incident involving npm registry requires different response than incident involving obscure mirror.

Acquisition method: How did component enter your software? Downloaded from package manager? Copied from vendor? Built from source? Extracted from third-party installer?

{
  "properties": [
    {
      "name": "cdx:acquisition:method",
      "value": "package-manager"
    },
    {
      "name": "cdx:acquisition:timestamp",
      "value": "2024-01-15T14:23:45Z"
    },
    {
      "name": "cdx:acquisition:source",
      "value": "https://registry.npmjs.org"
    }
  ]
}

Acquisition context aids investigation: "This component was downloaded from the official registry two weeks ago. If a similar component in a different product came from an unofficial mirror, that's higher risk."
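
A consumer-side check along these lines could flag components whose recorded acquisition source is not on an approved list. This is a minimal sketch: it assumes the property names from the example above, and APPROVED_SOURCES is a hypothetical allowlist that would normally come from organizational policy.

```python
# Hypothetical allowlist; a real one would come from organizational policy.
APPROVED_SOURCES = {"https://registry.npmjs.org"}


def get_property(component, name):
    """Return the value of the first property with the given name, or None."""
    for prop in component.get("properties", []):
        if prop.get("name") == name:
            return prop.get("value")
    return None


def unapproved_acquisitions(components, approved=APPROVED_SOURCES):
    """List names of components acquired from a source outside the allowlist.

    Components with no recorded acquisition source are reported too,
    because missing provenance cannot be assessed.
    """
    flagged = []
    for component in components:
        source = get_property(component, "cdx:acquisition:source")
        if source is None or source.rstrip("/") not in approved:
            flagged.append(component.get("name", "<unnamed>"))
    return flagged
```

Treating a missing source as a finding is a deliberate choice in this sketch; a more lenient policy could report it separately.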

Hashes and Integrity Verification

Cryptographic hashes enable verification that components haven't been tampered with between SBOM generation and consumer analysis.

Multiple hash algorithms:

{
  "component": {
    "name": "axios",
    "version": "1.6.0",
    "hashes": [
      {
        "alg": "SHA-256",
        "content": "9a6a0b5b8a3e8b0d5c3e5f5a6d5c3e2e1d5c3e2e1d5c3e2e1d5c3e2e1d5c3e2e"
      },
      {
        "alg": "SHA-512",
        "content": "7e3c4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3..."
      }
    ]
  }
}

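Multiple algorithms provide defense-in-depth: if one algorithm is weakened, consumers can still verify integrity against the other. Note that SHA-256 and SHA-512 both belong to the SHA-2 family, so pairing one of them with an algorithm from a different family (for example SHA3-256) gives stronger independence. Consumers can verify that a downloaded component matches the hash documented in the SBOM, detecting tampering or corruption.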
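
On the consumer side, the verification step might look like the following sketch. It uses only Python's standard hashlib module; the function names and the two-entry algorithm map are illustrative assumptions, not part of any SBOM tool.

```python
import hashlib

# Map CycloneDX algorithm names to hashlib names (subset, for illustration).
HASHLIB_NAMES = {"SHA-256": "sha256", "SHA-512": "sha512"}


def file_digest(path, algorithm):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_component(component, artifact_path):
    """Return True only if every supported hash in the SBOM entry matches.

    Returns False when no supported hash is recorded, because in that case
    nothing was actually verified.
    """
    checked = 0
    for entry in component.get("hashes", []):
        name = HASHLIB_NAMES.get(entry.get("alg"))
        if name is None:
            continue  # algorithm not handled by this sketch
        if file_digest(artifact_path, name) != entry["content"].lower():
            return False
        checked += 1
    return checked > 0
```

Requiring every recorded hash to match (rather than any one) is the stricter reading of defense-in-depth: a mismatch on either algorithm is treated as a failure.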

File-level hashes: For components with multiple files, document hashes of individual files within component, not just component package hash. Enables granular integrity verification and change detection.
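
One way to produce file-level hashes is to walk the unpacked component directory and emit one entry per file. The sketch below represents each file as a CycloneDX-style component of type "file" that could be nested under the parent component; that representation, and the function name, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_level_hashes(component_dir):
    """Build one 'file' entry with a SHA-256 hash per file under component_dir."""
    root = Path(component_dir)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({
            "type": "file",
            # Relative POSIX-style path keeps entries stable across machines
            "name": path.relative_to(root).as_posix(),
            "hashes": [{"alg": "SHA-256", "content": digest}],
        })
    return entries
```

Comparing the entries from two builds of the same component shows which individual files changed, which a single package-level hash cannot do.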

Licensing Details

Basic SBOMs include SPDX license identifiers. Enriched SBOMs add licensing context supporting compliance decisions.

License files and URLs:

{
  "component": {
    "name": "react",
    "version": "18.2.0",
    "licenses": [
      {
        "license": {
          "id": "MIT",
          "url": "https://opensource.org/licenses/MIT",
          "text": {
            "contentType": "text/plain",
            "encoding": "base64",
            "content": "TWl0LiBMaWNlbnNlCgpDb3B5cmlnaHQg..."
          }
        }
      }
    ]
  }
}

Embedded license text enables compliance verification without external lookups. URL points to authoritative license definition. Consumer can verify component license against stated terms.
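
A consumer could extract the embedded text for review with a few lines of code. This sketch assumes the structure shown in the example above; the function name is hypothetical.

```python
import base64


def embedded_license_texts(component):
    """Return {license id or name: decoded text} for licenses with embedded text."""
    texts = {}
    for entry in component.get("licenses", []):
        license_info = entry.get("license", {})
        text = license_info.get("text")
        if not text:
            continue  # license referenced by id or URL only
        content = text.get("content", "")
        if text.get("encoding") == "base64":
            content = base64.b64decode(content).decode("utf-8")
        key = license_info.get("id") or license_info.get("name", "unknown")
        texts[key] = content
    return texts
```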

License expression details: For complex license situations (dual licensing, license with exceptions), provide clarifying context.

{
  "properties": [
    {
      "name": "cdx:license:expression:notes",
      "value": "Project offers choice of Apache-2.0 OR MIT. We selected MIT for compatibility with project licensing."
    },
    {
      "name": "cdx:license:selected",
      "value": "MIT"
    }
  ]
}

Documentation of license choice prevents future confusion about which license applies when component offers options.

Evidence and Attestations

How do consumers trust SBOM accuracy? Evidence provides supporting documentation.

Build evidence:

{
  "component": {
    "name": "internal-auth-lib",
    "version": "2.1.0",
    "evidence": {
      "occurrences": [
        {
          "location": "/lib/auth/authentication.jar"
        }
      ],
      "callstack": {
        "frames": [
          {
            "module": "AuthenticationService",
            "function": "validateToken"
          }
        ]
      }
    },
    "properties": [
      {
        "name": "cdx:build:timestamp",
        "value": "2024-01-15T10:30:00Z"
      },
      {
        "name": "cdx:build:job",
        "value": "https://jenkins.example.com/job/auth-lib/123"
      }
    ]
  }
}

This evidence documents that the component exists at a specific location in the build artifacts, is actually invoked (not just present), and was built by a specific CI/CD job at a specific time. Together these strengthen confidence in SBOM accuracy.

SLSA provenance integration: SLSA (Supply chain Levels for Software Artifacts) provides framework for build provenance attestation. Enriched SBOMs can reference SLSA attestations linking components to verified build processes.

{
  "externalReferences": [
    {
      "type": "attestation",
      "url": "https://artifacts.example.com/auth-lib-2.1.0.provenance",
      "comment": "SLSA provenance attestation"
    }
  ]
}

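A verified SLSA attestation provides evidence that the component was built by the stated pipeline from the stated source, rather than injected by an attacker or altered during the build. How strong that evidence is depends on the SLSA level the build platform achieves.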

Vulnerability Context

SBOMs enumerate components. VEX documents describe vulnerability status. But some vulnerability context fits naturally in SBOM enrichment.

Known vulnerability references:

{
  "component": {
    "name": "log4j-core",
    "version": "2.17.1",
    "properties": [
      {
        "name": "cdx:vulnerability:advisory",
        "value": "https://logging.apache.org/log4j/2.x/security.html"
      }
    ],
    "externalReferences": [
      {
        "type": "advisories",
        "url": "https://nvd.nist.gov/vuln/detail/CVE-2021-44228",
        "comment": "Log4Shell - first fixed in 2.15.0; related follow-up CVEs fixed through 2.17.1"
      }
    ]
  }
}

This does not claim the component is vulnerable (that's VEX's role). It documents that the component has been the subject of security advisories consumers should be aware of, which lets consumers verify they're receiving VEX documents for the relevant issues.
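
The cross-check between advisory references and received VEX documents could be sketched as follows. It assumes CVE IDs appear in the advisory URL or comment, as in the example above; the function names are hypothetical, and vex_cve_ids stands for whatever set of IDs you extract from your VEX documents.

```python
import re

CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def advisory_cves(component):
    """Collect CVE IDs mentioned in a component's 'advisories' references."""
    found = set()
    for ref in component.get("externalReferences", []):
        if ref.get("type") != "advisories":
            continue
        for field in ("url", "comment"):
            found.update(CVE_PATTERN.findall(ref.get(field, "")))
    return found


def missing_vex_coverage(component, vex_cve_ids):
    """Return advisory CVE IDs for which no VEX statement was received."""
    return advisory_cves(component) - set(vex_cve_ids)
```

An empty result means every CVE referenced in the SBOM has a corresponding VEX statement; anything else is a prompt to ask the producer for one.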

Operational Metadata

Context about how component is used within your product aids consumer risk assessment.

Deployment scope:

{
  "properties": [
    {
      "name": "cdx:scope",
      "value": "runtime"
    },
    {
      "name": "cdx:environment",
      "value": "backend-api"
    },
    {
      "name": "cdx:exposure",
      "value": "internet-facing"
    }
  ]
}

Scope indicates whether component is present in production artifacts (runtime) vs. only used during build (development, test). Environment describes where component operates (backend, frontend, mobile). Exposure indicates attack surface (internet-facing vs. internal). Consumer can prioritize vulnerability assessment: internet-facing runtime components are higher priority than build-time test dependencies.
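
A consumer could turn this prioritization rule into a simple sort. The sketch below assumes the property names and values from the example above; the three-tier ranking and the function names are illustrative choices, and components without a recorded scope are conservatively treated as runtime.

```python
def property_value(component, name, default=None):
    """Return the value of the first property with the given name."""
    for prop in component.get("properties", []):
        if prop.get("name") == name:
            return prop.get("value", default)
    return default


def assessment_priority(component):
    """Rank a component: 0 = assess first, 2 = assess last."""
    # Unknown scope is treated as runtime so it is not silently deprioritized.
    scope = property_value(component, "cdx:scope", "runtime")
    exposure = property_value(component, "cdx:exposure")
    if scope == "runtime" and exposure == "internet-facing":
        return 0
    if scope == "runtime":
        return 1
    return 2


def by_priority(components):
    """Return components ordered for vulnerability assessment."""
    return sorted(components, key=assessment_priority)
```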

Component criticality:

{
  "properties": [
    {
      "name": "cdx:criticality",
      "value": "high"
    },
    {
      "name": "cdx:criticality:reasoning",
      "value": "Handles authentication and authorization for all API endpoints"
    }
  ]
}

Producer assessment of component importance to overall system security and functionality. Helps consumers prioritize incident response: critical component vulnerabilities demand immediate attention, non-critical components can be addressed in normal maintenance cycles.

Enrichment Strategies

Automated Enrichment

Many metadata types can be added automatically during SBOM generation without manual effort.

Build-time enrichment:

from datetime import datetime, timezone


def enrich_sbom_during_build(sbom, build_context):
    """Add metadata available from the build environment.

    find_component_artifact, calculate_hashes, and lookup_repository_url
    are helpers assumed to be provided by your build tooling.
    """
    metadata = sbom.setdefault('metadata', {})

    # Add build timestamp in UTC (ISO 8601 with a 'Z' suffix)
    metadata['timestamp'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

    # Add build environment details
    metadata.setdefault('properties', []).extend([
        {
            'name': 'cdx:build:id',
            'value': build_context['build_id']
        },
        {
            'name': 'cdx:build:pipeline',
            'value': build_context['pipeline_url']
        },
        {
            'name': 'cdx:git:commit',
            'value': build_context['git_commit']
        },
        {
            'name': 'cdx:git:branch',
            'value': build_context['git_branch']
        }
    ])

    for component in sbom.get('components', []):
        # Hash the built artifact when it can be located
        artifact_path = find_component_artifact(component, build_context)
        if artifact_path:
            component['hashes'] = calculate_hashes(artifact_path, ['SHA-256', 'SHA-512'])

        # Add source repository if available (does not depend on the artifact lookup)
        repo_url = lookup_repository_url(component)
        if repo_url:
            component.setdefault('externalReferences', []).append({
                'type': 'vcs',
                'url': repo_url
            })

    return sbom

Automated enrichment scales efficiently and ensures consistency. Every build gets same metadata types without relying on human memory.

Manual Enrichment

Some metadata requires human judgment and cannot be fully automated.

Risk and criticality assessment: Determining whether component is "critical" vs. "moderate" importance requires understanding system architecture and business context. Developer or architect must assess: "This logging library failure would be inconvenient (moderate). Authentication library failure would be catastrophic (critical)."

License interpretation: Complex license scenarios (dual licensing with choice, licenses with exceptions, custom licenses) may require legal team interpretation and documentation in enriched SBOM.

Operational context: Whether component is internet-facing or internal-only depends on deployment architecture which automated tools may not fully understand. Architect documents: "This API gateway component is internet-facing. Database library behind gateway is internal-only."

Semi-automated approach: Initial SBOM generated automatically. Enrichment workflow presents components to subject matter experts for assessment. SMEs provide judgments, system records in SBOM metadata. Future SBOMs for same components reuse previous assessments unless marked for review.
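
The reuse step of the semi-automated workflow could be sketched as follows. It assumes previous assessments are stored as a mapping from package URL (purl) to a list of property objects; that storage shape and the function name are hypothetical. Because a purl includes the version, a version bump counts as a new component and goes back to the experts.

```python
def apply_previous_assessments(components, previous, needs_review=frozenset()):
    """Copy stored SME assessments onto matching components.

    previous: dict mapping purl -> list of {"name": ..., "value": ...} properties.
    needs_review: purls whose stored assessment should not be reused.
    Returns the components that still need a human assessment.
    """
    pending = []
    for component in components:
        purl = component.get("purl")
        stored = previous.get(purl)
        if stored is None or purl in needs_review:
            pending.append(component)
            continue
        existing = {p["name"] for p in component.get("properties", [])}
        # Only add properties the component does not already carry
        component.setdefault("properties", []).extend(
            dict(p) for p in stored if p["name"] not in existing
        )
    return pending
```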

Progressive Enrichment

Organizations don't need to add all enrichment immediately. Start minimal, add metadata types progressively as capabilities mature.

Level 1 enrichment (basic):

  • Component hashes (SHA-256)
  • Build timestamps
  • Source repository URLs for open source components

Level 2 enrichment (intermediate):

  • Supplier information
  • Multiple hash algorithms
  • License file embedding
  • Basic external references (distribution URLs, security advisories)

Level 3 enrichment (advanced):

  • SLSA provenance attestations
  • Evidence of component presence and usage
  • Criticality and risk assessments
  • Operational context (scope, environment, exposure)
  • Detailed pedigree tracking

Progressive approach prevents overwhelming initial implementation. Demonstrate value with basic enrichment, expand based on consumer feedback and operational experience.
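
To track progress across these levels, a rough per-component check could look like the sketch below. The criteria are a simplification of the lists above (for example, build timestamps live in SBOM metadata and are not checked here, and a vcs reference is expected even for internal components), and the function name is hypothetical.

```python
def enrichment_level(component):
    """Return 0-3 according to a simplified reading of the enrichment levels."""
    algs = {h.get("alg") for h in component.get("hashes", [])}
    ref_types = {r.get("type") for r in component.get("externalReferences", [])}
    prop_names = {p.get("name") for p in component.get("properties", [])}

    # Level 1: SHA-256 hash and a source repository reference
    level1 = "SHA-256" in algs and "vcs" in ref_types
    # Level 2: adds supplier information and a second hash algorithm
    level2 = level1 and "supplier" in component and len(algs) >= 2
    # Level 3: adds evidence, an attestation reference, and a criticality rating
    level3 = (
        level2
        and "evidence" in component
        and "attestation" in ref_types
        and "cdx:criticality" in prop_names
    )
    return 3 if level3 else 2 if level2 else 1 if level1 else 0
```

Averaging this score over all components in an SBOM gives a simple maturity indicator to report alongside consumer feedback.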

Enrichment for Different Audiences

Different SBOM consumers value different metadata types. Tailor enrichment to your primary audiences.

Security teams: Prioritize vulnerability references, hashes for integrity verification, criticality ratings, and operational exposure context. Security teams assess risk, so give them data that supports risk calculation.

Compliance teams: Prioritize license details with embedded text, supplier information, and pedigree tracking. Compliance teams prove regulatory adherence, so give them audit trail evidence.

Procurement teams: Prioritize supplier information, acquisition methods, and component freshness (timestamps). Procurement assesses vendor risk, so give them supply chain context.

Development teams (consumers): Prioritize source repository links, version recommendations, and evidence of actual usage vs. dead code. Developers assess upgrade paths and compatibility, so give them actionable technical details.

Quality and Consistency

Enriched metadata is only valuable if accurate and consistent.

Validation:

  • URLs in externalReferences must be valid and accessible
  • Hashes must match actual component artifacts
  • Timestamps must use consistent timezone (UTC preferred)
  • License identifiers must use valid SPDX IDs
  • Property names should follow conventions (cdx: prefix for CycloneDX extensions)
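
Some of these checks can be automated before distribution. The sketch below covers only the syntactic ones: hash length and hex format, http(s) URL shape, and UTC timestamps. It does not check that URLs are reachable, that hashes match the actual artifacts, or that license IDs are valid SPDX identifiers; function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Expected hex digest lengths for the algorithms used in this guide
HEX_LENGTHS = {"SHA-256": 64, "SHA-512": 128}


def validate_component(component):
    """Return a list of human-readable problems found in one component."""
    problems = []
    for entry in component.get("hashes", []):
        expected = HEX_LENGTHS.get(entry.get("alg"))
        content = entry.get("content", "")
        if expected and not re.fullmatch(r"[0-9a-fA-F]{%d}" % expected, content):
            problems.append(f"{entry['alg']} hash is not {expected} hex characters")
    for ref in component.get("externalReferences", []):
        parsed = urlparse(ref.get("url", ""))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"malformed URL: {ref.get('url')!r}")
    return problems


def is_utc_timestamp(value):
    """True if value is an ISO 8601 timestamp with a zero UTC offset."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.utcoffset() == timedelta(0)
```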

Consistency across releases: Same component in different product versions should have consistent metadata structure. Don't represent criticality as property in one SBOM and component field in another. Consistency enables automated processing.

Documentation: Document your enrichment conventions. "We use cdx:criticality property with values: critical, high, moderate, low. Determined by security team based on component role in authentication, data handling, or business logic." Consumers need to understand metadata meaning to use it effectively.

Tooling Integration

Most SBOM tools support extensibility for custom metadata.

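CycloneDX properties: Arbitrary key-value pairs can be added to metadata, components, and services. Use namespaced property names (org.example:key) to avoid conflicts. Note that the cdx: namespace is reserved for the official CycloneDX property taxonomy, so custom properties intended for distribution belong under an organization-specific namespace.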

SPDX annotations: SPDX supports annotation objects for adding context not captured in core specification.

External references: Both formats support external reference objects linking to resources like repositories, advisories, attestations.

Tool-specific extensions: Some tools provide proprietary extension mechanisms. Evaluate whether using tool-specific extensions limits interoperability vs. using standard extension points.

Common Enrichment Mistakes

Mistake: Enriching with sensitive information. Adding internal system details (internal IP addresses, internal tool URLs, employee names) that shouldn't be disclosed to external consumers.

Prevention: Review enrichment content from external consumer perspective. What information is helpful vs. exposing internal details unnecessarily?
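
Part of this review can be automated with a pre-distribution scan. The sketch below looks for private IPv4 addresses and hostnames under internal domain suffixes anywhere in the SBOM. INTERNAL_SUFFIXES is a hypothetical list you would replace with your own domains, and the scan is a heuristic: it will not catch employee names, and a four-part version string such as 10.0.0.1 would be flagged as a private address.

```python
import ipaddress
import json
import re

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOST_PATTERN = re.compile(r'https?://([^/\s":]+)')
# Hypothetical internal domain suffixes; replace with your organization's.
INTERNAL_SUFFIXES = (".internal", ".corp.example.com")


def find_sensitive_strings(sbom):
    """Return a sorted list of private IPs and internal hostnames in the SBOM."""
    text = json.dumps(sbom)
    hits = set()
    for candidate in IPV4_PATTERN.findall(text):
        try:
            if ipaddress.ip_address(candidate).is_private:
                hits.add(candidate)
        except ValueError:
            pass  # looked like an IP but is not a valid one
    for host in HOST_PATTERN.findall(text):
        if host.endswith(INTERNAL_SUFFIXES):
            hits.add(host)
    return sorted(hits)
```

A non-empty result should block distribution until someone confirms each hit is safe to disclose.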

Mistake: Inconsistent metadata across products. Each product team enriches differently. Consumers receiving SBOMs from multiple products face inconsistent metadata structures and semantics.

Prevention: Establish organizational enrichment standards. Document conventions in internal guidelines. Validate SBOMs against standards before distribution.

Mistake: Stale enrichment. Metadata added at SBOM generation but never updated. Component criticality assessed once, never revised as architecture evolves.

Prevention: Establish enrichment refresh cycles. Critical metadata reviewed quarterly or when significant changes occur.
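
A refresh cycle is easier to enforce when the review date is recorded alongside the assessment. The sketch below assumes a hypothetical property, org.example:criticality:reviewed, holding an ISO 8601 UTC timestamp, and treats 90 days as an approximation of "quarterly".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical property recording when criticality was last reviewed.
REVIEW_PROPERTY = "org.example:criticality:reviewed"


def is_stale(component, now, max_age=timedelta(days=90)):
    """True if the criticality review is missing or older than max_age.

    now must be a timezone-aware datetime.
    """
    for prop in component.get("properties", []):
        if prop.get("name") == REVIEW_PROPERTY:
            reviewed = datetime.fromisoformat(prop["value"].replace("Z", "+00:00"))
            return now - reviewed > max_age
    return True  # never reviewed counts as stale
```

Calling is_stale(component, datetime.now(timezone.utc)) for each component during SBOM generation yields the list to send back for review.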

Mistake: Over-enrichment. Adding so much metadata that SBOMs become enormous and difficult to process. A 50KB basic SBOM becomes a 5MB enriched SBOM due to embedded license texts, excessive properties, and redundant external references.

Prevention: Balance comprehensiveness with pragmatism. Focus enrichment on high-value metadata actually used by consumers. Embed license text for custom licenses; link to standard licenses rather than embedding.
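
The license-text rule could be applied as a post-processing step. This sketch removes embedded text only from licenses identified by an ID in a caller-supplied set of standard SPDX IDs; licenses identified by name only (typically custom ones) keep their text. The function name is hypothetical.

```python
import copy


def slim_license_text(component, standard_ids):
    """Return a copy of component without embedded text for standard licenses."""
    slimmed = copy.deepcopy(component)
    for entry in slimmed.get("licenses", []):
        license_info = entry.get("license", {})
        if license_info.get("id") in standard_ids:
            # The license URL (if present) remains as the authoritative pointer
            license_info.pop("text", None)
    return slimmed
```

Working on a copy keeps the fully enriched SBOM available internally while the slimmed version is distributed.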
