
Ingest and Store

Receiving, processing, and managing SBOM repositories

Requesting SBOMs from suppliers is only the first step. Converting vendor-provided documents into actionable intelligence requires systematic ingestion, validation, storage, and lifecycle management. Ad-hoc SBOM handling (files scattered across email attachments, SharePoint folders, and developer laptops) delivers minimal value. Systematic repository management transforms SBOMs from interesting documents into an operational data source supporting vulnerability management, compliance reporting, and risk assessment.

Effective SBOM repositories function like libraries: standardized cataloging, easy searching, version tracking, accessibility controls, preservation of historical records. Poor repositories function like garages: everything thrown in boxes, impossible to find anything, no organization, eventual abandonment. The difference between value and waste is systematic management.

Repository Requirements

A functional SBOM repository must satisfy several operational requirements beyond merely storing files.

Storage and Versioning

Persistent storage: SBOMs must be retained long-term, not just until the next version arrives. Historical SBOMs enable retrospective vulnerability analysis: "A CVE was published today, but we deployed that product version two years ago. Was it vulnerable then?" Without historical SBOMs, that question cannot be answered.

Version correlation: Link SBOMs to specific software versions. Product X version 1.2.3 has a different SBOM than version 1.2.4. The repository must maintain version relationships: "For Product X v1.2.3 deployed in production, retrieve the corresponding SBOM."

Update tracking: When a vendor provides an updated SBOM for the same product version (component vulnerability analysis changed, metadata enriched), the repository must preserve previous versions while marking the latest as current. The audit trail then answers "SBOM v1 received January 15, SBOM v2 received February 1, what changed?"
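The update-tracking behaviour above can be sketched with a small in-memory store. This is only an illustration: the `SbomStore` and `SbomRecord` names, their fields, and the year used in the sample dates are assumptions, and a real repository would persist records rather than hold them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SbomRecord:
    product: str
    version: str
    content: dict
    received: datetime
    is_current: bool = True


@dataclass
class SbomStore:
    """Hypothetical in-memory store, for illustration only."""
    records: list = field(default_factory=list)

    def add(self, product, version, content, received=None):
        # Demote earlier SBOMs for this product version instead of deleting them.
        for rec in self.history(product, version):
            rec.is_current = False
        rec = SbomRecord(product, version, content,
                         received or datetime.now(timezone.utc))
        self.records.append(rec)
        return rec

    def history(self, product, version):
        # Every SBOM ever received for this product version, oldest first.
        return sorted(
            (r for r in self.records
             if r.product == product and r.version == version),
            key=lambda r: r.received,
        )

    def current(self, product, version):
        return next(
            (r for r in self.history(product, version) if r.is_current), None)


store = SbomStore()
store.add("product-x", "1.2.3", {"components": ["a"]},
          datetime(2024, 1, 15, tzinfo=timezone.utc))
store.add("product-x", "1.2.3", {"components": ["a", "b"]},
          datetime(2024, 2, 1, tzinfo=timezone.utc))
```

Because superseded records are only flagged, never removed, `history()` still returns both SBOMs and a diff between them remains possible.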

Search and Query

Component-level search: "Show all products containing component-x versions 1.0.0 through 1.2.5." This is the critical query when a CVE is published affecting specific component versions. It must complete in seconds, not minutes, because incident response timing matters.

License queries: "Show all products containing GPL-licensed components." Supports compliance audits and policy enforcement.

Supplier queries: "Show all products from vendor-Y." Enables supplier risk management and vendor performance tracking.

PURL-based search: Package URL (PURL) provides standardized component identifier. Repository must support PURL queries: pkg:npm/express@4.17.1, pkg:maven/org.apache.logging.log4j/log4j-core@2.15.0
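As a rough sketch of the component-level and PURL queries above, the following parses simple PURLs and filters a flat component index by version range. The parser is deliberately simplified (qualifiers and subpaths are dropped, percent-encoding is not decoded), the version comparison assumes purely numeric dotted versions, and the function names and index layout are illustrative assumptions.

```python
def parse_purl(purl):
    """Split a simple PURL into type, namespace, name and version."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop subpath (#...) and qualifiers (?...), which this sketch ignores.
    body = purl[4:].split("#", 1)[0].split("?", 1)[0]
    path, _, version = body.partition("@")
    parts = path.split("/")
    if len(parts) < 2 or not parts[-1]:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")
    return {
        "type": parts[0],
        "namespace": "/".join(parts[1:-1]) or None,
        "name": parts[-1],
        "version": version or None,
    }


def version_key(version):
    # Assumes purely numeric dotted versions such as "1.2.5".
    return tuple(int(p) for p in version.split("."))


def products_with_component(index, name, low, high):
    """index: iterable of (product, component_name, component_version)."""
    lo, hi = version_key(low), version_key(high)
    return sorted({
        product for product, comp, ver in index
        if comp == name and lo <= version_key(ver) <= hi
    })


rows = [
    ("product-a", "component-x", "1.0.0"),
    ("product-b", "component-x", "1.2.5"),
    ("product-c", "component-x", "1.3.0"),
    ("product-d", "other-lib", "1.1.0"),
]
affected = products_with_component(rows, "component-x", "1.0.0", "1.2.5")
```

Real ecosystems use version schemes (pre-release tags, epochs, Maven qualifiers) that a numeric tuple cannot order correctly, so a production repository would need ecosystem-aware version comparison.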

Access Control

Product-based permissions: Users should access SBOMs for products they're authorized to manage. Product A team sees Product A SBOMs, not unrelated Product B SBOMs. Prevents sensitive component composition disclosure to unauthorized personnel.

Role-based access: Security team needs read access to all SBOMs for vulnerability analysis. Procurement team needs access for supplier assessment. Developers need access for products they build. Legal team needs access for license compliance. Different roles, different access patterns.

Audit logging: Track who accessed which SBOMs when. Audit log supports security investigations, compliance attestation, and usage analysis.
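A minimal sketch of how product-based permissions, role-based access, and audit logging might fit together. The role names, the user dictionary layout, and the choice to give only the security role global read access are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumption: only the security role reads every product's SBOMs.
GLOBAL_READ_ROLES = {"security"}

# In-memory audit trail; a real system would write to durable storage.
audit_log = []


def can_read(user, product):
    """Global-read roles see everything; others only their own products."""
    if user["role"] in GLOBAL_READ_ROLES:
        return True
    return product in user.get("products", ())


def read_sbom(user, product, repository):
    allowed = can_read(user, product)
    # Record every attempt, including denied ones.
    audit_log.append({
        "user": user["name"],
        "product": product,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"{user['name']} may not read {product}")
    return repository[product]
```

Logging denied attempts alongside successful reads keeps the audit trail useful for security investigations, not only for usage analysis.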

Integration APIs

Programmatic access: A manual web UI alone is insufficient. Vulnerability scanners, compliance tools, and incident response automation need API access for querying and retrieving SBOMs.

Webhook notifications: When a new SBOM is ingested or an existing SBOM is updated, notify interested systems via webhooks. This enables reactive automation: "New SBOM arrived, trigger vulnerability scan."

Bulk operations: Export all SBOMs for compliance reporting. Import batch of SBOMs from multi-product vendor. Bulk delete outdated SBOMs based on retention policies.
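The notification pattern above can be illustrated with an in-process event dispatcher. This is a simplified stand-in: a real webhook would deliver the payload as an HTTP POST to each subscriber's URL, and the `EventBus` class and the `sbom.ingested` event name are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical dispatcher standing in for webhook delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver to every handler; one failing handler must not block the rest.
        failures = []
        for handler in self._subscribers[event_type]:
            try:
                handler(payload)
            except Exception as exc:
                failures.append((handler, exc))
        return failures


bus = EventBus()
scans_triggered = []
# "New SBOM arrived, trigger vulnerability scan."
bus.subscribe("sbom.ingested", scans_triggered.append)
bus.publish("sbom.ingested", {"product": "product-api", "version": "1.2.0"})
```

Returning the failures instead of raising lets the caller decide whether to retry delivery, which is the usual concern with real webhook endpoints.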

Repository Implementation Options

Dedicated SBOM Repository Tools

Dependency-Track: Open source platform purpose-built for SBOM management. Ingests CycloneDX and SPDX formats. Provides vulnerability analysis, policy enforcement, metrics, and API access. Active community and regular updates.

Advantages: Purpose-built functionality, continuous vulnerability monitoring, policy engines, comprehensive APIs, established user community, no custom development required.

Considerations: Infrastructure hosting requirements (server, database), learning curve for administration, may include features beyond basic repository needs.

Setup example:

# Docker Compose deployment
docker-compose up -d

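# API-based SBOM upload
# Multipart uploads use POST; PUT expects a JSON body with a Base64-encoded BOM.
# projectVersion below is an illustrative value.
curl -X POST "http://dependency-track:8080/api/v1/bom" \
  -H "X-Api-Key: $API_KEY" \
  -F "autoCreate=true" \
  -F "projectName=product-api" \
  -F "projectVersion=1.0.0" \
  -F "bom=@sbom.json"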

OSS Review Toolkit (ORT): Comprehensive supply chain tooling including SBOM storage and analysis. Strong focus on compliance and license management alongside vulnerability tracking.

SBOM Observer (Commercial): Enterprise-focused commercial platform with advanced analytics, compliance reporting, and vendor management features.

General-Purpose Solutions

Document management systems: SharePoint, Confluence, or similar document repositories can store SBOMs as versioned documents. Provides basic storage and access control but limited SBOM-specific functionality.

Advantages: Leverages existing infrastructure, familiar to users, no new tools to learn, immediate availability.

Disadvantages: No component-level searching, no automated vulnerability correlation, manual version management, limited API access. Acceptable for small-scale initial implementations, but does not scale to comprehensive programs.

Object storage with metadata: S3 buckets or similar object storage with metadata tagging. Store SBOM files, tag with product/version/date metadata, query via tags.

# S3-based SBOM storage pattern
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

import boto3

def store_sbom(sbom_data, product_id, version):
    """Store SBOM in S3 with searchable metadata"""
    s3 = boto3.client('s3')

    # One timestamp for both the object key and the ingestion metadata
    now = datetime.now(timezone.utc)
    timestamp = now.strftime('%Y%m%dT%H%M%SZ')
    key = f"sboms/{product_id}/{version}/sbom-{timestamp}.json"

    # Extract component summary for metadata
    component_count = len(sbom_data.get('components', []))

    s3.put_object(
        Bucket='sbom-repository',
        Key=key,
        Body=json.dumps(sbom_data).encode('utf-8'),
        Metadata={
            'product-id': product_id,
            'version': version,
            'component-count': str(component_count),
            'sbom-format': 'CycloneDX',
            'ingestion-date': now.isoformat()
        },
        # put_object takes tags as a URL-encoded string via Tagging
        Tagging=urlencode({'product': product_id, 'version': version})
    )
    return key

Advantages: Simple, scalable, low cost, flexible, no specialized infrastructure.

Disadvantages: Limited query capabilities (can't search component-level within SBOMs), no built-in vulnerability correlation, requires custom tooling for analysis.

Database-Backed Custom Solutions

Organizations with specific requirements may build custom SBOM repositories using databases (PostgreSQL, MongoDB) for storage and custom application logic for ingestion and querying.

Schema design approach:

-- Products and versions
CREATE TABLE products (
  id UUID PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT
);

CREATE TABLE product_versions (
  id UUID PRIMARY KEY,
  product_id UUID REFERENCES products(id),
  version TEXT NOT NULL,
  release_date DATE
);

-- SBOMs
CREATE TABLE sboms (
  id UUID PRIMARY KEY,
  product_version_id UUID REFERENCES product_versions(id),
  sbom_format TEXT NOT NULL, -- 'CycloneDX' or 'SPDX'
  sbom_version TEXT NOT NULL,
  content JSONB NOT NULL,
  ingestion_timestamp TIMESTAMP DEFAULT NOW(),
  is_current BOOLEAN DEFAULT TRUE
);

-- Components (extracted from SBOMs for queryability)
CREATE TABLE components (
  id UUID PRIMARY KEY,
  sbom_id UUID REFERENCES sboms(id),
  name TEXT NOT NULL,
  version TEXT NOT NULL,
  purl TEXT,
  license TEXT,
  supplier TEXT
);

CREATE INDEX idx_components_purl ON components(purl);
CREATE INDEX idx_components_name_version ON components(name, version);

Advantages: Complete control, customization to exact requirements, integration with existing systems, query performance optimization for specific use cases.

Disadvantages: Development and maintenance burden, requires database expertise, must implement features that dedicated tools provide out-of-box, ongoing support costs.
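To illustrate why the schema extracts components into their own table, here is a small sketch using SQLite from the Python standard library as a stand-in for PostgreSQL. The table is a cut-down version of the `components` table, and the helper names and sample SBOM are assumptions.

```python
import sqlite3


def extract_components(sbom_id, sbom):
    """Flatten CycloneDX-style components into rows for the components table."""
    return [
        (sbom_id, c.get("name"), c.get("version"), c.get("purl"))
        for c in sbom.get("components", [])
    ]


def sboms_containing(conn, purl):
    """Return IDs of SBOMs that list the given PURL, using the purl index."""
    cur = conn.execute(
        "SELECT DISTINCT sbom_id FROM components WHERE purl = ? ORDER BY sbom_id",
        (purl,),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE components ("
    "sbom_id TEXT, name TEXT NOT NULL, version TEXT NOT NULL, purl TEXT)"
)
conn.execute("CREATE INDEX idx_components_purl ON components(purl)")

sample_sbom = {"components": [
    {"name": "express", "version": "4.17.1",
     "purl": "pkg:npm/express@4.17.1"},
    {"name": "log4j-core", "version": "2.15.0",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.15.0"},
]}
conn.executemany(
    "INSERT INTO components VALUES (?, ?, ?, ?)",
    extract_components("sbom-1", sample_sbom),
)
```

Once components are rows rather than nested JSON, a component lookup becomes an indexed equality match instead of a scan over every stored document.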

Ingestion Workflow

Automated Ingestion

Vendor API integration: If vendors provide SBOM APIs, implement scheduled polling or webhook subscriptions for automatic SBOM retrieval.

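# Automated vendor SBOM fetching
import time

import requests
import schedule

# load_vendor_configurations, is_newer_than_stored, ingest_sbom and
# log_ingestion are repository-specific helpers assumed to be defined elsewhere.

def fetch_vendor_sboms():
    """Poll vendor API for updated SBOMs"""
    vendors = load_vendor_configurations()

    for vendor in vendors:
        try:
            response = requests.get(
                f"{vendor['api_url']}/sboms/latest",
                headers={'Authorization': f"Bearer {vendor['api_token']}"},
                timeout=30
            )
        except requests.RequestException:
            # One unreachable vendor should not stop the remaining polls
            continue

        if response.status_code == 200:
            sboms = response.json()
            for sbom in sboms:
                if is_newer_than_stored(sbom):
                    ingest_sbom(sbom, vendor['name'])
                    log_ingestion(sbom, vendor['name'])

# Schedule daily polling
schedule.every().day.at("02:00").do(fetch_vendor_sboms)

# schedule only runs jobs when run_pending() is called, so keep a loop alive
while True:
    schedule.run_pending()
    time.sleep(60)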

Email monitoring: Many vendors still distribute SBOMs via email. Implement an email monitoring service that extracts SBOM attachments for automatic ingestion.
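A minimal sketch of the attachment-extraction step, using only the Python standard library `email` package. It assumes the raw message bytes have already been fetched from a mailbox (for example over IMAP) and that SBOMs can be recognised by file extension. The extension list and function name are illustrative assumptions.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Assumption: SBOM attachments are identifiable by these extensions.
SBOM_EXTENSIONS = (".json", ".xml", ".spdx")


def sbom_attachments(raw_message):
    """Return (filename, bytes) for each likely SBOM attachment."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(SBOM_EXTENSIONS):
            found.append((name, part.get_payload(decode=True)))
    return found


# Build a sample message locally instead of connecting to a mail server.
sample = EmailMessage()
sample["Subject"] = "SBOM for widget-platform v3.2.1"
sample.set_content("SBOM attached.")
sample.add_attachment(
    b'{"bomFormat": "CycloneDX"}',
    maintype="application", subtype="json", filename="sbom.json",
)
extracted = sbom_attachments(sample.as_bytes())
```

Extracted files should still pass through the same validation as any other ingestion channel, since a file extension says nothing about content quality.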

Portal scraping: If vendor publishes SBOMs on customer portal but lacks API, implement portal scraping (with vendor permission) for retrieval automation.

Manual Ingestion

Automated ingestion isn't always feasible. Provide manual upload workflows for SBOMs received through non-automated channels.

Web UI upload: Simple form accepting SBOM file upload with required metadata (product name, version, vendor). Validates format, stores in repository.

CLI tool: Command-line utility for batch operations or developer workflows:

# SBOM CLI ingestion
sbom-tool ingest \
  --file vendor-product-v1.2.3-sbom.json \
  --product "vendor-product" \
  --version "1.2.3" \
  --vendor "acme-corp" \
  --validate

# Batch ingestion
for sbom in *.json; do
  sbom-tool ingest --file "$sbom" --auto-detect-metadata
done

Validation During Ingestion

Don't blindly trust and store SBOMs. Validate during ingestion to catch problems early.

Format validation: Confirm SBOM conforms to CycloneDX or SPDX schema. Reject malformed SBOMs immediately.

Completeness checks: Is the component count reasonable? Are required metadata fields populated? If an SBOM lists only 3 components for a complex enterprise application, flag it for review.

Duplicate detection: Is identical SBOM already stored? Duplicate ingestion wastes storage and creates confusion.

Metadata correlation: Does SBOM metadata match product/version being associated? SBOM claiming to describe Product A version 2.0 shouldn't be stored under Product B version 1.0.

Quarantine workflow: SBOMs failing validation go to a quarantine queue for manual review rather than being rejected outright or stored as-is. A human can then assess whether the validation failure reflects a legitimate problem or an overly strict check.
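The checks above might be combined into a single ingestion gate along the following lines. This is a sketch under stated assumptions: it performs a structural pre-check on CycloneDX JSON rather than full schema validation, the field list and the `MIN_COMPONENTS` threshold are illustrative, and the decision names are hypothetical.

```python
import hashlib
import json

# Assumption: fields this sketch treats as mandatory (not the full schema).
REQUIRED_FIELDS = ("bomFormat", "specVersion", "components")
# Assumption: below this count an SBOM is considered suspiciously thin.
MIN_COMPONENTS = 5


def fingerprint(sbom):
    """Stable hash of the SBOM content, used for duplicate detection."""
    canonical = json.dumps(sbom, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(sbom, expected_name, expected_version, known_fingerprints):
    """Return (decision, reasons): accept, quarantine, duplicate or reject."""
    missing = [f for f in REQUIRED_FIELDS if f not in sbom]
    if missing:
        return "reject", [f"missing fields: {', '.join(missing)}"]
    if sbom["bomFormat"] != "CycloneDX":
        return "reject", ["unsupported bomFormat"]
    if fingerprint(sbom) in known_fingerprints:
        return "duplicate", ["identical SBOM already stored"]

    reasons = []
    if len(sbom["components"]) < MIN_COMPONENTS:
        reasons.append("suspiciously few components")
    described = sbom.get("metadata", {}).get("component", {})
    if (described.get("name"), described.get("version")) != (
            expected_name, expected_version):
        reasons.append("metadata does not match target product/version")
    # Soft failures go to human review instead of being dropped.
    return ("quarantine" if reasons else "accept"), reasons
```

Structurally malformed input is rejected immediately, while the judgment-based checks (completeness, metadata correlation) only route the SBOM to quarantine, matching the split described above.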

Storage Organization

Hierarchical Structure

Organize SBOMs hierarchically reflecting product relationships:

sbom-repository/
├── products/
│   ├── product-api/
│   │   ├── v1.0.0/
│   │   │   ├── sbom-2023-06-15.json
│   │   │   └── metadata.yaml
│   │   ├── v1.1.0/
│   │   │   ├── sbom-2023-09-01.json
│   │   │   ├── sbom-2023-09-15.json  # Updated SBOM
│   │   │   └── metadata.yaml
│   │   └── v1.2.0/
│   │       ├── sbom-2024-01-10.json
│   │       └── metadata.yaml
│   └── product-web/
│       └── v2.0.0/
│           ├── sbom-2024-01-05.json
│           └── metadata.yaml
└── vendors/
    ├── acme-corp/
    │   └── widget-platform/
    │       └── v3.2.1/
    │           └── sbom-2023-12-01.json
    └── beta-systems/
        └── analytics-engine/
            └── v1.5.0/
                └── sbom-2024-01-08.json

Hierarchy enables intuitive navigation and clear product/version correlation.

Metadata Files

Store metadata alongside SBOMs documenting context:

# metadata.yaml
product_id: "product-api"
version: "1.1.0"
release_date: "2023-09-01"
sbom_versions:
  - file: "sbom-2023-09-01.json"
    ingested: "2023-09-01T10:30:00Z"
    source: "ci-cd-pipeline"
    status: "superseded"
  - file: "sbom-2023-09-15.json"
    ingested: "2023-09-15T14:20:00Z"
    source: "manual-enrichment"
    status: "current"
deployment_status: "production"
criticality: "high"
owner_team: "platform-engineering"
customer_count: 342
notes: "Version 1.1.0 includes security updates. All customers advised to upgrade from 1.0.x."

Metadata provides operational context enriching SBOM utility.

Retention Policies

Define how long to retain SBOMs and under what conditions to archive or delete.

Active retention: Keep SBOMs for currently-deployed software versions indefinitely while in use. As long as v1.1.0 is deployed anywhere, retain its SBOM.

Historical retention: After version retirement, retain SBOM for minimum period (e.g., 5 years) supporting potential retrospective investigations.

Archival: Move old SBOMs to archival storage (cheaper, slower access) after active retention period.

Deletion: Only delete SBOMs after legal/compliance retention requirements satisfied and no ongoing investigations reference them.
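The retention rules above can be expressed as a small decision function. The five-year period is the example figure from the text. The function name, argument names, and returned labels are assumptions, and real legal retention requirements would replace the single constant.

```python
from datetime import date

# Example historical retention period taken from the text above.
HISTORICAL_RETENTION_YEARS = 5


def retention_action(deployed, retired_on, today, under_investigation=False):
    """Decide what to do with the SBOM for one software version."""
    # Active retention: anything still deployed is kept as-is.
    if deployed or retired_on is None:
        return "retain-active"

    # Compare as (year, month, day) tuples to sidestep Feb 29 edge cases.
    expiry = (retired_on.year + HISTORICAL_RETENTION_YEARS,
              retired_on.month, retired_on.day)
    expired = (today.year, today.month, today.day) >= expiry

    # Deletion only once retention has lapsed and nothing references the SBOM.
    if expired and not under_investigation:
        return "eligible-for-deletion"
    return "archive"
```

For example, a version retired on 2018-06-01 and evaluated on 2024-01-01 is past its five-year window, so it becomes eligible for deletion unless an investigation still references it.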

Query Optimization

Component-level searching is the most critical repository capability. Optimize for this use case.

Indexing strategy:

-- Optimize component searches
-- (IF NOT EXISTS avoids an error when the schema has already created this index)
CREATE INDEX IF NOT EXISTS idx_components_purl ON components(purl);
CREATE INDEX idx_components_name ON components(name);
CREATE INDEX idx_components_version ON components(version);
CREATE INDEX idx_components_license ON components(license);

-- Optimize version range queries
CREATE INDEX idx_product_versions_product_version
  ON product_versions(product_id, version);

-- Optimize temporal queries
CREATE INDEX idx_sboms_ingestion_timestamp
  ON sboms(ingestion_timestamp DESC);

Caching: Frequently-accessed SBOMs (latest version of critical products) should be cached in memory for sub-millisecond access.

Pre-computed summaries: Component counts, license distributions, vulnerability summaries pre-computed and cached rather than calculated on every query.
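A sketch of computing such a summary once at ingestion time so that later queries read the stored result. It assumes CycloneDX-style JSON in which each component may carry a `licenses` list of `license` objects or `expression` strings. The function names and the `unknown` placeholder are illustrative.

```python
from collections import Counter


def component_licenses(component):
    """Yield license identifiers from a CycloneDX-style component dict."""
    for entry in component.get("licenses", []):
        if "expression" in entry:
            yield entry["expression"]
        else:
            lic = entry.get("license", {})
            yield lic.get("id") or lic.get("name") or "unknown"


def summarize(sbom):
    """Compute the summary once; store it next to the SBOM for reuse."""
    components = sbom.get("components", [])
    distribution = Counter()
    for component in components:
        found = list(component_licenses(component))
        # Components without license data are counted explicitly.
        distribution.update(found or ["unknown"])
    return {
        "component_count": len(components),
        "license_distribution": dict(distribution),
    }


summary = summarize({"components": [
    {"name": "a", "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "b", "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "c", "licenses": [{"expression": "Apache-2.0 OR MIT"}]},
    {"name": "d"},
]})
```

Since a stored SBOM version is immutable, its summary never needs recomputing. Only ingestion of a new SBOM triggers a new calculation.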

Integration Examples

Vulnerability Scanner Integration

def integrate_sbom_with_scanner(repository, scanner_api):
    """Push SBOMs to vulnerability scanner"""

    # Get all current production SBOMs
    sboms = repository.query(
        deployment_status='production',
        is_current=True
    )

    for sbom in sboms:
        # Upload to scanner
        scanner_api.upload_sbom(
            product_id=sbom['product_id'],
            version=sbom['version'],
            sbom_content=sbom['content']
        )

        # Scanner automatically begins vulnerability analysis
        print(f"Synced {sbom['product_id']} v{sbom['version']} to scanner")

Compliance Reporting

def generate_compliance_report(repository, start_date, end_date):
    """Generate compliance report from repository data"""

    report = {
        'period': f"{start_date} to {end_date}",
        'sbom_coverage': {},
        'license_compliance': {},
        'supplier_diversity': {}
    }

    # Calculate SBOM coverage
    products = repository.get_all_products()
    products_with_sbom = repository.get_products_with_sbom(
        date_range=(start_date, end_date)
    )
    # Guard against an empty product list to avoid division by zero
    report['sbom_coverage']['rate'] = (
        len(products_with_sbom) / len(products) if products else 0.0
    )

    # Analyze license compliance
    prohibited_licenses = ['GPL-3.0', 'AGPL-3.0']
    violations = repository.query_components(
        licenses=prohibited_licenses,
        date_range=(start_date, end_date)
    )
    report['license_compliance']['violations'] = len(violations)

    return report

Common Ingestion Pitfalls

Pitfall: No validation during ingestion. Blindly storing whatever files arrive. Low-quality or malformed SBOMs corrupt repository.

Prevention: Implement validation gates. Reject or quarantine problematic SBOMs for review.

Pitfall: Overwriting without version tracking. New SBOM for same product/version replaces old one completely. No audit trail of what changed or why.

Prevention: Preserve all SBOM versions, mark latest as "current", maintain change history.

Pitfall: No component-level indexing. SBOMs stored as opaque files. Must parse entire repository to answer component queries. Slow and impractical at scale.

Prevention: Extract component data during ingestion, store in indexed database for fast searching.

Pitfall: Insufficient access controls. All users can access all SBOMs regardless of need. Sensitive component information disclosed inappropriately.

Prevention: Implement role-based and product-based access controls from the beginning.
