
Generate SBOMs

Creating software bills of materials through automated and manual approaches

SBOM generation is the foundational producer workflow. Without reliable generation, all downstream activities—distribution, VEX coordination, vulnerability management—lack the data foundation they require. Getting generation right matters more than getting it fast.

This page covers both automated approaches for modern CI/CD environments and manual approaches for systems where automation isn't viable. Most organizations need both—automation for actively developed products, manual processes for legacy systems or low-frequency releases.

Automated Generation

Automated SBOM generation integrates into build pipelines, producing SBOMs as naturally as compiling code or running tests. This approach scales sustainably and maintains quality consistency that manual processes struggle to achieve.

CI/CD Integration Pattern

The standard automation pattern integrates SBOM generation as a pipeline stage occurring after dependency resolution but before artifact packaging.

Build Pipeline Flow:

1. Checkout source code
2. Resolve dependencies (npm install, mvn dependency:resolve, pip install)
3. Run tests
4. Generate SBOM ← Integration point
5. Validate SBOM
6. Build artifacts
7. Sign and publish

Positioning matters. Generating before dependency resolution produces incomplete SBOMs missing transitive dependencies. Generating after artifact packaging risks missing build-time components. The sweet spot: immediately after dependency resolution completes, capturing the exact dependency state that will be compiled.

Tool Selection Considerations

Dozens of SBOM generation tools exist, varying in ecosystem support, detection capabilities, and format flexibility. No single tool excels at everything.

Language-specific tools typically produce higher quality for their target ecosystem than generic tools. For Node.js projects, cyclonedx-npm understands npm's dependency resolution nuances better than language-agnostic scanners. Python projects benefit from tools deeply integrated with pip and Poetry. Java applications work best with Maven or Gradle plugins aware of Java packaging conventions.

Polyglot projects using multiple languages present challenges. A backend using Java + Python + Node.js might need three different tools, with results merged into a single SBOM. This merging process requires careful deduplication and relationship mapping to avoid listing the same component multiple times or missing cross-language dependencies.
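As a rough sketch of the deduplication involved, assume each ecosystem's tool has already produced a CycloneDX document parsed into a dict. The merge_components helper below is illustrative, not a standard API; dedicated tooling (for example, the CycloneDX CLI's merge command) is the usual choice in practice.

```python
# Illustrative merge of per-ecosystem CycloneDX component lists,
# deduplicating by package URL (purl).

def merge_components(*sboms):
    """Combine 'components' arrays from several CycloneDX dicts,
    keeping one entry per purl (falling back to name+version)."""
    seen = {}
    for sbom in sboms:
        for component in sbom.get("components", []):
            key = component.get("purl") or (component.get("name"), component.get("version"))
            seen.setdefault(key, component)  # first occurrence wins
    return list(seen.values())

java_sbom = {"components": [
    {"name": "guava", "version": "32.1.2", "purl": "pkg:maven/com.google.guava/guava@32.1.2"},
]}
node_sbom = {"components": [
    {"name": "lodash", "version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21"},
    {"name": "lodash", "version": "4.17.21", "purl": "pkg:npm/lodash@4.17.21"},  # duplicate entry
]}

merged = merge_components(java_sbom, node_sbom)
print(len(merged))  # duplicates collapsed to one entry each
```

Real merging also has to reconcile dependency relationships across the inputs, which this sketch omits.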

Container scanning tools like Syft or Trivy excel at analyzing container images layer-by-layer, capturing base image components, application code, and system libraries. Essential for containerized applications but insufficient for source code analysis—container SBOMs show what's deployed, not how it was built.

Configuration for Quality

Default tool configurations rarely produce optimal results. Deliberate configuration dramatically improves SBOM quality.

Scope control determines what's included. Should test dependencies appear in production SBOMs? Probably not—they increase noise without representing runtime security surface. Configure tools to exclude test scope, generating separate SBOMs for development environments if needed.
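Most generators expose this as a configuration flag rather than code, but the underlying distinction is simple. A sketch using a parsed package.json (package names and versions are illustrative):

```python
# Illustration of scope control: only runtime dependencies belong in the
# production SBOM; devDependencies are test/tooling scope.

manifest = {
    "dependencies": {"express": "^4.18.0", "pg": "^8.11.0"},
    "devDependencies": {"jest": "^29.0.0", "eslint": "^8.50.0"},
}

production_deps = sorted(manifest.get("dependencies", {}))
dev_deps = sorted(manifest.get("devDependencies", {}))

print(production_deps)  # included in the production SBOM
print(dev_deps)         # excluded, or placed in a separate development SBOM
```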

Depth configuration controls transitive dependency traversal. Some tools default to shallow scans showing only direct dependencies unless explicitly configured for deep traversal. Incomplete dependency trees create dangerous blind spots. Configure for complete depth, capturing the entire dependency graph.

Metadata enrichment varies by tool. Some automatically include license information, supplier details, and provenance data. Others provide bare component names requiring manual enrichment. Evaluate metadata completeness and supplement where tools fall short.

Format and version selection requires decisions. CycloneDX 1.6 or SPDX 2.3? JSON or XML encoding? Align with your consumers' capabilities and your internal tooling rather than choosing arbitrarily. Once selected, standardize across products to simplify downstream consumption.

Practical Example: Node.js Application

A real-world implementation for a Node.js application in GitHub Actions:

name: Build and Generate SBOM

on:
  push:
    branches: [main, release/*]
  release:
    types: [created]

jobs:
  build-with-sbom:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci  # Lock file ensures reproducibility

      - name: Run tests
        run: npm test

      - name: Generate SBOM
        run: |
          npm install -g @cyclonedx/cyclonedx-npm
          cyclonedx-npm --output-file sbom.json

      - name: Validate SBOM
        run: |
          # Schema validation
          npx ajv-cli validate -s cyclonedx-schema.json -d sbom.json
          # Component count sanity check
          COMPONENT_COUNT=$(jq '.components | length' sbom.json)
          if [ "$COMPONENT_COUNT" -lt 10 ]; then
            echo "Error: Suspiciously low component count"
            exit 1
          fi

      - name: Sign SBOM
        run: |
          # Sign with Cosign or similar
          cosign sign-blob --key cosign.key sbom.json > sbom.json.sig

      - name: Upload SBOM artifact
        uses: actions/upload-artifact@v3
        with:
          name: sbom-${{ github.sha }}
          path: |
            sbom.json
            sbom.json.sig
          retention-days: 90

This pattern ensures SBOMs generate automatically on every relevant build, undergo quality validation, and are signed for authenticity—all without human intervention.

Handling Edge Cases

Real-world codebases present complications that simple tool execution doesn't address.

Vendored dependencies copied into source control rather than managed via package managers require special handling. Tools scanning package manifests won't detect these components. Solutions include maintaining separate manual inventory of vendored code, using binary scanning to detect known libraries, or restructuring to eliminate vendoring (ideal but often impractical).

Private registries containing internal or licensed components need authentication configuration. Tools must access private npm registries, Maven repositories, or PyPI mirrors to resolve component metadata. Configure credentials securely through environment variables or secret management rather than hardcoding in configuration files.

Monorepo structures with multiple projects in a single repository require per-project SBOM generation. A repository containing backend, frontend, and mobile applications needs three separate SBOMs, not one aggregate. Configure tooling to process each project independently, maintaining clear boundaries.
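A minimal sketch of per-project discovery, assuming projects are identified by the presence of a package manifest (the manifest names and directory layout here are illustrative):

```python
# Locate per-project roots in a monorepo so each gets its own SBOM run.
import os
import tempfile

MANIFESTS = {"package.json", "pom.xml", "requirements.txt", "go.mod"}

def find_project_roots(repo_root):
    roots = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Skip vendored/VCS directories
        dirnames[:] = [d for d in dirnames if d not in {"node_modules", ".git"}]
        if MANIFESTS & set(filenames):
            roots.append(dirpath)
            dirnames[:] = []  # a project root: don't treat nested files as new projects
    return roots

# Demonstrate against a throwaway repo layout
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "backend"))
    os.makedirs(os.path.join(repo, "frontend"))
    open(os.path.join(repo, "backend", "pom.xml"), "w").close()
    open(os.path.join(repo, "frontend", "package.json"), "w").close()
    project_names = sorted(os.path.basename(p) for p in find_project_roots(repo))

print(project_names)  # ['backend', 'frontend']
```

Each discovered root would then be fed to the appropriate ecosystem-specific generator.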

Dynamic dependencies resolved at runtime (plugin systems, feature flags loading components conditionally) challenge static analysis. SBOMs generated at build time can't know which optional components will actually load. Document these as optional dependencies with appropriate metadata indicating runtime-conditional nature.
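One way to record this in CycloneDX terms: the component schema allows a scope of "optional", and custom properties can note the load condition. The property name below is illustrative, not a standard taxonomy, and the plugin itself is hypothetical.

```python
# Sketch: a runtime-conditional plugin represented as an optional component.

plugin_component = {
    "type": "library",
    "name": "reporting-plugin",                 # hypothetical plugin
    "version": "2.1.0",
    "purl": "pkg:npm/reporting-plugin@2.1.0",   # hypothetical package
    "scope": "optional",                         # CycloneDX scope: required/optional/excluded
    "properties": [
        # Illustrative property recording when the component actually loads
        {"name": "internal:load-condition", "value": "feature_flag:reporting"},
    ],
}

print(plugin_component["scope"])
```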

Manual Generation

When automation isn't viable—legacy systems without build pipelines, embedded systems with complex toolchains, or one-off documentation for stable products—manual SBOM generation becomes necessary.

Manual Process Framework

Systematic manual generation follows repeatable steps even without automation:

Step 1: Enumerate direct dependencies Start with project manifests—pom.xml, package.json, requirements.txt—listing explicitly declared dependencies. These form your SBOM foundation. Record component name, version, and ecosystem (Maven, npm, PyPI).

Step 2: Resolve transitive dependencies Execute package manager resolution commands (npm list, mvn dependency:tree, or pipdeptree for Python) to reveal indirect dependencies. Transitive components often outnumber direct by 10:1 or more—don't skip this step thinking direct dependencies suffice.

Step 3: Identify system dependencies Applications often depend on system libraries, runtime environments, or operating system components not captured by language package managers. Python applications need Python runtime. Node applications need Node.js. Containerized applications include base image components. Document these explicitly.

Step 4: Capture metadata For each component, gather:

  • Exact version (not version range)
  • License identifier (SPDX format where possible)
  • Supplier/manufacturer
  • Download location or repository URL
  • Cryptographic hash if available

This metadata transforms component lists into proper SBOMs meeting minimum standards.

Step 5: Document relationships Record dependency relationships—which components depend on which others. Simple flat lists suffice at Level 1; hierarchical dependency trees better represent reality. At minimum, distinguish direct dependencies from transitive.

Step 6: Generate SBOM document Use online generators, spreadsheet templates, or simple JSON/XML editing to create formatted SBOM. Validate against schema before considering complete.
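A minimal CycloneDX skeleton built by hand might look like the following sketch. The serial number, timestamp, and single component are illustrative; validate the full document against the official schema before considering it complete.

```python
# Hand-built minimal CycloneDX 1.6 document (illustrative content).
import json
import uuid
from datetime import datetime, timezone

sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "serialNumber": f"urn:uuid:{uuid.uuid4()}",
    "version": 1,
    "metadata": {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "component": {"type": "application", "name": "internal-app", "version": "1.2.0"},
    },
    "components": [
        {
            "type": "library",
            "name": "flask",
            "version": "2.3.0",
            "purl": "pkg:pypi/flask@2.3.0",
            "licenses": [{"license": {"id": "BSD-3-Clause"}}],
        }
    ],
}

print(json.dumps(sbom, indent=2)[:60], "...")
```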

Manual Process Example

A small internal application with quarterly releases:

Component Inventory:

Direct Dependencies (from requirements.txt):
- Flask 2.3.0 (pkg:pypi/flask@2.3.0)
- SQLAlchemy 2.0.0 (pkg:pypi/sqlalchemy@2.0.0)
- Celery 5.3.0 (pkg:pypi/celery@5.3.0)

Transitive Dependencies (from pipdeptree):
- Werkzeug 2.3.0 (via Flask)
- Jinja2 3.1.0 (via Flask)
- Click 8.1.0 (via Flask, Celery)
- [... 20 more ...]

System Dependencies:
- Python 3.11.4
- Ubuntu 22.04 base image (container deployment)

Transform this inventory into CycloneDX JSON using a template, validate it with a schema validator, and record it as SBOM v1 for Application v1.2.0. Regenerate quarterly when releasing new versions or when dependencies update.

Time investment: 2-4 hours per SBOM initially, 1-2 hours for updates. Sustainable for low-volume products but not scalable to dozens of applications or monthly releases.

Quality Assurance During Generation

Regardless of automation level, validate output before distribution.

Completeness Checks

Component count reasonableness: Modern applications typically have 50-500 dependencies. Ten components suggests incomplete scanning; 10,000 suggests noise or misconfiguration. Know your expected range.

Dependency depth: Single-level SBOMs showing only direct dependencies miss most components. Verify transitive dependencies appear. Simple check: are there components not listed in your project manifest? If not, transitive capture likely failed.

Critical component presence: Manually verify several known dependencies appear correctly. If you know you use specific libraries, search for them in the generated SBOM. Their absence indicates tool configuration or detection problems.
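These three checks can be scripted against a parsed SBOM. The thresholds and sample data below are illustrative; tune the expected count range to your product.

```python
# Sketch of completeness checks: count range, transitive capture,
# and presence of known critical components.

def completeness_issues(sbom, direct_deps, must_have, min_count=10, max_count=5000):
    """Return a list of human-readable completeness problems."""
    issues = []
    components = sbom.get("components", [])
    names = {c.get("name") for c in components}
    if not (min_count <= len(components) <= max_count):
        issues.append(f"component count {len(components)} outside expected range")
    if not (names - set(direct_deps)):
        issues.append("no components beyond the manifest: transitive capture likely failed")
    for name in must_have:
        if name not in names:
            issues.append(f"known dependency '{name}' missing")
    return issues

# Deliberately thin SBOM to show the checks firing
sbom = {"components": [{"name": "flask"}, {"name": "werkzeug"}]}
problems = completeness_issues(sbom, direct_deps=["flask"], must_have=["flask", "sqlalchemy"])
print(len(problems))  # 2: low component count, and sqlalchemy missing
```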

Accuracy Verification

Version matching: Spot-check that listed versions match deployed artifacts. Version discrepancies mean the SBOM doesn't represent deployed reality, which is dangerous for vulnerability management.

Identifier formatting: Validate PURLs and CPEs parse correctly. Malformed identifiers break automation downstream. Use validation libraries to check syntax.
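A quick syntax screen can catch obviously malformed purls, though the real grammar is richer; production validation should use a purl library such as packageurl-python. A simplified regex sketch:

```python
# Rough purl shape check: scheme, type, optional namespace, name,
# optional version. Not a full implementation of the purl spec
# (e.g. it does not handle percent-encoding or qualifiers).
import re

PURL_RE = re.compile(r"^pkg:[a-z][a-z0-9.+-]*/(?:[^@\s]+/)*[^@/\s]+(?:@[^\s]+)?$")

def looks_like_purl(s):
    return bool(PURL_RE.match(s))

print(looks_like_purl("pkg:npm/lodash@4.17.21"))                    # True
print(looks_like_purl("pkg:maven/com.google.guava/guava@32.1.2"))   # True
print(looks_like_purl("lodash 4.17.21"))                            # False
```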

License information: Compare listed licenses against component documentation. License detection errors are common—tools guess based on file patterns, sometimes incorrectly.

Hybrid Approaches

Many organizations use both automated and manual processes strategically.

Automated for active development products with frequent releases, modern CI/CD, and well-supported technology stacks. These generate SBOMs automatically on every build without human effort.

Manual for stable/legacy products with rare releases, older systems without build automation, or complex toolchains where automation reliability is questionable. Quarterly or annual manual generation suffices when code changes infrequently.

Semi-automated with manual review runs tools automatically but requires human review before publication. Combines automation efficiency with human judgment for edge cases tools handle poorly.

The hybrid model matches effort to need—investing automation where it pays off, accepting manual work where automation isn't cost-effective.

Integration with SBOM Lifecycle

Generation integrates with broader SBOM lifecycle management:

Versioning: Each generation creates new SBOM version with unique identifier and timestamp. See SBOM and VEX Lifecycle.

Storage: Generated SBOMs feed into repository or management system. See Distribute to Customers.

Validation: Generated SBOMs undergo quality checks before distribution. See Validate and Sign.

VEX coordination: Software releases trigger both SBOM generation and VEX review. See Publish VEX Documents.

Understanding these connections ensures generation doesn't operate in isolation from other lifecycle stages.

Common Generation Mistakes

Mistake 1: Generating from source instead of resolved dependencies Reading package.json shows ^1.0.0, but the build actually installed 1.0.5. The SBOM shows the wrong version, breaking vulnerability correlation.

Solution: Generate from lock files (package-lock.json, Gemfile.lock) or resolved dependency tree, never from abstract version declarations.
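The difference is easy to see with a simplified package.json / package-lock.json pair (lock-file structure abbreviated from npm's v3 format; values illustrative):

```python
# Declared range vs. resolved version: the SBOM must record the latter.

package_json = {"dependencies": {"lodash": "^4.17.0"}}
package_lock = {"packages": {"node_modules/lodash": {"version": "4.17.21"}}}

declared = package_json["dependencies"]["lodash"]
resolved = package_lock["packages"]["node_modules/lodash"]["version"]

print(declared)  # ^4.17.0  — a range, wrong for an SBOM
print(resolved)  # 4.17.21  — the version actually installed
```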

Mistake 2: Running tools in the wrong directory In a monorepo with multiple projects, the tool runs at the repository root, generating a single SBOM that mixes all projects. The result is a nonsensical mashup.

Solution: Execute tools at correct project root for each discrete product, generating separate SBOMs.

Mistake 3: Ignoring tool warnings Tools emit warnings about detection failures, unusual patterns, or configuration issues. Teams ignore warnings, generating incomplete SBOMs unknowingly.

Solution: Treat warnings seriously. Investigate causes, fix configuration, or document known limitations explicitly.

Mistake 4: Inconsistent tool versions Team members use different tool versions producing different results for identical code. SBOMs vary unpredictably.

Solution: Pin tool versions in build configuration. Update deliberately across all projects rather than allowing drift.
