Introduction

A software supply chain is the series of steps performed when writing, testing, packaging, and distributing application software to end consumers. Given the increased prominence of software supply chain exploits and attacks, the Cloud Native Computing Foundation (CNCF) Technical Advisory Group for Security published a whitepaper titled “Software Supply Chain Best Practices”, which captures over 50 recommended practices for securing the software supply chain. That document is considered a prerequisite for the content described in this reference architecture.

This publication is a follow-up to that paper, targeted at system architects, developers, operators, and engineers in the areas of software development, security and compliance. This reference architecture adopts the “Software Factory” model¹ for designing a secure software supply chain.

This reference architecture and accompanying prototype have been produced after a thorough evaluation of available tooling as of early 2022. The components selected are open source, cloud native, and prioritise security.

Problem Scope: Software Supply Chain Security

The practices that the “Software Supply Chain Best Practices” whitepaper captures are predicated on four overarching principles:

Defence in depth (Layered end-to-end security controls)
Signing and Verification
Artefact Metadata Analytics
Automation

Those four principles are in turn applied to and organised around five functional areas, which are treated as the entities in a software factory:

Source Code
Materials
Build Pipelines
Artefacts
Deployments

When thinking about how to secure those entities, there are two broad ways of organising security controls:

Around three critical concerns:
1. Provenance verification: assurance that existing assumptions about where and how an artefact originates are true, and that neither the artefact nor its accompanying metadata has been tampered with during the build or delivery processes.
2. Trustworthiness: assurance that a given artefact and its contents can be trusted to do what it is purported to do (i.e., that it is suitable for a purpose). This involves judgement on whether the code is safe to execute and making an informed decision about accepting the risk that executing the code presents.
3. Dependencies: recursive checking of an artefact’s dependency tree for trustworthiness and provenance of the artefacts it uses.

By stages of activity (see diagram):
1. Pre-Build: principally concerned with development and handling of the source code and with the collection and storage of dependencies.
2. Build: the process of building, testing, and packaging an artefact according to its build specifications.
3. Post-Build: principally concerned with the storage, delivery, deployment, and continuous verification of artefacts.

[Diagram: stages of activity (Pre-Build, Build, Post-Build)]

In the matrix below, we attempt to overlay these entities, concerns, and activity stages with one another:

Stages:	Pre-Build	Build	Post-Build
Entities:	Source Code: Development and Handling; Materials: Selection, Collection, and Storage	Source Code and Dependencies: As inputs; Build Pipelines: Components performing the build; Artefacts: As outputs	Artefacts: Storage and Verification; Deployments: Verification of artefacts
Concerns:	Provenance: Developer Contributions, Dependency Definitions; Trustworthiness: Developer Contributions; Dependencies: Dependency provenance and trustworthiness	Provenance: Integrity of the build, collection of metadata and attestations, signing of artefacts	Provenance: Verification of Attested Metadata; Trustworthiness: Consumer judgement of artefact’s worth; Dependencies: Recursive analysis of both Provenance and Trustworthiness by consumers

This reference architecture focuses specifically on the critical concern of provenance and primarily on the “build” activity stage. Numerous other publications and guides, including the CNCF Software Supply Chain Best Practices paper, address issues around trustworthiness and cover practices such as SAST/DAST scanning and code signing. We direct readers to these documents for more information on those facets of supply chain security.

Our decision to emphasize provenance and the build pipeline in this paper is based on the foundational role provenance verification plays in other supply chain security concerns. Provenance provides the evidence, for example, that SAST/DAST scanning was completed as claimed. If you are relying on the results of SAST/DAST scans of a software artefact to inform your decision on its trustworthiness, you need to know that those claims are accurate. Provenance provides that assurance. It also provides assurance that an artefact which claims to be the product of a specific codebase and a specific build process is in fact the product it claims to be, or that the artefact downloaded from a remote source is the same one you expected to receive. All of these claims are foundational to being able to make informed decisions about an artefact’s trustworthiness: you must be able to trust that the evidence presented about an artefact’s trustworthiness is valid before you can trust the claims that evidence makes about the artefact.

How to read this document

This paper offers a high-level treatment of a secure software factory. It is designed to explain the necessary interfaces and control structures for each component of a software factory to generate verifiable provenance. Throughout the document, we make reference to specific recommended tools, listed in Appendix B. These tools reflect our reference implementation/prototype, which can be found at https://github.com/thesecuresoftwarefactory/ssf. While these tools are what the writers and designers of this reference architecture have chosen to recommend, the intention of this architecture is to be adaptable to other tools. The theoretical treatment in this architecture should provide guidance on what features and/or configurations are required to substitute your own tool choices.

As the tools we recommend are all under active development, the reader must keep in mind that these details are valid only as of the time of publication, MM/YYYY. We provide versioning information for your reference. Upon implementation, consult the official documentation for each tool to determine the appropriate version to use.

A Word About the Prototype

The CNCF Security TAG supply chain working group is working on a prototype of the architecture as presented in this document. This prototype acts as a proof of concept to help illustrate the architecture put forward and to exercise the several integration points of the secure software factory.

The source can be found here: https://github.com/thesecuresoftwarefactory/

The Secure Software Factory

“Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.” - The Architecture of Open Source Applications

The subsequent sections detail how a Secure Software Factory ought to be structured and how its different parts interact.

Key Diagrams

Secure Software Factory Landscape

[Diagram: Secure Software Factory Landscape]

The Secure Software Factory sits in a larger System Delivery Lifecycle process. Within that process, the SSF has both upstream and downstream dependencies. Upstream, the SSF depends on Identity and Access Management for both human users and other software services. During a pipeline run the SSF relies on Source Code Control for fetching the code to be built and on Artefact Storage for dependencies required for the build. Downstream, the SSF is depended on for providing attestations and signatures regarding artefacts which can be used by production systems to determine artefact provenance and make policy decisions about artefact deployment.

Secure Software Factory Components/Elements

[Diagram: Secure Software Factory Components/Elements]

The above diagram shows how the various services running inside of the Secure Software Factory interact with each other, and a portion of the external services they depend on. The diagram is simplified, and doesn’t show every interaction between each tool. For example, in a real environment, Runtime Visibility monitors more than just the Build Environment. The remainder of this document illustrates how the services interact and function in further detail.

Pipeline Run Example

[Diagram: Pipeline Run Example]

This diagram is intended to show an example Pipeline Run inside the SSF. Some tasks might interact with other external services outside the scope of the SSF. The exact number of tasks depends on the requirements of your project.

There are a few important takeaways from the above diagram.

The Pipeline Observer records what Tasks occur in what order.
The Tasks interact with some type of Runtime Build Storage during normal operation. The storage in some cases might be shared between tasks, while in other cases it might not. Other areas of this architecture document go into further detail on shared storage.

Not every task will provide attestations or additional metadata, but any attestations or metadata that tasks do produce should be signed and securely stored in a source of truth.

Components of the SSF

The SSF that manufactures secure software can be broken down into several categories of components, like that of a regular factory. These are the core components, the management components and the distribution components. The core components are responsible for the central task of the Secure Software Factory: taking the inputs of the factory and processing them to create the output artefacts. The management components ensure that the factory runs in accordance with policy. They ensure that the processes of the factory are validated in the right way, and provide evidence and documentation of the outputs of the factory. The distribution components are in charge of moving the products of the factory to where they can be made available for usage, as well as providing guidance and tools to ensure that outputs of the factory are consumed safely.

The “Core” Components

The core components can further be classified into 3 stages: the Scheduling and Orchestration Platform, which runs all the other components, the Pipeline Framework, which details the basic layout of the build pipeline, and the Build Environments, which perform the actions defined in the pipeline.

Scheduling and Orchestration Platform

A Secure Software Factory seeks to run its components in the most minimal and isolated way possible. All other components of the SSF leverage this platform to schedule their jobs to perform their respective actions. The prototype relies on Kubernetes as its Scheduling and Orchestration platform.

See <kubernetes hardening guide> for security best practices for Kubernetes. If you use a different scheduling and orchestration platform, follow the equivalent guides for that platform.

Pipeline Framework and Tooling

Pipelines are a core part of the SSF as they encode the concrete workflow for building the software artefacts. This typically follows a Continuous Integration (CI) workflow, i.e. repeatable sets of tasks intended to download, build, and test code. In a cloud native context, the pipeline tooling can use the scheduling and orchestration platform to run each task in a container. For the prototype, we are using Tekton Pipelines to fill this role, which leverages Kubernetes as its scheduling platform.

Given that the pipeline is running on the scheduling and orchestration platform, it should be considered/treated as any other workload the platform manages, including being subject to the same security requirements and measures. At minimum, all container images used in the pipeline should be subject to signature verification and scanned for any known vulnerabilities.

Build Environments

The build environment is the actual container(s) or worker(s) where the source code is turned into a machine-usable software product, which we refer to as an artefact. Existing CI frameworks typically follow ephemeral execution patterns, wherein they create a new instance for every execution job. This pattern may even be extended to create a new instance of the scheduling platform to host every new build pipeline. The build environment should generate evidence and an automated attestation about the input parameters, actions and tools used during the build, such that they can be independently validated to provide assurance for build security.

The “Management” Components

An SSF will use a Policy Management Framework to enforce various controls and gates. This may include policies around the identities of users who may invoke the pipeline, the worker nodes where the pipeline should be executed, and the container images that can be used in the pipeline. It will then utilise a series of monitoring components to verify conformity with these policies: Node Attestors, Workload Attestors, and Pipeline Observers.

Policy Management Framework

An SSF needs policies that define the actors for each step in the build process. For example, a policy might define the actor (human or otherwise) authorized to sign metadata for a particular task. These policies are important at the time of verification within, for example, an admission controller, where they are used to validate that the right actors performed the respective tasks.
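As a rough illustration of such a policy, the sketch below maps each pipeline step to the identities allowed to sign its metadata and reports any step signed by someone else. The step names, the SPIFFE-style identifiers, and the `unauthorised_steps` helper are hypothetical and are not taken from the prototype. The sketch also assumes the signatures themselves have already been verified cryptographically.

```python
# Hypothetical policy: pipeline step -> identities authorised to sign its metadata.
SIGNING_POLICY = {
    "fetch-source": {"spiffe://example.org/ssf/fetch-source"},
    "build": {"spiffe://example.org/ssf/build"},
}


def unauthorised_steps(policy, attestations):
    """Return the steps whose metadata was signed by an identity the policy does not allow.

    `attestations` is an iterable of (step, signer) pairs whose signatures are
    assumed to have been cryptographically verified already.
    Steps that are missing from the policy are treated as unauthorised.
    """
    return [
        step
        for step, signer in attestations
        if signer not in policy.get(step, set())
    ]


# Example: the build step was signed by an identity the policy does not list.
attestations = [
    ("fetch-source", "spiffe://example.org/ssf/fetch-source"),
    ("build", "spiffe://example.org/ssf/unknown-worker"),
]
violations = unauthorised_steps(SIGNING_POLICY, attestations)
```

A verifier such as an admission controller could reject the artefact whenever `violations` is non-empty.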

Policies should follow cloud native and supply chain security best practices: <insert best practises docs>

For more information on Policy Management see: https://github.com/kubernetes/sig-security/blob/main/sig-security-docs/papers/policy/CNCF_Kubernetes_Policy_Management_WhitePaper_v1.pdf

Attestors and Observers

There are three basic components of the SSF which monitor or attest to policy adherence:

Node Attestors, which certify the identity of nodes
Workload Attestors, which certify the identity of workload processes
Pipeline Observers, which capture the verifiable metadata from pipeline processes.

Node attestors and workload attestors work in conjunction to ensure the node selected for running the work is authorised to host that workload and is not compromised. Pipeline observers then build upon this evidence by generating additional metadata about individual tasks executed in the pipeline to provide comprehensive assurance across the build process. This synthesis allows later steps to validate that previous steps were completed as expected, and it provides a level of guarantee around the provenance and legitimacy of the final artefacts from the SSF.

All metadata from Node Attestors, Workload Attestors and the Pipeline Observer should be signed and included as part of the metadata documents output from the SSF.

The “Distribution” Components

Upon completion of a pipeline run, the SSF outputs several artefacts. Artefacts must be available to downstream consumers and securely stored. Signatures for artefacts should also be stored such that they can easily be found and verified. These signatures can be stored alongside the artefact for convenient discoverability and distribution or in a separate location.

Artefact Repository

The Artefact Repository stores the artefacts that the SSF outputs. This repository should be accessible from both the build and deploy environments. The stored artefacts may include container images, Helm charts, SBOMs, and their corresponding signatures. In some cases, the artefact repository can also serve as the storage location for metadata, such as SBOMs, attestations, and signatures. In other cases, users may prefer to store these items separately or in multiple locations.

Admission Controller

An Admission Controller in the SSF limits what artefacts can be deployed on a Scheduling and Orchestration Platform. “Admission control”, in a general sense, is the act of enforcing policies around the consumption of components in a system.

In the SSF, there are multiple levels at which admission control must occur:

Enforcing policies on the sources and packages pulled into a build, including “intermediate artefacts” passed between steps in the build pipeline. For example, evaluating whether these objects have been properly signed or came from a known and trusted party.
Enforcing policies around the components of the factory itself. The scheduling and orchestration platform should perform admission checks to ensure all such components are trusted and verifiable.

Enforcing policies on the build steps. This typically includes verifying pipeline definitions and all the referenced images to be used during execution.

In order of execution, admission control proceeds as follows:

When admitting the build request, the Admission Controller validates that the steps satisfy defined policies.
When steps that fetch dependencies are executed, the Admission Controller must enforce policies on the dependencies that are sourced into the environment (e.g. source, binary dependencies, base images).
When steps execute user-provided code, the Admission Controller uses a network jail to enforce an “admit nothing” policy because we do not trust that code to self-regulate.
When steps that publish artefacts are executed, they must produce attestations to satisfy the Admission Controllers that may be encountered downstream.
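The first check in this sequence, validating a build request against policy, might look roughly like the sketch below. It rests on two assumed example policies that are not prescribed by this architecture: every step image must be pinned by a sha256 digest, and it must come from a trusted registry. The `Step` type, the registry name, and `admit_build_request` are hypothetical.

```python
import re
from dataclasses import dataclass

# An image reference pinned by digest, e.g. "registry/repo@sha256:<64 hex chars>".
DIGEST_REF = re.compile(r"^(?P<repo>[^@\s]+)@sha256:[0-9a-f]{64}$")


@dataclass(frozen=True)
class Step:
    name: str
    image: str


def admit_build_request(steps, trusted_registries):
    """Return (admitted, reasons); the request is admitted only if no step violates policy."""
    reasons = []
    for step in steps:
        match = DIGEST_REF.match(step.image)
        if match is None:
            reasons.append(f"{step.name}: image is not pinned by digest")
            continue
        registry = match.group("repo").split("/", 1)[0]
        if registry not in trusted_registries:
            reasons.append(f"{step.name}: registry {registry!r} is not trusted")
    return (not reasons, reasons)


# Example request: the second step uses a mutable tag instead of a digest.
steps = [
    Step("fetch", "registry.example.com/tools/git@sha256:" + "a" * 64),
    Step("build", "docker.io/library/golang:1.18"),
]
admitted, reasons = admit_build_request(steps, {"registry.example.com"})
```

Returning the reasons alongside the decision gives the enforcement point something to report when it blocks a request.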

Outside of simple build execution, relevant areas to admission control include:

The components that are “admitted” to the node host environments
Policy enforcement on the build control plane (incl. admission control), which recurses (who watches the watchers?).

In addition to the above inputs, it is assumed that the following checks are being handled when deploying to production.

Security controls for admission controller itself (identity of the controller and validation)
Metadata inputs for different policies
Validation of different signatures or policies (interfacing with CAs to validate certificates), notary services
Enforcement points
Interfaces with Signing services/notary service/signature validation services
Mutating the definition of workloads to include additional metadata
Outputs or error messages after enforcement/blocking admission
Signing check as a label that could be used by a workload attestor to grant access to signing keys.

Note: Artefact signatures should be verified against the associated public keys before deployment. Any generated provenance information should also be verified.

The variables - Inputs and Outputs to and from the SSF

Inputs

Source Code

Source code encompasses the human-readable representation of applications being built by the Secure Software Factory, associated dependencies that are built from source or interpreted instead of compiled, and code for the build pipelines (Pipeline-as-Code) and infrastructure (Infrastructure-as-Code). Source code is the primary input for the SSF. The users and operators of the SSF must decide what programming languages they support, where to host source code, and what tools to integrate for testing and scanning. The SSF assumes that source code uses version control systems like Git, which have a preserved history, and that the repository has a regime for review and testing in place that is appropriate for its needs and use cases. For securing the source code, see the recommendations in the “Source Code” section of the Software Supply Chain Best Practices paper.

Software Dependencies

Almost all software depends on other software which needs to be collected prior to building the target software. These dependencies should be validated against a security policy. It is recommended to validate the attestations or signatures of any dependencies, if available. In addition, it is recommended to pin to the checksum of upstream dependencies.
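Checksum pinning can be illustrated with a minimal sketch using only the Python standard library. The file name and the `verify_pinned` helper are hypothetical. In practice the pins would live in a reviewed lock file rather than being computed next to the data as they are here.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, data: bytes, pins: dict) -> bool:
    """Return True only if `name` is pinned and the digest of `data` matches the pin."""
    expected = pins.get(name)
    if expected is None:
        return False  # unpinned dependencies are rejected outright
    return hmac.compare_digest(expected, sha256_of(data))


# Example: pin a (hypothetical) dependency archive, then check two downloads.
payload = b"example dependency contents"
pins = {"example-lib-1.0.0.tar.gz": sha256_of(payload)}

genuine_ok = verify_pinned("example-lib-1.0.0.tar.gz", payload, pins)
tampered_ok = verify_pinned("example-lib-1.0.0.tar.gz", payload + b"!", pins)
```

Rejecting unpinned names by default means a newly introduced dependency fails the check until someone deliberately adds a pin for it.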

For both security and availability, it’s recommended to maintain a local mirror of any external dependencies. This mirror may be limited to only dependencies that have passed a security scan or that come from a trusted source of truth. The mirror also prevents downtime if the upstream repository becomes unavailable.

More recommendations and specifics on securing dependencies can be found in the “Materials” section of the “Software Supply Chain Best Practices” paper.

User Credentials

User credentials are identifiers for both human users and services (e.g. automation agents), and can authenticate these actors at multiple points in the SSF and its supporting services. Credentials should meet baseline security requirements as defined in Software Supply Chain Best Practices.

Cryptographic Material

Cryptographic material input into the SSF falls into two categories:

Materials used for identification of a particular entity.
Materials used for attestation/verification of a particular activity.

The first category includes certificates, tokens, and keys used for authenticating nodes, scheduling and orchestration platforms, workloads, services, and users. It might also include certificates corresponding with recognized Certificate Authorities and trust bundles for validating and cross-authenticating all of these materials.

The second category includes material such as signing keys deployed by users or services to attest to the work they have performed. Unlike traditional signing architectures, the modern software factory doesn’t directly use a single signing key. Instead, trust is delegated to multiple signing keys, each scoped to a specific domain, process, user, or service.

All cryptographic material must comply with the standards for its type and purpose and must be generated in a cryptographically secure manner. We assume that it is securely distributed to the necessary entities and is properly configured for use by those entities. The specific mechanisms for producing, signing, and distributing these certificates will be left to the user to implement.

Pipeline Definitions

CI/CD pipelines define the steps in the application build process. The specific implementation of a pipeline will vary from organization to organization. However, all pipeline definitions should follow security best practices that include:

Persistence & Source Control: Pipeline definitions should be defined as “code” (Pipeline-as-Code) in a declarative fashion and, as such, should meet all the security expectations for source code defined above. Additionally, pipeline definitions should be managed through a source control process (e.g., Git) that limits changes to authorized users following standard protocols (e.g., submitting changes via a pull request) and code reviews that include at least one security engineer experienced in Continuous Integration (CI) security best practices and in the particular tools being used. Once your pipeline assembly is complete, make sure to persist all relevant artefacts.
Sign Pipeline Definitions: Sign your pipeline definitions to ensure non-repudiation. When signing, cover the pipeline specifications and all the images used for execution.
Pipeline Audit: Perform regular audits of your pipeline definitions to ensure the integrity of the pipeline is maintained.
Static Scan: Pipelines typically need access to various user credentials that are provided to the pipeline at runtime (e.g. git-token, OCI-registry-token, etc.). Make sure these credentials are not hard-coded in the definitions. In general, limit the use of hard-coded configurations in the definitions.
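The static scan practice can be approximated with a simple heuristic, sketched below. The regular expression, the sample definition, and `find_hardcoded_credentials` are illustrative assumptions, and a pattern like this is no substitute for a dedicated secret scanner. It flags lines where a credential-like key is assigned a literal value, while values injected at runtime through parameter references are not flagged.

```python
import re

# Heuristic: a credential-like key followed by a literal value of 8+ characters.
# Runtime references such as "$(params.git-token)" do not match because "$" and
# "(" are not part of the literal character class.
CREDENTIAL_PATTERN = re.compile(
    r"(?i)\b(token|password|secret|api[_-]?key)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-\.]{8,}"
)


def find_hardcoded_credentials(definition: str):
    """Return the 1-based line numbers that look like hard-coded credentials."""
    return [
        number
        for number, line in enumerate(definition.splitlines(), start=1)
        if CREDENTIAL_PATTERN.search(line)
    ]


# Hypothetical pipeline definition fragment, built line by line for clarity.
definition = "\n".join([
    "steps:",
    "  - name: clone",
    "    env:",
    "      - name: GIT_TOKEN",
    "        value: $(params.git-token)",
    "  - name: push",
    "    env:",
    "      - name: REGISTRY",
    "        password: \"s3cr3t-registry-pass\"",
])
suspicious_lines = find_hardcoded_credentials(definition)
```

Running a check like this as part of code review for pipeline definitions keeps the finding close to the change that introduced it.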

Outputs

Artefacts

A software artefact is the principal output of the Secure Software Factory. Artefacts may include binaries, software packages, container images, signatures, and attestations. They are what will be consumed by downstream users. Artefacts should be accompanied by the appropriate metadata to demonstrate their provenance (described below), stored in a secure artefact repository, and distributed through secured and well understood mechanisms. The exact nature of the artefact itself and the implementation of these requirements will vary depending on factors like language, package type, and target platform(s). Therefore, these implementation details are beyond the scope of the Secure Software Factory.

Public Signing Keys

In order to verify the signatures included in a software factory’s metadata, downstream consumers will need the public keys associated with those signatures.² The root certificates may be included as an output from the SSF, but they should be distributed separately from the artefact and the metadata itself to allow additional verification of the certificate authenticity. Certificate chains linking the signing key to a root certificate should be included as an output from the SSF, and they should be distributed with the artefact being signed, allowing verifiers to validate a signature is trusted by an approved root certificate. As these keys should be identical to the cryptographic material used as an input to the pipeline, the security considerations already discussed for cryptographic material as inputs apply.

Metadata Documents

Throughout execution of the pipeline, a number of metadata documents are generated. Examples include test reports, vulnerability reports, and Software Bills of Material (SBOMs). These documents are a snapshot of the build that produced them. For example, a vulnerability report reflects CVEs known at the time of the build, but might become outdated as new vulnerabilities are discovered and shared. Similarly, an SBOM reflects what is in a particular build. It will always be valid for that build, but future builds with slightly different dependencies/version constraints must generate a new/updated SBOM. The following practices are recommended for managing metadata documents:

Timestamp inclusion: Always explicitly include a timestamp associated with the document.³
Persistence: Make sure that stored documents are immutable, version controlled, and signed.
Metadata Links: Link all metadata documents to the final deliverable artefact. For example, for a microservice application build pipeline, link the test, vulnerability, and SBOM record to the particular container image they are generated from.
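A minimal sketch of the timestamp and linking practices follows. The field names (`kind`, `created`, `subject`, `content`) are illustrative assumptions rather than a defined schema, and signing and immutable storage are deliberately left out. Each document carries an explicit UTC timestamp and is linked to the artefact by that artefact's SHA-256 digest.

```python
import hashlib
from datetime import datetime, timezone


def make_metadata_document(kind, content, artefact_bytes, created=None):
    """Build a metadata document that is timestamped and linked to one artefact."""
    created = created or datetime.now(timezone.utc)
    return {
        "kind": kind,  # e.g. "sbom", "test-report", "vulnerability-report"
        "created": created.isoformat(),
        "subject": {"sha256": hashlib.sha256(artefact_bytes).hexdigest()},
        "content": content,
    }


def refers_to(document, artefact_bytes):
    """Check that a metadata document is linked to exactly this artefact."""
    return document["subject"]["sha256"] == hashlib.sha256(artefact_bytes).hexdigest()


# Example: an SBOM record for a (hypothetical) container image layer.
image = b"container image bytes"
sbom = make_metadata_document(
    "sbom",
    {"packages": ["example-lib 1.0.0"]},
    image,
    created=datetime(2022, 1, 1, tzinfo=timezone.utc),
)
```

Linking by digest rather than by name or tag means the document stays valid for that specific build only, which matches the snapshot nature of these documents.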

Secure Software Factory Functionality

This section goes through the primary actions that the SSF performs in normal operation. It describes how a project runs through the SSF and how the SSF helps secure the supply chain by establishing and tracing provenance through the build pipeline.

All Stages: Attesting Identity of Nodes, Pipeline orchestration, Tasks and Workloads and Establishing Provenance

Actors:

Scheduling and Orchestration Platform
Pipeline
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage

It is important to call out this sub-action as it happens in most other actions of the SSF. This is the key piece of the SSF in establishing and tracing provenance from source code to artefact of a given project. This provenance can then be used in conjunction with other tooling and auditing to better make claims on the veracity of software.

In general, the action works as follows, though the other actions may specify a few caveats:

Initial Setup:

Spin up a node
Node Attestor establishes identity of node.

Action Steps:

Pipeline or Pipeline task is triggered/orchestrated
Workload Attestor establishes identity of Pipeline or task
Pipeline Observer captures metadata for Pipeline or task.
1. This includes inputs, timestamps, outputs, as well as other metadata
Pipeline Observer signs metadata with key or cert based on identity provided by Workload Attestor
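The action steps above can be sketched as follows. To keep the example within the Python standard library, an HMAC over the serialised metadata stands in for the asymmetric signature a real Pipeline Observer would produce. The key, the identity string, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def capture_and_sign(task, inputs, outputs, started, finished, workload_id, key):
    """Capture task metadata and sign it with a key tied to the workload identity."""
    metadata = {
        "task": task,
        "workload_identity": workload_id,  # as established by the Workload Attestor
        "inputs": inputs,
        "outputs": outputs,
        "started": started,
        "finished": finished,
    }
    # Canonical serialisation so signer and verifier hash identical bytes.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify(record, key):
    """Recompute the signature over the stored payload and compare in constant time."""
    expected = hmac.new(key, record["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


# Example: one signed record for a build task (all values are illustrative).
key = b"key-issued-for-this-workload"
record = capture_and_sign(
    task="build",
    inputs={"source_commit": "0123abcd"},
    outputs={"artefact_sha256": hashlib.sha256(b"artefact").hexdigest()},
    started="2022-01-01T00:00:00+00:00",
    finished="2022-01-01T00:05:00+00:00",
    workload_id="spiffe://example.org/ssf/build",
    key=key,
)
```

Signing the serialised payload rather than the in-memory dictionary avoids ambiguity about exactly which bytes were attested.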

All Stages: Admission Control for the SSF itself

Actors:

Scheduling and Orchestration Platform
Pipeline
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage
Admission Controller
Artefact Storage

As noted in the discussion of the Admission Controller above, both build workers (the containers performing pipeline steps) and intermediate artefacts (the outputs of previous steps passed along to the next steps in a build) should be verified before they are admitted into the SSF. This should be part of every stage in the pipeline.

Stage 1: Secure the data flow in the pipeline

As tasks execute inside a pipeline, they typically produce some new artefacts like an image, binary or evidence report. These artefacts are then consumed by subsequent tasks to perform their respective functions. Such sharing of artefacts between tasks is normally achieved through shared storage resources. It is important to regulate access to these shared resources across tasks.

To achieve this objective, avoid using a single storage workspace across all tasks in the pipeline. Instead, create multiple storage workspaces, each shared exclusively between the tasks that need to exchange data or results, as in the simple pipeline shown below. Where possible, set access policies (RW, RO) when mounting this storage in the tasks.

[Diagram: simple pipeline with storage workspaces shared exclusively between communicating tasks]
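One way to check a pipeline against this guidance is sketched below. The mount model (task name mapped to workspace name and access mode) and the `workspace_findings` helper are hypothetical. The single-writer rule is an added assumption that goes slightly beyond the text above.

```python
from collections import defaultdict


def workspace_findings(mounts):
    """Flag workspaces mounted by every task or written to by more than one task.

    `mounts` maps task name -> {workspace name: "RO" or "RW"}.
    """
    users = defaultdict(dict)
    for task, workspaces in mounts.items():
        for workspace, mode in workspaces.items():
            users[workspace][task] = mode

    findings = []
    for workspace, tasks in sorted(users.items()):
        # With only two tasks, sharing between "all" tasks is unavoidable.
        if len(mounts) > 2 and len(tasks) == len(mounts):
            findings.append(f"{workspace}: mounted by every task")
        writers = sorted(task for task, mode in tasks.items() if mode == "RW")
        if len(writers) > 1:
            findings.append(f"{workspace}: more than one writer ({', '.join(writers)})")
    return findings


# Exclusive sharing: each workspace links only the tasks that exchange data.
exclusive = {
    "fetch": {"source": "RW"},
    "build": {"source": "RO", "artefact": "RW"},
    "publish": {"artefact": "RO"},
}
# Anti-pattern: one read-write workspace mounted everywhere.
single_shared = {
    "fetch": {"shared": "RW"},
    "build": {"shared": "RW"},
    "publish": {"shared": "RW"},
}
exclusive_findings = workspace_findings(exclusive)
shared_findings = workspace_findings(single_shared)
```

In the exclusive layout, the publish task never sees the source workspace at all, so a compromised publish step cannot alter what was built.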

Stage 2: Configuration of Pipeline

Actors:

Developer
Tech Lead
Security Engineer
Scheduling and Orchestration Platform
Pipeline Platform

The primary component configured as part of normal operation of the SSF is the Pipeline. Both creation of a new Pipeline as well as modification of an existing Pipeline have similar modes of operation and so this section represents both.

The secure software factory expects that you store pipeline configuration as code and that the code is stored in a secure source code repository with adequate controls. See both “Source Code” and “Pipeline Definitions” in the inputs section above for more information about the SSF’s expectations regarding both of these types of inputs. The goal of these controls is to make sure that the pipeline definition itself has trustworthy provenance. In a cloud native context, these components are often deployed as containers and treated as artefacts in their own right. Ensuring we have adequate provenance for those components increases our assurance about the provenance of the artefacts they build.

When configuring and designing the pipeline, consider the following:

Individual tasks and steps should be limited in scope and well defined. Using templates and linting rules during the development of the pipeline itself aids this.
The pipeline should be configured to respond automatically to well-defined triggers in the Software Development Life Cycle.

Stage 3: Trigger Pipeline

Actors:

Developer
Scheduling and Orchestration Platform
Pipeline Platform
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage

A pipeline run in the SSF begins when something triggers a build. This can be a manual, event-driven, or timed trigger. Common triggers are webhooks and manual triggering through an API call or dashboard.

The SSF secures this by capturing and validating the inputs and other metadata, such as timestamps, through the Pipeline Observer. This metadata is then signed with a key or certificate that the Workload Attestor provides and that is associated with the identity of the workload. The Workload Attestor in turn has its identity attested to by the Node Attestor. The signed metadata is then pushed to Metadata Storage, where it becomes a supply chain link that other parts of the SSF can link to and that can later be used to validate and audit the veracity of the artefact(s) built in the SSF.

Stage 4: Ingest Source for Project

Actors:

Scheduling and Orchestration Platform
Pipeline Platform
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage
External Source Code Control

After a build is triggered, the next step is ingesting the code for the project. This is usually something like a call to a source code control system to pull down a specific commit. The ingestion task then hands the code over to downstream pipeline tasks, such as the build stage, via shared storage.

Stage 5: Ingest Dependencies for Project

Actors:

Scheduling and Orchestration Platform
Pipeline Platform
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage
Internal/External dependency repos

After ingesting source code, the next step is to download the dependencies for the artefact you are building. This is a separate step from the ingestion of the source for a couple of reasons. In line with the build best practices in this document (reference here) and the CNCF Supply Chain Security whitepaper, the pipeline steps should be kept as minimal and atomic as possible. Keeping this step separate allows you to download the source and sign it as a single atomic action. You can then validate, after downloading dependencies, that the source code wasn’t changed by a compromised dependency install. This matters because some package managers can execute arbitrary actions on the system during installation without adequate controls.

Once dependencies are installed on shared storage they are hashed and that metadata is signed and pushed to Metadata Storage.
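
One way to picture the hashing and the post-install validation is the sketch below. It assumes the source and the dependencies sit in separate directories on shared storage; the function names tree_digest and source_unchanged are hypothetical, and the digest scheme (SHA-256 over length-prefixed relative paths and file contents) is an illustrative choice rather than a prescribed format. It ignores file permissions and symlinks.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Return a SHA-256 digest covering every file under root, in sorted order."""
    root = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        data = path.read_bytes()
        # Length prefixes keep (path, content) pairs unambiguous in the hash input.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def source_unchanged(source_dir, recorded_digest):
    """Check that the source tree still matches the digest recorded at ingestion."""
    return tree_digest(source_dir) == recorded_digest
```

The digest recorded when the source was ingested can be compared again after the dependency install, and the same tree_digest call over the dependency directory yields the value that would be signed and pushed to Metadata Storage.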

Stage 6: Run Build for Project

Actors:

Scheduling and Orchestration Platform
Pipeline Platform
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage

This is arguably the most critical step of the Pipeline. This step performs common “build” actions, such as compilation or building an image, to generate an artefact. The build is a common attack vector in supply chain attacks; it is therefore crucial to keep this step atomic, minimal and, most importantly, hermetic. Where possible, you should strive for reproducible builds.

The build process performs code compilation or transformation (e.g. source code to byte code for compiled languages). Leverage pipeline observers to record the command, options and parameters used during compilation.

Given the need for the build to be hermetic, the task running the build should have no network access and almost no other external capabilities, and should have build parameters pushed at the task level. (Cite build best practice from white paper that explains that the more branching the logic of your build script has the harder it is to reason about what your build is doing.) The only external access the task should have is to shared storage containing the source and dependencies required. The build must write the artefact to new shared storage dedicated to the artefact.

After the build completes, the metadata associated with it (e.g. input parameters, hash of the produced artefact) is signed and pushed to Metadata Storage.
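
The recording of the build command, parameters and artefact hash might look like the sketch below. It is an assumption-laden illustration: run_recorded_build is a hypothetical helper, build parameters are passed only through an explicitly constructed environment, and network isolation is left to the platform running the task and is not shown. Because the environment contains no PATH, the command must use an absolute path to its executable.

```python
import hashlib
import subprocess
from pathlib import Path


def run_recorded_build(command, artefact_path, parameters, workdir):
    """Run a build with an explicit environment and return a metadata record.

    The child process sees only the variables in `parameters`; nothing is
    inherited from the parent environment.
    """
    env = {key: str(value) for key, value in parameters.items()}
    subprocess.run(command, cwd=workdir, env=env, capture_output=True, check=True)
    artefact = Path(workdir) / artefact_path
    return {
        "step": "build",
        "command": list(command),
        "parameters": dict(parameters),
        "artefact": str(artefact_path),
        "artefact_sha256": hashlib.sha256(artefact.read_bytes()).hexdigest(),
    }
```

The returned record is what would then be signed and pushed to Metadata Storage; signing is omitted here to keep the sketch short.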

Stage 7: Publish Artefact

Actors:

Scheduling and Orchestration Platform
Pipeline Platform
Pipeline Observer
Node Attestor
Workload Attestor
Metadata Storage
Artefact Storage

In the final build stage, compiled artefacts are packaged into an appropriate distribution format (container image, rpm, tar.gz, etc.). As these new artefacts are produced, they should be signed.

Signed artefacts are published to an artefact store that is external to the SSF. They are then hashed and signed along with any applicable metadata that can be pulled from the artefact. That signed metadata is then stored in Metadata Storage.
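
Once every stage has pushed its metadata, the resulting links can be checked against each other and against the published artefact. The sketch below is a simplified illustration in the spirit of in-toto's materials and products, not its actual metadata format or API; verify_chain and the record layout are hypothetical, and signature verification of each link is assumed to have happened beforehand.

```python
def verify_chain(links, artefact_name, artefact_sha256):
    """Return a list of inconsistencies in an ordered list of link records.

    Each link is a dict with "step", "materials" and "products", where
    materials and products map a path to the digest observed by that step.
    """
    problems = []
    known = {}  # path -> digest most recently produced by an earlier step
    for link in links:
        for path, digest in link.get("materials", {}).items():
            if path in known and known[path] != digest:
                problems.append(
                    f"{link['step']}: material {path} differs from what an earlier step produced"
                )
        known.update(link.get("products", {}))
    if known.get(artefact_name) != artefact_sha256:
        problems.append(f"{artefact_name}: published digest does not match the recorded product")
    return problems
```

An empty result means that what each stage consumed matches what an earlier stage recorded, and that the published artefact is the one the build stage produced.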

Appendix A: Inputs and Outputs Summary

Inputs

Inputs of the SSF	Assumptions/Recommendations About those Inputs	What We’re Not Specifying in this Version
Source Code	Version controlled with stored history; commits are signed; history cannot be overwritten (no force merges); has an appropriate testing and code review regime in place	Where code is hosted; specific test types or tooling to use
Dependencies	Defined with version and immutable reference (e.g. hash) constraints; (ideally) something approximating an SBOM and/or source of provenance; have appropriate update and review procedures in place	Format of SBOM/Provenance for dependencies; types of testing to perform on dependencies; source repositories allowed for dependencies
User Credentials	Users use MFA; users use SSH or PATs for repository access; users have signing certificates	User roles/permissions; key/certificate rotation policy; how users are authenticated
Machine/Workload Credentials	Automatically rotated short-lived credentials to identify application services
Signing keys	Meet or exceed current NIST guidelines for the type of key/certificate with regards to length, randomness, etc.	How keys/certificates are generated and by whom? How keys/certificates are distributed and by whom?
Pipeline Definitions	Maintained as Infrastructure-as-Code/Pipeline-as-Code meeting all the above specs for Source Code, Dependencies, User Credentials, etc. Builds task definitions
Build Images	Either bootstrapped or created by the SSF; signatures are verified by the SSF Admission Controller

Outputs

Outputs of the SSF	Assumptions/Recommendations About those Outputs	What We’re Not Specifying in this Version
Artefacts (Requires addition)	Includes signed and validated metadata in an appropriate storage mechanism	What storage mechanism to use (unless we find there are really good reasons to recommend one)
Public Signing Keys	Meet or exceed current NIST guidelines for the type of key/certificate with regards to length, randomness, etc.	How keys/certificates are generated and by whom? How keys/certificates are distributed and by whom?
Metadata Documents (Requires addition)
Metadata Chain (Requires addition)

Appendix B: Mapping of entities to projects/technologies

In accordance with CNCF guidelines, we prioritize our recommendations as follows: first, CNCF tools when they fit the need and are of sufficient maturity; second, well known and mature open source tools; and finally, in the absence of either CNCF or open source options, commercial offerings. In the event that we name commercial offerings, the reader should understand that this does not reflect an endorsement by CNCF. Instead, these offerings should be taken merely as an example and point of reference so that you can see potential paths for real world implementation.

Secure Supply Chain Reference Architecture Requirement	Reference Architecture Component	Alternate Component(s)
Components
Scheduling and orchestration platform (CRDs + Controllers)	Kubernetes	Nomad, Your own orchestrator
Pipeline Framework And Tooling	Tekton Pipelines
Policy Management Framework	In-toto (and other?) policies distributed via TUF
Identity Attestation for nodes and workloads	SPIRE
Pipeline Observer	Tekton Chains + in-toto	In-toto + Custom Code?
Metadata Storage	OCI registry, Rekor, Docdn	Grafeas
Admission Controller	OPA/Gatekeeper	Kyverno (for some pieces)
Runtime Visibility	Falco	Tracee, openbpf tools (misc)

Appendix C: Best practices x Reference Architecture

Stage	Practice	Categories	Reference Architecture
Securing the Source Code	Verification: Require signed commits	Assurance: Moderate to high Risk: Moderate to high
	Verification: Enforce full attestation and verification for protected branches	Assurance: High Risk: High
	Automation: Prevent committing secrets to the source code repository	Assurance: Moderate to high Risk: Moderate to high
	Automation: Define individuals/teams that are responsible for code in a repository and associated coding conventions	Assurance: High Risk: High
	Automation: Automate software security scanning and testing	Assurance: Moderate to high Risk: Moderate to high
	Controlled Environments: Establish and adhere to contribution policies	Assurance: Moderate to high Risk: Moderate to high
	Controlled Environments: Define roles aligned to functional responsibilities	Assurance: Moderate to high Risk: Moderate to high
	Controlled Environments: Enforce an independent four-eyes principle	Assurance: Moderate to high Risk: Moderate to high
	Controlled Environments: Use branch protection rules	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication: Enforce MFA for accessing source code repositories	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication: Use SSH keys to provide developers access to source code repositories	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication: Have a Key Rotation Policy	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication: Use short-lived/ephemeral credentials for machine/service access	Assurance: Moderate to high Risk: Moderate to high
Securing the Materials	Verification: Verify third party artefacts and open source libraries	Assurance: Moderate to high Risk: Moderate to high
	Verification: Require SBOM from third party supplier	Assurance: Moderate to high Risk: High
	Verification: Track dependencies between open source components	Assurance: Moderate to high Risk: Moderate to high
	Verification: Build libraries based upon source code	Assurance: High Risk: High
	Verification: Define and prioritize trusted package managers and repositories	Assurance: High Risk: High
	Verification: Generate an immutable SBOM of the code	Assurance: Moderate to high Risk: Moderate to high
	Automation: Scan software for vulnerabilities	Assurance: Moderate to high Risk: Moderate to high
	Automation: Scan software for license implications	Assurance: Moderate to high Risk: Moderate to high
	Automation: Run software composition analysis on ingested software	Assurance: Moderate to high Risk: Moderate to high
Securing the Build Pipelines	Verification: Cryptographically guarantee policy adherence	Assurance: High Risk: High
	Verification: Validate environments and dependencies before usage	Assurance: Moderate to high Risk: Moderate to high
	Verification: Validate runtime security of build workers	Assurance: Moderate to high Risk: Moderate to high
	Verification: Validate Build artefacts through verifiably reproducible builds	Assurance: High Risk: High
	Reproducible Builds: Lock and Verify External Requirements From The Build Process	Assurance: Moderate to high Risk: Moderate to high
	Reproducible Builds: Find and Eliminate Sources Of Non-Determinism	Assurance: Moderate to high Risk: Moderate to high
	Reproducible Builds: Record The Build Environment	Assurance: High Risk: High
	Reproducible Builds: Automate Creation Of The Build Environment	Assurance: High Risk: High
	Reproducible Builds: Distribute Builds Across Different Infrastructure	Assurance: High Risk: High
	Automation: Build and related continuous integration/continuous delivery steps should all be automated through a pipeline defined as code	Assurance: Moderate to high Risk: Moderate to high
	Automation: Standardize pipelines across projects	Assurance: Moderate to high Risk: Moderate to high
	Automation: Provision a secured orchestration platform to host software factory	Assurance: Moderate to high Risk: Moderate to high
	Automation: Build Workers Should be Single Use	Assurance: High Risk: Moderate
	Controlled Environments: Ensure Software Factory has minimal network connectivity	Assurance: High Risk: High
	Controlled Environments: Segregate the Duties of Each Build Worker	Assurance: High Risk: High
	Controlled Environments: Pass in Build Worker Environment and Commands	Assurance: High Risk: High
	Controlled Environments: Write Output to a Separate Secured Storage Repo	Assurance: High Risk: High
	Secure Authentication/Access: Only allow pipeline modifications through “pipeline as code”	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication/Access: Define user roles	Assurance: Moderate to high Risk: Moderate to high
	Secure Authentication/Access: Follow established practices for establishing a root of trust from an offline source	Assurance: High Risk: High
	Secure Authentication/Access: Use short-lived Workload Certificates	Assurance: High Risk: High
Securing the Artefacts	Verification: Sign Every Step in the Build Process	Assurance: Moderate to high Risk: Moderate to high
	Verification: Validate the Signatures Generated at Each Step	Assurance: Moderate to high Risk: Moderate to high
	Automation: Use TUF/Notary to manage signing of artefacts	Assurance: Moderate to high Risk: Moderate to high
	Automation: Use a store to manage metadata from in-toto	Assurance: Moderate to high Risk: Moderate to high
	Controlled Environments: Limit which artefacts any given party is authorized to certify	Assurance: High Risk: High
	Controlled Environments: Build in a system for rotating and revoking private keys	Assurance: High Risk: High
	Controlled Environments: Use a container registry that supports OCI image-spec images	Assurance: High Risk: High
	Encryption: Encrypt artefacts before distribution & ensure only authorized platforms have decryption capabilities	Assurance: High Risk: High
Securing Deployments	Verification: Ensure clients can perform Verification of Artefacts and associated metadata	Assurance: Moderate to high Risk: Moderate to high
	Verification: Ensure clients can verify the “freshness” of files	Assurance: Moderate to high Risk: Moderate to high
	Automation: Use The Update Framework	Assurance: High Risk: High

Authors

Aditya Sirish A Yelgundhalli (NYU)

Alexander Floyd Marshall (Raft)

Andres Vega (VMware)

Aradhna Chetal (TIAA)

Axel Simon (Red Hat)

Brandon Lum (Google)

Brandon Mitchell (IBM)

Cole Kennedy (TestifySec)

Dan Papandrea (Sysdig)

Glaicimar Aguiar (Hewlett Packard Enterprise)

Jason Hall (Red Hat)

John Kjell (VMware)

Marina Moore (NYU)

Matt Moore (Chainguard)

Michael Lieberman (Citi)

Priya Wadhwa (Chainguard)

Shripad Nadgowda (IBM T.J. Watson Research Center)

Acknowledgements

The Cloud Native Computing Foundation supported the creation of this reference architecture. As with the “Best Practices for Supply Chain Security”, the authors followed a “collaborative knowledge production” methodology. This effort took place over the span of five months of weekly online meetings. The majority of authors are members of the CNCF Technical Advisory Group for Security, which you can join through the TAG repository site.

This was a remarkable collaboration between large technology companies and startups.

Coordination and facilitation were provided by Andres Vega (VMware), Brandon Lum (Google), Dan “Pop” Papandrea (Sysdig) and Michael Lieberman (Citi).

We’d also like to thank a number of contributors who gave us excellent input and feedback and who, as leading practitioners in the field, did much of the work that we write about in this document:

Aeva Black

Allan Friedman

Andrew Block

Dan Lorenc

David Wheeler

Ed Warnicke

Emily Fox

Frederick Kautz

Jacques Chester

Jonathan Meadows

Remy Greinhofer

Tiffany Jordan

Notes

https://en.wikipedia.org/wiki/Software_factory
By using identity federation, it is possible for verification to be achieved without actual proof of possession of the keys. In cases where this is the method of choice, public signing keys will not need to be provided.
Note that for Reproducible Builds, the timestamp may be extra metadata included alongside the document so that the content can be checked for reproducibility.
