Reproducible Research with Verdifax
Bind a cryptographic seal to your computational result and to the declared environment fingerprint that produced it. Anyone (a reviewer, regulator, or replicator) can independently recompute the seal and detect any tampering, omission, or drift.
What this is for
A growing share of scientific, regulatory, and AI-governance work
depends on a piece of code, run by a specific person, on a specific
machine, producing a specific number. The standard answers to "can I
trust this result?" (a Docker image, a pip freeze, an RStudio
session info appendix) are useful but unsealed. None of them tells a
third party: "this number is sealed to this cryptographic hash, the
producer declared this environment, and the transparency log
recorded it at this time."
Verdifax fills that gap with a small, technology-neutral primitive: a manifest hash that binds your declared reproducibility context (runtime version, pinned dependencies, git SHA, declared random seeds, platform, and optional container image hash) into the audit bundle as Category 6.
When the orchestrator runs the same payload twice, the manifest hash is byte-identical. When the two hashes differ, the diff tells you which field moved. On Rekor-anchored deployments, every attestation is published to the Sigstore transparency log within seconds.
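The binding idea can be illustrated with a minimal sketch. It assumes the declared context is hashed as canonical JSON with SHA-256; the actual Verdifax canonicalization and hash construction are not specified here, and the helper names `manifest_hash` and `differing_fields` below are hypothetical.

```python
import hashlib
import json


def manifest_hash(context: dict) -> str:
    """Hash a declared context via canonical JSON (an assumed encoding)."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def differing_fields(a: dict, b: dict) -> list:
    """Return the sorted top-level keys whose values differ between a and b."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Illustrative values only; not output from a real run.
run_1 = {
    "runtime_name": "python",
    "runtime_version": "3.11.4",
    "pinned_dependencies": {"numpy": "1.26.4"},
    "random_seeds": {"numpy": 42},
    "platform": "linux/amd64",
}
run_2 = dict(run_1)                            # identical declaration
run_3 = dict(run_1, runtime_version="3.12.0")  # one field drifted

same = manifest_hash(run_1) == manifest_hash(run_2)
moved = differing_fields(run_1, run_3)
print(same, moved)
```

Sorting keys and fixing separators before hashing is what makes two identical declarations produce identical bytes, and a field-level comparison is what turns a hash mismatch into an actionable diff.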
Who this is for
- Academic researchers publishing computational results that reviewers or replicators will need to verify months or years later.
- Regulated-research teams (clinical trials, FDA submissions, EU AI Act conformity assessments) where the audit trail must cryptographically link a model's output to a declared environment.
- Reproducibility-focused organizations (research labs, journals, preprint servers, registries) that want a third-party-verifiable proof artifact attached to each computational result.
- Engineering teams in finance, healthcare, or governed AI who need to retire "trust me, it ran the same" and replace it with "here is the manifest hash; recompute it yourself."
The two-language stack
We ship official SDKs in the two languages that cover the overwhelming majority of computational research:
| | Python | R |
|---|---|---|
| Install | pip install verdifax | remotes::install_github("Verdifax/verdifax-sdk-r") |
| Client | VerdifaxClient() | verdifax_client() |
| Capture environment | capture_environment() | verdifax_capture_environment() |
| Attest a result | client.attest(...) | verdifax_attest(client, ...) |
| Prove determinism | verify_determinism(...) | verdifax_verify_determinism(client, ...) |
| Reference workflow | reproducible_research.ipynb | reproducible-research.Rmd |
Both SDKs implement the same wire protocol against the same orchestrator, so a Python-authored attestation can be independently re-verified by an R-using auditor and vice versa. The manifest hash is the lingua franca.
What capture_environment records
| Field | Python source | R source |
|---|---|---|
| runtime_name | hardcoded "python" | hardcoded "R" |
| runtime_version | sys.version_info | R.version$major.minor |
| pinned_dependencies | importlib.metadata (name + version, sorted) | installed.packages() (name + version, sorted) |
| git_commit_sha | git rev-parse HEAD (best-effort) | git rev-parse HEAD (best-effort) |
| random_seeds | caller-supplied dict, sorted | caller-supplied list, sorted |
| platform | platform.system() / platform.machine() → GOOS/GOARCH form | Sys.info() → GOOS/GOARCH form |
| container_image_hash | /proc/self/cgroup on Linux (best-effort) | /proc/self/cgroup on Linux (best-effort) |
Every field is optional. Auto-detection failures silently leave the
field null. The orchestrator records null as "not declared"
rather than fabricating a claim. There is no path by which the
orchestrator invents an environment fingerprint on the producer's
behalf.
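To make the table concrete, here is a rough standard-library sketch of how the Python-side fields could be gathered, with failures leaving a field as None. It is not the SDK's implementation: `sketch_capture`, `git_sha`, the `name==version` dependency format, and the GOOS/GOARCH lookup tables are assumptions for illustration, and container detection is omitted.

```python
import platform
import subprocess
import sys
from importlib import metadata

# Assumed mapping from Python's platform names to GOOS/GOARCH spellings.
_GOOS = {"Linux": "linux", "Darwin": "darwin", "Windows": "windows"}
_GOARCH = {"x86_64": "amd64", "AMD64": "amd64", "arm64": "arm64", "aarch64": "arm64"}


def git_sha():
    """Best-effort `git rev-parse HEAD`; returns None when it fails."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True, timeout=5,
        )
        return out.stdout.strip() or None
    except (OSError, subprocess.SubprocessError):
        return None


def installed_packages():
    """Sorted 'name==version' strings for every readable distribution."""
    deps = set()
    for dist in metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except Exception:  # skip distributions with unreadable metadata
            continue
        if name and version:
            deps.add(f"{name}=={version}")
    return sorted(deps)


def sketch_capture(declared_seeds=None):
    """Hypothetical stand-in for capture_environment; not the SDK's code."""
    goos = _GOOS.get(platform.system())
    goarch = _GOARCH.get(platform.machine())
    return {
        "runtime_name": "python",
        "runtime_version": ".".join(map(str, sys.version_info[:3])),
        "pinned_dependencies": installed_packages(),
        "git_commit_sha": git_sha(),
        "random_seeds": dict(sorted((declared_seeds or {}).items())),
        # Unknown platforms stay None rather than being guessed.
        "platform": f"{goos}/{goarch}" if goos and goarch else None,
    }


ctx = sketch_capture({"torch": 1337, "numpy": 42})
```

Running this outside a git checkout leaves `git_commit_sha` as None, which mirrors the "not declared" behavior described above.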
A 30-second example (Python)
    import verdifax
    from verdifax.research import capture_environment, verify_determinism

    client = verdifax.VerdifaxClient()  # env: VERDIFAX_API_URL, VERDIFAX_API_KEY

    # 1. Declare what you ran.
    ctx = capture_environment(declared_seeds={"numpy": 42, "torch": 1337})

    # 2. Attest a result with the environment bound in.
    receipt = client.attest(
        payload="my-analysis-output",
        program_id="0" * 64,
        route_id="paper-figure-3",
        registry_record_hash="0" * 64,
        reproducibility_context=ctx,
    )
    print("ManifestHash:", receipt.manifest_hash)

    # 3. Prove the pipeline is deterministic.
    det = verify_determinism(
        client=client,
        payload="my-analysis-output",
        program_id="0" * 64,
        route_id="paper-figure-3-verify",
        registry_record_hash="0" * 64,
        reproducibility_context=ctx,
    )
    assert det.deterministic, det.diff.differing_fields
The R equivalent reads identically; see the R Markdown template for the full version.
What the determinism check answers
verify_determinism runs your payload through the pipeline twice
and compares the resulting canonical manifest hashes. The top-level
deterministic flag is based on manifest-hash equality, because the
manifest hash is the seal of the pipeline output.
Bundle-hash differences, when surfaced in diff.differing_fields,
indicate server-observed timing variation (e.g. LatencyTotalMs,
EnvSnapshot.Time) and are labeled informational. They never cause
deterministic to flip false on their own.
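The decision rule described above can be sketched in a few lines. The `Receipt` and `DeterminismResult` shapes and the `compare_replays` helper are hypothetical and only illustrate the logic; the SDK's real objects and field names may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Receipt:
    """Hypothetical receipt shape with the two hashes discussed above."""
    manifest_hash: str
    bundle_hash: str


@dataclass
class DeterminismResult:
    deterministic: bool
    informational_fields: list = field(default_factory=list)


def compare_replays(first: Receipt, second: Receipt) -> DeterminismResult:
    """Decide determinism from manifest hashes only."""
    deterministic = first.manifest_hash == second.manifest_hash
    informational = []
    # A bundle-hash mismatch is reported but never changes the verdict.
    if first.bundle_hash != second.bundle_hash:
        informational.append("bundle_hash")
    return DeterminismResult(deterministic, informational)


# Two replays with the same manifest hash but different bundle hashes.
replay_a = Receipt(manifest_hash="ab" * 32, bundle_hash="01" * 32)
replay_b = Receipt(manifest_hash="ab" * 32, bundle_hash="02" * 32)
result = compare_replays(replay_a, replay_b)
print(result.deterministic, result.informational_fields)
```

Keeping the timing-sensitive comparison out of the verdict means a slow second run cannot produce a false "non-deterministic" result.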
In production end-to-end testing, both Python and R SDKs have
demonstrated deterministic == true with byte-identical manifest
hashes across replays.
What to do with the manifest hash
- Cite it. Treat the manifest hash like a DOI for the computational result: short, unambiguous, and machine-checkable.
- Pin it in supplementary materials. Reviewers and replicators can recompute the hash if they have your inputs and declared environment.
- Bind it to the artifact. Sign your dataset or paper PDF over the manifest hash so the link between paper and result is non-repudiable.
- Anchor it. On Rekor-anchored deployments, every attestation publishes to the Sigstore transparency log. Any subsequent attempt to alter or omit the record is detectable.
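As one possible way to prepare the "pin" and "bind" steps, the sketch below builds a small canonical JSON record that ties an artifact's SHA-256 digest to a manifest hash. The record layout and the `binding_record` helper are assumptions, not a Verdifax format, and the record is not itself a signature: you would sign it with whatever signing tool your workflow already uses.

```python
import hashlib
import json


def binding_record(artifact_bytes: bytes, manifest_hash: str) -> str:
    """Return canonical JSON linking an artifact digest to a manifest hash."""
    record = {
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "manifest_hash": manifest_hash,
    }
    # Sorted keys and fixed separators keep the bytes stable for signing.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


# Placeholder inputs: stand-in PDF bytes and an all-zero manifest hash.
record = binding_record(b"%PDF-1.7 example bytes", "0" * 64)
print(record)
```

Publishing this record alongside the paper lets a reader check both that the PDF is the one you bound and which manifest hash it refers to.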
Reference workflows
We provide runnable, end-to-end demos in both ecosystems:
- Jupyter notebook: verdifax-sdk-python/examples/reproducible_research.ipynb fits a logistic regression with a fixed seed, declares the environment, attests the result, verifies determinism, and explains what to do with the resulting manifest hash.
- R Markdown template: verdifax-sdk-r/examples/reproducible-research.Rmd fits a GLM with the same narrative arc. It renders to a single self-contained HTML file you can link as supplementary material.
Related reading
- What the Manifest Hash proves: the cryptographic primitive that backs every attestation.
- Independent verification: how a third party recomputes the manifest hash without trusting Verdifax's servers.
- EU AI Act, Article 13: why declared reproducibility context maps directly to transparency and record-keeping obligations.
