Siegfried development benchmarks
Tue, 01 Feb 2022 21:33:38 UTC
Environment
These benchmarks were run on a t1.small.x86 machine that was automatically provisioned.
Specs for the t1.small.x86: 4 Physical Cores @ 2.4 GHz; 8 GB DDR3 RAM; 80 GB SSD.
You can inspect the commands that were run to generate these benchmarks here.
Tool | Version |
---|---|
master | siegfried 1.9.1 /root/siegfried/default.sig (2020-10-06T19:13:40+02:00) identifiers: - pronom: DROID_SignatureFile_V97.xml; container-signature-20201001.xml |
develop | siegfried 1.9.2 /root/siegfried/dev.sig (2022-02-01T21:33:38Z) identifiers: - pronom: DROID_SignatureFile_V97.xml; container-signature-20201001.xml |
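The durations reported in the sections below are printed in Go's duration format (for example `5m59.678424618s`). As a rough illustration only, and not the actual benchmark harness, timing an external command in Go could be sketched as follows. The helper name `timeRun`, the bare `sf` invocation, and the `corpus/` path are assumptions made for this example; the sketch also assumes the figures are wall-clock times.

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// timeRun (hypothetical helper) runs an external command to completion and
// returns the elapsed wall-clock time. The command's output is discarded.
func timeRun(name string, args ...string) (time.Duration, error) {
	start := time.Now()
	err := exec.Command(name, args...).Run()
	return time.Since(start), err
}

func main() {
	// Assumed invocation: "corpus/" is a placeholder directory and sf must be on PATH.
	d, err := timeRun("sf", "corpus/")
	if err != nil {
		fmt.Println("run failed:", err)
		return
	}
	// A time.Duration prints in the same style as the durations in the tables.
	fmt.Println(d)
}
```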
iPRES Systems Showcase
A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 28.984159686s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 29.165599577s |
The tools differed in output for 0 files in the corpus.
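To make the "differed in output" count concrete, here is a minimal sketch of one way such a figure could be computed, assuming each tool's results have been reduced to a map from file path to identifier. The function name `countDiffs`, the map representation, and the sample entries are all hypothetical; the real comparison may consider more fields than a single ID.

```go
package main

import "fmt"

// countDiffs returns how many files received different identifications from
// the two tools. A file present in only one result set counts as a difference.
func countDiffs(a, b map[string]string) int {
	n := 0
	for path, id := range a {
		if other, ok := b[path]; !ok || other != id {
			n++
		}
	}
	for path := range b {
		if _, ok := a[path]; !ok {
			n++
		}
	}
	return n
}

func main() {
	// Made-up sample results, not taken from the corpus.
	master := map[string]string{"a.pdf": "fmt/18", "b.png": "fmt/11"}
	develop := map[string]string{"a.pdf": "fmt/18", "b.png": "fmt/11"}
	fmt.Println(countDiffs(master, develop)) // prints 0
}
```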
PRONOM files
A corpus created by Greg Lepore comprising 1,205 files (2.1GB). Includes a single sample for as many of the PRONOM IDs (PUIDs) as Greg could find.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 5.773934041s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 5.777038989s |
The tools differed in output for 0 files in the corpus.
Govdocs (Selected)
A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 5m59.678424618s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 6m1.913498474s |
The tools differed in output for 0 files in the corpus.
The Deluxe
This benchmark checks multi-ID identification using the deluxe.sig signature file, which contains four identifiers: PRONOM, LOC FDDs, freedesktop.org, and tika-mimetypes. It is run against the PRONOM files corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 11.644131386s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 11.648313624s |
The tools differed in output for 0 files in the corpus.
Unzipping
This benchmark checks the `sf -z` command (which scans within zip files and other container formats) when run against the iPRES Systems Showcase corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 1m19.624492509s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 1m19.062288777s |
The tools differed in output for 0 files in the corpus.
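Because the durations are Go duration strings, the relative difference between the two branches is easy to compute. The sketch below parses each pair with `time.ParseDuration` and reports develop's change relative to master; the helper name `relChange` is an assumption for this example, and the figures are copied from the tables above.

```go
package main

import (
	"fmt"
	"time"
)

// relChange parses two Go duration strings and returns
// (develop - master) / master as a fraction.
func relChange(master, develop string) (float64, error) {
	m, err := time.ParseDuration(master)
	if err != nil {
		return 0, err
	}
	d, err := time.ParseDuration(develop)
	if err != nil {
		return 0, err
	}
	return (d.Seconds() - m.Seconds()) / m.Seconds(), nil
}

func main() {
	rows := []struct{ name, master, develop string }{
		{"iPRES Systems Showcase", "28.984159686s", "29.165599577s"},
		{"PRONOM files", "5.773934041s", "5.777038989s"},
		{"Govdocs (Selected)", "5m59.678424618s", "6m1.913498474s"},
		{"The Deluxe", "11.644131386s", "11.648313624s"},
		{"Unzipping", "1m19.624492509s", "1m19.062288777s"},
	}
	for _, r := range rows {
		c, err := relChange(r.master, r.develop)
		if err != nil {
			fmt.Println(r.name, "error:", err)
			continue
		}
		// Positive values mean develop was slower than master.
		fmt.Printf("%-24s %+.2f%%\n", r.name, 100*c)
	}
}
```

On these figures, develop stays within 1% of master in every benchmark, and Unzipping is the only one where develop is faster. The page does not say how many runs each figure represents, so differences this small may be within noise.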