Siegfried development benchmarks
Fri, 04 Feb 2022 10:02:54 UTC
Environment
These benchmarks were run on a t1.small.x86 machine that was automatically provisioned.
Specs for the t1.small.x86: 4 Physical Cores @ 2.4 GHz; 8 GB DDR3 RAM; 80 GB SSD.
You can inspect the commands that were run to generate these benchmarks here.
Tool | Version |
---|---|
master | siegfried 1.9.1 /root/siegfried/default.sig (2020-10-06T19:13:40+02:00) identifiers: - pronom: DROID_SignatureFile_V97.xml; container-signature-20201001.xml |
develop | siegfried 1.9.2 /root/siegfried/dev.sig (2022-02-04T10:02:54Z) identifiers: - pronom: DROID_SignatureFile_V97.xml; container-signature-20201001.xml |
iPRES Systems Showcase
A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 29.183533158s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 29.141738423s |
The tools differed in output for 0 files in the corpus.
PRONOM files
A corpus created by Greg Lepore comprising 1,205 files (2.1GB). Includes a single sample for as many of the PRONOM IDs (PUIDs) as Greg could find.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 5.359023773s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 5.386734792s |
The tools differed in output for 0 files in the corpus.
Govdocs (Selected)
A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 5m14.416048978s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 5m15.48152952s |
The tools differed in output for 0 files in the corpus.
The Deluxe
This benchmark checks multi-ID identification using the deluxe.sig signature file, which contains four identifiers: PRONOM, LOC FDDs, freedesktop.org and tika-mimetypes. It is run against the PRONOM files corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 11.706162498s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 11.66826506s |
The tools differed in output for 0 files in the corpus.
Unzipping
This benchmark checks the `sf -z` command (which scans within zip files and other container formats) when run against the iPRES Systems Showcase corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 1m17.988521224s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 1m16.179767265s |
The tools differed in output for 0 files in the corpus.