Siegfried development benchmarks
Wed, 09 Sep 2020 03:41:50 UTC
Environment
These benchmarks were run on a t1.small.x86 machine that was automatically provisioned.
Specs for the t1.small.x86: 4 Physical Cores @ 2.4 GHz; 8 GB DDR3 RAM; 80 GB SSD.
You can inspect the commands that were run to generate these benchmarks here.
Tool | Version |
---|---|
master | siegfried 1.8.0 /root/siegfried/default.sig (2020-01-21T23:30:42+01:00) identifiers: - pronom: DROID_SignatureFile_V96.xml; container-signature-20200121.xml |
develop | siegfried 1.8.0 /root/siegfried/dev.sig (2020-09-09T03:41:51Z) identifiers: - pronom: DROID_SignatureFile_V96.xml; container-signature-20200121.xml |
The Deluxe
This benchmark checks multi-ID identification using the deluxe.sig signature file, which contains four identifiers: PRONOM, LOC FDDs, freedesktop.org and tika-mimetypes. It is run against the PRONOM files corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 0s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 27.880927276s |
One or more of the tools failed (master reports a duration of 0s), so a comparison is not possible.
iPRES Systems Showcase
A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 1m13.654002064s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 1m13.866229481s |
The tools differed in output for 0 files in the corpus.
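The durations in these tables are printed in Go's `time.Duration` string format, so they can be parsed with `time.ParseDuration` and compared directly. The following is a minimal sketch, not part of the benchmark tooling: the helper name `relativeChange` is hypothetical, and the two inputs are copied from the table above.

```go
package main

import (
	"fmt"
	"time"
)

// relativeChange is a hypothetical helper. It parses two Go-style duration
// strings and returns how much slower (positive) or faster (negative)
// develop was than master, as a percentage of the master duration.
func relativeChange(master, develop string) (float64, error) {
	m, err := time.ParseDuration(master)
	if err != nil {
		return 0, fmt.Errorf("parsing master duration: %w", err)
	}
	d, err := time.ParseDuration(develop)
	if err != nil {
		return 0, fmt.Errorf("parsing develop duration: %w", err)
	}
	// A zero or negative baseline cannot be used for a ratio.
	if m <= 0 {
		return 0, fmt.Errorf("master duration %v is not positive", m)
	}
	return 100 * float64(d-m) / float64(m), nil
}

func main() {
	// Durations copied from the iPRES Systems Showcase table.
	pct, err := relativeChange("1m13.654002064s", "1m13.866229481s")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("develop vs master: %+.2f%%\n", pct)
}
```

For this corpus the difference is roughly 0.3% of the master duration. The report gives one duration per tool, so it does not show whether a gap of this size exceeds run-to-run variation.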
PRONOM files
A corpus created by Greg Lepore comprising 1,205 files (2.1GB). Includes a single sample of as many of the PRONOM IDs (PUIDs) as Greg could find.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 12.658955217s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 13.055577196s |
The tools differed in output for 2 files in the corpus.
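One way to arrive at a "differed in output" count like the one above is to compare the identification each tool reports per file. The sketch below is an illustration only and is not taken from the benchmark tooling: `countDiffs` is a hypothetical helper, it assumes each run has already been reduced to a map from file path to a single identifier, and the file names and results in `main` are invented.

```go
package main

import "fmt"

// countDiffs is a hypothetical helper. It returns the number of files whose
// identification differs between two runs. A file that appears in only one
// of the runs also counts as a difference.
func countDiffs(master, develop map[string]string) int {
	diffs := 0
	for path, id := range master {
		if other, ok := develop[path]; !ok || other != id {
			diffs++
		}
	}
	// Files seen only by develop were not visited in the loop above.
	for path := range develop {
		if _, ok := master[path]; !ok {
			diffs++
		}
	}
	return diffs
}

func main() {
	// Invented results for illustration; these are not the actual differences.
	master := map[string]string{"a.pdf": "fmt/18", "b.doc": "fmt/40", "c.bin": "UNKNOWN"}
	develop := map[string]string{"a.pdf": "fmt/18", "b.doc": "fmt/39", "c.bin": "fmt/1"}
	fmt.Println(countDiffs(master, develop)) // prints 2
}
```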
Govdocs (Selected)
A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 11m44.813255981s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 11m36.053702237s |
The tools differed in output for 0 files in the corpus.
Unzipping
This benchmark checks the `sf -z` command (which scans within zip files and other container formats) when run against the iPRES Systems Showcase corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 3m12.300940258s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 3m11.380243752s |
The tools differed in output for 0 files in the corpus.