Siegfried development benchmarks
Sat, 11 Mar 2023 16:48:08 UTC
Environment
These benchmarks were run on an automatically provisioned m3.small.x86 machine.
Specs for the m3.small.x86: 8 cores @ 2.8 GHz, 64GB RAM, 960GB SSD.
You can inspect the commands that were run to generate these benchmarks here.
Tool | Version |
---|---|
master | siegfried 1.9.6 /root/siegfried/default.sig (2022-11-06T17:44:52+01:00) identifiers: - pronom: DROID_SignatureFile_V109.xml; container-signature-20221102.xml |
develop | siegfried 1.9.6 /root/siegfried/dev.sig (2023-03-11T16:48:09Z) identifiers: - pronom: DROID_SignatureFile_V109.xml; container-signature-20221102.xml |
iPRES Systems Showcase
A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 8.523974401s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 6.1491848s |
The tools differed in output for 0 files in the corpus.
PRONOM files
A corpus created by Greg Lepore comprising 1,205 files (2.1GB). Includes a single sample for as many of the PRONOM IDs (PUIDs) as Greg could find.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 2.4849831s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 2.81459307s |
The tools differed in output for 0 files in the corpus.
Govdocs (Selected)
A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 1m38.255584323s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 1m21.271394891s |
The tools differed in output for 0 files in the corpus.
The Deluxe
This benchmark checks multi-ID identification using the deluxe.sig signature file, which contains four identifiers: PRONOM, LOC FDDs, freedesktop.org and tika-mimetypes. It is run against the PRONOM files corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 8.164152783s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 4.291704763s |
The tools differed in output for 0 files in the corpus.
Unzipping
This benchmark checks the `sf -z` command (which scans within zip files and other container formats) when run against the iPRES Systems Showcase corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 24.56926464s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 19.338825617s |
The tools differed in output for 0 files in the corpus.
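The durations in the tables above appear to be formatted as Go duration strings (for example `1m38.255584323s`), so they can be compared directly with the standard library. The following is a minimal sketch, not part of the benchmark tooling: the `result` type and the `percentChange` helper are hypothetical names introduced here for illustration, and the values are copied from the tables on this page.

```go
package main

import (
	"fmt"
	"time"
)

// result pairs the two durations reported for one benchmark.
// This type is a hypothetical helper, not part of siegfried.
type result struct {
	name            string
	master, develop string
}

// percentChange parses two Go duration strings and returns how much
// develop's run time differs from master's, as a percentage of master's.
// A negative value means develop finished faster than master.
func percentChange(master, develop string) (float64, error) {
	m, err := time.ParseDuration(master)
	if err != nil {
		return 0, err
	}
	d, err := time.ParseDuration(develop)
	if err != nil {
		return 0, err
	}
	if m <= 0 {
		return 0, fmt.Errorf("master duration must be positive, got %v", m)
	}
	return (d.Seconds() - m.Seconds()) / m.Seconds() * 100, nil
}

func main() {
	// Durations copied verbatim from the tables on this page.
	results := []result{
		{"iPRES Systems Showcase", "8.523974401s", "6.1491848s"},
		{"PRONOM files", "2.4849831s", "2.81459307s"},
		{"Govdocs (Selected)", "1m38.255584323s", "1m21.271394891s"},
		{"The Deluxe", "8.164152783s", "4.291704763s"},
		{"Unzipping", "24.56926464s", "19.338825617s"},
	}
	for _, r := range results {
		pct, err := percentChange(r.master, r.develop)
		if err != nil {
			fmt.Println("skipping", r.name, "-", err)
			continue
		}
		fmt.Printf("%-24s %+.1f%%\n", r.name, pct)
	}
}
```

With this sign convention, develop comes out faster than master on every corpus except PRONOM files, where it is slower. Each figure reflects the single run reported on this page.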