Siegfried development benchmarks
Sun, 06 Nov 2022 16:53:56 UTC
Environment
These benchmarks were automatically run on an m3.small.x86 machine provisioned from https://www.packet.net/.
Specs for the m3.small.x86: 8 cores @ 2.8 GHz, 64GB RAM, 960 GB SSD.
You can inspect the commands that were run to generate these benchmarks here.
Tool | Version |
---|---|
master | siegfried 1.9.6 /root/siegfried/default.sig (2022-11-06T17:44:52+01:00) identifiers: - pronom: DROID_SignatureFile_V109.xml; container-signature-20221102.xml |
develop | siegfried 1.9.6 /root/siegfried/dev.sig (2022-11-06T16:53:56Z) identifiers: - pronom: DROID_SignatureFile_V109.xml; container-signature-20221102.xml |
iPRES Systems Showcase
A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 8.463717426s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 8.577763438s |
The tools differed in output for 0 files in the corpus.
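The "differed in output" count reported for each corpus can be pictured as a per-file comparison of the two result sets. The following is only a minimal sketch of that idea, not the benchmark's actual comparison code: `countDiffs`, the map-of-path-to-identifier representation, and the sample file names and PUIDs are all hypothetical and chosen for illustration.

```go
package main

import "fmt"

// countDiffs returns the number of file paths whose identification differs
// between two result sets. A path that appears in only one set also counts
// as a difference. (Hypothetical helper, not part of siegfried.)
func countDiffs(a, b map[string]string) int {
	n := 0
	for path, idA := range a {
		if idB, ok := b[path]; !ok || idA != idB {
			n++
		}
	}
	for path := range b {
		if _, ok := a[path]; !ok {
			n++
		}
	}
	return n
}

func main() {
	// Illustrative results only: path -> identifier reported by each build.
	master := map[string]string{"a.pdf": "fmt/18", "b.png": "fmt/11"}
	develop := map[string]string{"a.pdf": "fmt/18", "b.png": "fmt/11"}

	fmt.Printf("The tools differed in output for %d files in the corpus.\n",
		countDiffs(master, develop))
}
```

With identical result sets, as in the sample data, the count is 0, which matches the line reported for each corpus in this run.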
PRONOM files
A corpus created by Greg Lepore and comprising 1,205 files (2.1GB). Includes a single sample for as many of the PRONOM IDs (PUIDs) as Greg could find.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 2.239392571s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 2.162862555s |
The tools differed in output for 0 files in the corpus.
Govdocs (Selected)
A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 1m28.387702183s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 1m28.236516045s |
The tools differed in output for 0 files in the corpus.
The Deluxe
This benchmark checks multi-ID identification using the deluxe.sig signature file, which contains four identifiers: PRONOM, LOC FDDs, freedesktop.org and tika-mimetypes. This benchmark is run against the PRONOM files corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 7.982655336s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 7.940861595s |
The tools differed in output for 0 files in the corpus.
Unzipping
This benchmark checks the `sf -z` command (scans within zip files and other container formats) when run against the iPRES Systems Showcase corpus.
Tool | Description | Duration |
---|---|---|
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 23.498679577s |
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 23.596123203s |
The tools differed in output for 0 files in the corpus.
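The durations in the tables above are printed in Go's duration format, so they can be parsed with `time.ParseDuration` and compared directly. The sketch below computes the relative change of develop against master for each benchmark; `relativeChange` is a hypothetical helper written for this illustration, and the duration strings are copied from the tables. Each table reports a single timing per branch, so small differences may simply be run-to-run noise.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// relativeChange parses two Go-format duration strings and returns
// (develop - master) / master as a percentage. A negative value means
// develop was faster. (Hypothetical helper for illustration.)
func relativeChange(master, develop string) (float64, error) {
	m, err := time.ParseDuration(master)
	if err != nil {
		return 0, err
	}
	d, err := time.ParseDuration(develop)
	if err != nil {
		return 0, err
	}
	if m == 0 {
		return 0, fmt.Errorf("master duration is zero")
	}
	return 100 * float64(d-m) / float64(m), nil
}

func main() {
	// Durations copied from the benchmark tables in this report.
	runs := []struct{ name, master, develop string }{
		{"iPRES Systems Showcase", "8.463717426s", "8.577763438s"},
		{"PRONOM files", "2.239392571s", "2.162862555s"},
		{"Govdocs (Selected)", "1m28.387702183s", "1m28.236516045s"},
		{"The Deluxe", "7.982655336s", "7.940861595s"},
		{"Unzipping", "23.498679577s", "23.596123203s"},
	}
	for _, r := range runs {
		pct, err := relativeChange(r.master, r.develop)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-24s %+.2f%%\n", r.name, pct)
	}
}
```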