Siegfried development benchmarks

Mon, 27 Aug 2018 06:10:52 UTC

Environment

These benchmarks were automatically run on a t1.small.x86 machine provisioned from https://www.packet.net/.

Specs for the t1.small.x86: 4 Physical Cores @ 2.4 GHz; 8 GB DDR3 RAM; 80 GB SSD.

You can inspect the commands that were run to generate these benchmarks here.

iPRES Systems Showcase

A corpus created for the 2014 iPRES conference comprising 2,206 files (5GB). Represents a range of formats, including AV and some uncommon types. Sourced from http://www.webarchive.org.uk/datasets/ipres.ds.1/

Results

Tool | Description | Duration
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 30.252042862s
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 30.14638458s

The tools differed in output for 2 files in the corpus.

file | master | develop
/root/corpora/ipres-systems-showcase-files/EXPENSES.XLS | fmt/473 | fmt/56
/root/corpora/ipres-systems-showcase-files/XYMISC | x-fmt/111 | UNKNOWN
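
A per-file comparison like the one above could be sketched as follows. This is an illustration only, not the code the benchmark harness actually runs. It assumes each tool's results have been parsed from JSON into a dictionary with a `files` list, where every entry has a `filename` and a `matches` list of `id` values (the shape siegfried's JSON output is assumed to take here). The function names `ids_by_file` and `diff_outputs` are hypothetical, and so is the `AGREED.TXT` sample file.

```python
def ids_by_file(report):
    """Map each filename to the tuple of identifiers reported for it."""
    return {
        f["filename"]: tuple(m["id"] for m in f.get("matches", []))
        for f in report.get("files", [])
    }


def diff_outputs(master_report, develop_report):
    """Return (filename, master_ids, develop_ids) for files whose IDs differ."""
    master = ids_by_file(master_report)
    develop = ids_by_file(develop_report)
    rows = []
    for name in sorted(master.keys() | develop.keys()):
        m = master.get(name, ())
        d = develop.get(name, ())
        if m != d:
            rows.append((name, m, d))
    return rows


# Sample data: the two rows reported for this corpus, plus one hypothetical
# file (AGREED.TXT) on which both tools agree and which is therefore skipped.
base = "/root/corpora/ipres-systems-showcase-files/"
master_report = {"files": [
    {"filename": base + "EXPENSES.XLS", "matches": [{"id": "fmt/473"}]},
    {"filename": base + "XYMISC", "matches": [{"id": "x-fmt/111"}]},
    {"filename": base + "AGREED.TXT", "matches": [{"id": "x-fmt/111"}]},
]}
develop_report = {"files": [
    {"filename": base + "EXPENSES.XLS", "matches": [{"id": "fmt/56"}]},
    {"filename": base + "XYMISC", "matches": [{"id": "UNKNOWN"}]},
    {"filename": base + "AGREED.TXT", "matches": [{"id": "x-fmt/111"}]},
]}

for name, m, d in diff_outputs(master_report, develop_report):
    print(name, ",".join(m), ",".join(d))
```

Comparing tuples of identifiers rather than single values keeps the sketch usable when a tool reports more than one match for a file.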

Raw output

PRONOM files

A corpus created by Greg Lepore and comprising 1,205 files (2.1GB). Includes a single sample for as many PRONOM IDs (PUIDs) as Greg could find.

Results

Tool | Description | Duration
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 3.539713984s
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 3.690308102s

The tools differed in output for 1 file in the corpus.

file | master | develop
/root/corpora/pronom-files/fmt_128_OpenDocument Text_de.qwerkop.www_projects_mspace_doku.sdw | fmt/136 | fmt/128

Raw output

Govdocs (Selected)

A selection from the Govdocs1 corpus comprising 26,124 files (31.4GB). Represents typical office formats, including approx. 15,000 PDFs. Originally sourced from http://openpreservation.org/blog/2012/07/26/1-million-21000-reducing-govdocs-significantly/

Results

Tool | Description | Duration
master | Master branch of github.com/richardlehane/siegfried. Corresponds to latest production release. | 3m11.555133027s
develop | Develop branch of github.com/richardlehane/siegfried. Tip of development and potentially unstable. | 5m37.181790494s
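
The durations in the table above mix minutes and seconds, which makes the gap between the two branches hard to read at a glance. The following sketch converts them to seconds and computes the ratio. It is not part of the benchmark tooling; `parse_duration` is a hypothetical helper that handles only the simple unit-suffixed form seen in this report.

```python
import re

# Seconds per unit for the suffixes this sketch recognises.
_UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}
# Two-letter units are listed before "m" and "s" so they are tried first.
_PART = re.compile(r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)")


def parse_duration(text):
    """Convert a string such as '3m11.555133027s' to seconds."""
    parts = _PART.findall(text)
    # Reject input that contains anything other than number+unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(n) * _UNIT_SECONDS[u] for n, u in parts)


master = parse_duration("3m11.555133027s")
develop = parse_duration("5m37.181790494s")

print(f"master:  {master:.1f}s")
print(f"develop: {develop:.1f}s")
print(f"develop/master: {develop / master:.2f}x")
```

On this corpus the develop branch took about 1.76 times as long as master, whereas the two branches are within a few percent of each other on the other two corpora.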

The tools differed in output for 23 files in the corpus.

file | master | develop
/root/corpora/govdocs-selected/DOC_49/892946.unk | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/DOC_87/312134.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/HTML_143/558860.html | x-fmt/394 | fmt/99
/root/corpora/govdocs-selected/PDF_1562/661753.pdf | fmt/134 | fmt/15
/root/corpora/govdocs-selected/PDF_1616/553738.pdf | fmt/134 | fmt/18
/root/corpora/govdocs-selected/PDF_1631/825237.pdf | fmt/134 | fmt/17
/root/corpora/govdocs-selected/PDF_169/915424.pdf | fmt/134 | fmt/17
/root/corpora/govdocs-selected/PDF_246/960747.pdf | fmt/134 | fmt/16
/root/corpora/govdocs-selected/PDF_3230/900709.pdf | fmt/134 | fmt/18
/root/corpora/govdocs-selected/PDF_608/113057.pdf | fmt/134 | fmt/19
/root/corpora/govdocs-selected/SWF_12/554614.swf | fmt/134 | fmt/505
/root/corpora/govdocs-selected/SWF_12/628689.swf | fmt/134 | fmt/505
/root/corpora/govdocs-selected/TEXT_11/051294.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_123/269384.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_18/899924.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_190/228439.unk | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_202/261899.text | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_27/439105.unk | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_82/146173.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_82/405424.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TEXT_82/697585.txt | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/TXT_18/607431.unk | x-fmt/111 | UNKNOWN
/root/corpora/govdocs-selected/__1/617678.unk | x-fmt/111 | UNKNOWN

Raw output

Profile

Profiler information for the siegfried development branch

History

2018-10-10 11:24:35 +0000 UTC

2018-09-19 01:50:01 +0000 UTC

2018-08-30 07:36:32 +0000 UTC

2018-08-27 06:10:52 +0000 UTC

2018-08-21 06:00:13 +0000 UTC

2018-07-27 05:54:19 +0000 UTC