bup-midx(1) [debian man page]

bup-midx(1)						      General Commands Manual						       bup-midx(1)

NAME
bup-midx - create a multi-index (.midx) file from several .idx files

SYNOPSIS
bup midx [-o outfile] <-a|-f|idxnames...>

DESCRIPTION
bup midx creates a multi-index (.midx) file from one or more git pack index (.idx) files.

Note: you should no longer need to run this command by hand. It gets run automatically by bup-save(1) and similar commands.

OPTIONS
-o, --output=filename.midx
    use the given output filename for the .midx file. Default is auto-generated.

-a, --auto
    automatically generate new .midx files for any .idx files where it would be appropriate.

-f, --force
    force generation of a single new .midx file containing all your .idx files. This will result in the fastest backup performance, but may take a long time to run.

--dir=packdir
    specify the directory containing the .idx/.midx files to work with. The default is $BUP_DIR/objects/pack and $BUP_DIR/indexcache/*.

--max-files
    maximum number of .idx files to open at a time. You can use this if you have an especially small number of file descriptors available, so that midx can complete (though possibly non-optimally) even if it can't open all your .idx files at once. The default value of this option should be fine for most people.

--check
    validate a .midx file by ensuring that all objects in its contained .idx files exist inside the .midx. May be useful for debugging.

EXAMPLE
$ bup midx -a
Merging 21 indexes (2278559 objects).
Table size: 524288 (17 bits)
Reading indexes: 100.00% (2278559/2278559), done.
midx-b66d7c9afc4396187218f2936a87b865cf342672.midx

DISCUSSION
By default, bup uses git-formatted pack files, which consist of a pack file (containing objects) and an idx file (containing a sorted list of object names and their offsets in the .pack file).

Normal idx files are convenient because they let you use git(1) to access your backup datasets. However, idx files can get slow when you have a lot of very large packs (which git typically doesn't have, but bup often does).

A bup .midx file consists of a single sorted list of all the objects contained in all the .pack files it references. This list can be binary searched in about log2(m) steps, where m is the total number of objects.

To further speed up the search, midx files also have a variable-sized fanout table that replaces the first n steps of the binary search. With the help of this fanout table, bup can narrow down which page of the midx file a given object id would be in (if it exists) with a single lookup. Thus, typical searches will only need to swap in two pages: one for the fanout table, and one for the object id.

midx files are most useful when creating new backups, since searching for a nonexistent object in the repository necessarily requires searching through all the index files to ensure that it does not exist. (Searching for objects that do exist can be optimized; for example, consecutive objects are often stored in the same pack, so we can search that one first using an MRU algorithm.)

SEE ALSO
bup-save(1), bup-margin(1), bup-memtest(1)

BUP
Part of the bup(1) suite.

AUTHORS
Avery Pennarun <apenwarr@gmail.com>.

Bup unknown- bup-midx(1)
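
The lookup described in the DISCUSSION section of bup-midx(1) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not bup's on-disk .midx format or its actual code: the names build_midx, prefix and exists are hypothetical, the fanout table is a plain Python list, and object ids are raw 20-byte SHA-1 digests. The fanout entry for an n-bit prefix bounds the slice of the sorted list that can contain a given id, so the binary search only runs inside that slice.

```python
import bisect


def prefix(oid, bits):
    """Return the top `bits` bits (1..32) of an object id as an integer."""
    return int.from_bytes(oid[:4], "big") >> (32 - bits)


def build_midx(object_ids, bits):
    """Build a toy multi-index: (fanout, ids).

    ids is one sorted list of all object ids. fanout[p] holds the number
    of ids whose `bits`-bit prefix is <= p, so the ids with prefix p
    occupy ids[fanout[p - 1]:fanout[p]].
    """
    ids = sorted(set(object_ids))
    fanout = [0] * (1 << bits)
    for oid in ids:
        fanout[prefix(oid, bits)] += 1
    total = 0
    for p in range(len(fanout)):
        total += fanout[p]
        fanout[p] = total
    return fanout, ids


def exists(fanout, ids, oid, bits):
    """Check membership: one fanout lookup, then a bounded binary search."""
    p = prefix(oid, bits)
    lo = fanout[p - 1] if p else 0
    hi = fanout[p]
    i = bisect.bisect_left(ids, oid, lo, hi)
    return i < hi and ids[i] == oid
```

With a well-chosen number of fanout bits, each slice holds only a few ids, which is why a miss (the common case when saving new data) is answered after touching the fanout table and a single small region of the sorted list.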

bup-margin(1)						      General Commands Manual						     bup-margin(1)

NAME
bup-margin - figure out your deduplication safety margin

SYNOPSIS
bup margin [options...]

DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.

For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits.

The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits with far fewer objects.

If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if you're getting dangerously close to 160 bits.

OPTIONS
--predict
    Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm.

--ignore-midx
    don't use .midx files, use only .idx files. This is only really useful when used with --predict.

EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible

Everyone on earth could have 625878182 data sets like yours, all in one repository, and we would expect 1 object collision.

$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)

SEE ALSO
bup-midx(1), bup-save(1)

BUP
Part of the bup(1) suite.

AUTHORS
Avery Pennarun <apenwarr@gmail.com>.

Bup unknown- bup-margin(1)
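
The calculation described in bup-margin(1) can be sketched in a few lines. This is an illustration under assumptions, not bup's actual code: the function names are hypothetical, object ids are taken to be distinct raw 20-byte SHA-1 digests, and the world population of 6.7e9 is inferred from the figures in the EXAMPLE output rather than stated on the page. The sketch relies on the fact that, in a sorted list, the pair sharing the longest prefix is always adjacent, so one pass over neighbouring entries is enough.

```python
import math

HASH_BITS = 160  # size of a SHA-1 hash in bits


def shared_prefix_bits(a, b):
    """Number of leading bits that two equal-length ids have in common."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return len(a) * 8 - diff.bit_length()


def margin(object_ids):
    """Largest number of prefix bits shared by any two distinct ids."""
    ids = sorted(set(object_ids))
    pairs = zip(ids, ids[1:])
    return max((shared_prefix_bits(a, b) for a, b in pairs), default=0)


def report(longest, count, population=6.7e9):
    """Reproduce the arithmetic behind the figures in the EXAMPLE output.

    population is an assumed value, inferred from the sample output.
    """
    doublings = math.log2(count)
    bits_per_doubling = longest / doublings
    remaining = HASH_BITS - longest
    remaining_doublings = remaining / bits_per_doubling
    larger = 2 ** remaining_doublings
    return {
        "bits_per_doubling": bits_per_doubling,
        "remaining_bits": remaining,
        "remaining_doublings": remaining_doublings,
        "times_larger": larger,
        "data_sets_per_person": larger / population,
    }
```

Calling report(40, 1612581) with the numbers from the EXAMPLE section gives roughly 1.94 bits per doubling and 120 remaining bits, in line with the output shown there.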