htdig(1) General Commands Manual htdig(1)
NAME
htstat - returns statistics on the document and word databases, much like the -s option to htdig or htmerge.
SYNOPSIS
htstat [-v][-a][-c configfile][-u]
DESCRIPTION
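Htstat returns statistics on the document and word databases, much like the -s option to htdig or htmerge. The databases themselves come
from htdig, the search robot, which retrieves HTML documents using the HTTP protocol and gathers information from them that can later be
used to search these documents.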
OPTIONS
-a Use alternate work files. Tells htstat to append .work to database files, causing a second copy of the database to be built. This
allows the original files to be used by htsearch during the run.
-c configfile
Use the specified configfile instead of the default.
-u Give a list of URLs in the document database.
-v Verbose mode. Each additional -v increases the verbosity of the program. Using more than two -v options is probably only useful for
debugging purposes. The default verbose mode (using only one -v) gives a nice progress report while digging.
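As a minimal sketch of how the options above combine, the following shell snippet assembles an htstat command line. The helper name htstat_cmd and its arguments are hypothetical and not part of ht://Dig; only the -v, -c and -u options and the default configuration path are taken from this page. The helper prints the command instead of running it, so it can be tried on a machine without ht://Dig installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of ht://Dig): print an htstat command line
# built from the options documented on this page.
htstat_cmd() {
    config="${1:-/etc/htdig/htdig.conf}"   # default configuration file
    list_urls="${2:-no}"                   # "yes" adds -u to list URLs
    cmd="htstat -v -c $config"
    if [ "$list_urls" = "yes" ]; then
        cmd="$cmd -u"
    fi
    printf '%s\n' "$cmd"
}

htstat_cmd                       # default config, statistics only
htstat_cmd /tmp/test.conf yes    # alternate config, also list URLs
```

To actually run the result, the printed line can be pasted into a shell on a host where htstat is installed.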
FILES
/etc/htdig/htdig.conf
The default configuration file.
SEE ALSO
Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and the manual pages htdigconfig(8),
htdig(1) and htmerge(1) for a detailed description of ht://Dig and its commands.
AUTHOR
This manual page was written by Robert Ribnitz, based on the HTML documentation of ht://Dig.
January 2004 htdig(1)
htmerge(1) General Commands Manual htmerge(1)
NAME
htmerge - create document index and word database for the ht://Dig search engine
SYNOPSIS
htmerge [options]
DESCRIPTION
Htmerge is used to create a document index and word database from the files that were created by htdig. These databases are then used by
htsearch to perform the actual searches.
OPTIONS
-a Use alternate work files. Tells htmerge to append .work to database files, causing a second copy of the database to be built. This
allows the original files to be used by htsearch during the indexing run.
-c configfile
Use the specified configfile instead of the default.
-d Prevent the document index from being created.
-s Print statistics about the document and word databases after htmerge has finished.
-v Run in verbose mode. This will provide some hints as to the progress of the merge. This can be useful when running htmerge
interactively since some parts (especially the word database creation) can take a very long time.
-w Prevent the word database from being created.
ENVIRONMENT
TMPDIR In addition to the command line options, the environment variable TMPDIR will be used to designate the directory where intermediate
files are stored during the sorting process.
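Because the sorting step writes its intermediate files under TMPDIR, it can help to confirm that the chosen directory is usable before starting a long merge. The sketch below is one way to do that; check_tmpdir is a hypothetical helper, and the scratch directory created with mktemp only stands in for a real, larger location. The htmerge command line is printed rather than executed, and uses only the -v, -s and -c options described above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of ht://Dig): report whether a
# directory exists and is writable, so it can safely be used as TMPDIR.
check_tmpdir() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        printf 'ok\n'
    else
        printf 'unusable\n'
        return 1
    fi
}

scratch=$(mktemp -d)    # stand-in for a scratch directory with enough space
if check_tmpdir "$scratch" >/dev/null; then
    # Print the command instead of running it, so the sketch works
    # without ht://Dig installed.
    printf 'TMPDIR=%s htmerge -v -s -c /etc/htdig/htdig.conf\n' "$scratch"
fi
rmdir "$scratch"
```

Setting TMPDIR on the same line as the command limits the change to that single htmerge run and leaves the rest of the shell session untouched.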
FILES
/etc/htdig/htdig.conf
The default configuration file.
SEE ALSO
Please refer to the HTML pages (in the htdig-doc package) /usr/share/doc/htdig-doc/html/index.html and the manual pages htdig(1) and
htsearch(1) for a detailed description of ht://Dig and its commands.
AUTHOR
This manual page was written by Christian Schwarz, modified by Stijn de Bekker, based on the HTML documentation of ht://Dig.
21 July 1997 htmerge(1)