html2stx(1) [debian man page]

html2stx(1)						      General Commands Manual						       html2stx(1)

NAME
       html2stx - convert HTML documents into Stx

SYNOPSIS
       html2stx [ file ]

DESCRIPTION
       html2stx takes the given file, which should contain an HTML document, and converts it to structured text (Stx). If no file is given, standard input is read instead.

       The program does not attempt to convert every possibly convertible piece of markup into Stx. For example, <font> tags are simply ignored. This tends to result in a nice, clean, beautiful document. (If it doesn't, the source document probably does not contain enough information to start with.)

OPTIONS
       None.

DIAGNOSTICS
       html2stx is a Python script and will throw an exception if something goes amiss. In this case, the return value will be non-zero.

SEE ALSO
       stx2any(1), Stx-ref.html

BUGS
       o  The word wrapping algorithm is probably not very clever.
       o  Sometimes there are extra linebreaks in the output.
       o  Probably many others.

AUTHOR
       This manual page was written by Panu A. Kalliokoski.

       html2stx is derived from the html2text utility by Aaron Swartz. html2text is a utility for converting HTML into "Markdown" structured text; the changes required to make it work for Stx were done by Panu Kalliokoski.

Panu A. Kalliokoski                                                html2stx(1)
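
The html2stx page above documents two behaviours that matter when calling the program from another script: it reads standard input when no file is given, and it exits with a non-zero status when it throws an exception. The following is a minimal Python sketch of a wrapper built on those two facts. The helper name run_html2stx and its command parameter are hypothetical and not part of html2stx; the parameter exists only so the helper can be tried without html2stx installed.

```python
import subprocess


def run_html2stx(html_text, command=("html2stx",)):
    """Feed html_text to the converter on standard input and return its output.

    `command` defaults to an html2stx executable found on PATH (an assumption
    about the local installation); pass another command to try the helper out.
    """
    result = subprocess.run(
        list(command),
        input=html_text,       # no file argument, so the tool reads stdin
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The man page says a non-zero return value signals an exception.
        raise RuntimeError(
            f"{command[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

With html2stx installed, run_html2stx(open("page.html").read()) would return the Stx text, where page.html stands for any HTML file of your own.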

gather_stx_titles(1)					      General Commands Manual					      gather_stx_titles(1)

NAME
       gather_stx_titles - gather title declarations from Stx documents

SYNOPSIS
       gather_stx_titles [ -f from-suffix ] [ -t to-suffix ] [ m4 options ] file [ file ... ]

DESCRIPTION
       gather_stx_titles digs out Stx metadata declarations from the listed files, and dumps the title and document ID information as m4 definitions into standard output. This information can later be used by w_crosslink to link the documents by their metadata.

       Why is this useful? Well, imagine that you have a large site with a lot of cross-linking. A document's name will appear in many places: in the link menu (if you have one), and in the body of different pages where it is cross-linked from. gather_stx_titles lets you put all the information in one place and where it belongs, i.e. the file itself. You'll be glad you did when the time comes to change document titles or move the documents around; especially so if your website has multilingual magic.

OPTIONS
       gather_stx_titles uses m4 internally and will accept any option m4 accepts. In addition to those, it takes the following options:

       -f from-suffix
              In the filename data, substitute away the suffix from-suffix. Actually, from-suffix may be a regular expression; stupid but true, in GNU m4 it is a "traditional" regexp, whereas in BSD m4 it is an "extended" regexp. Defaults to no suffix (nothing to take away).

       -t to-suffix
              In the filename data, substitute the suffix taken away by from-suffix with to-suffix. If from-suffix is nil (the default), append to-suffix to all filenames.

       -p prefix
              Strip away the prefix given by (regular expression) prefix from filenames. The equivalent of -t for this does not exist, because you can specify a directory prefix to w_crosslink by w_base.

       --version, -V
              Just show version information and exit.

       --help, -?
              Just show a short help message and exit.

EXAMPLES
       Most of the time you will probably want to automate the use of gather_stx_titles, for example with a Makefile like this:

           SOURCES = $(wildcard *.stx)
           TARGETS = $(SOURCES:.stx=.html)

           all: $(TARGETS)

           titles.m4: $(SOURCES)
                   gather_stx_titles -f stx -t html $^ > $@

           %.html: %.stx titles.m4
                   stx2any -T html titles.m4 $< > $@

       If you don't want to be quite so correct, drop the .html dependency on titles.m4 or the titles.m4 dependency on SOURCES.

       Using temporary files is not necessary; this should also work:

           $ gather_stx_titles *.stx | stx2any - mydoc.stx

SEE ALSO
       stx2any(1).

AUTHOR
       This page was written by Panu A. Kalliokoski.

Panu A. Kalliokoski                                       gather_stx_titles(1)
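
The filename rewriting described for the -p, -f and -t options of gather_stx_titles can be approximated in a few lines of Python. This is only an illustration of the documented semantics, not the tool's implementation (which uses m4). The function name rewrite_filename is hypothetical, Python's re syntax matches neither the GNU m4 nor the BSD m4 regexp dialect exactly, and applying the prefix rule before the suffix rule is an assumption.

```python
import re


def rewrite_filename(name, from_suffix=None, to_suffix="", prefix=None):
    """Approximate the effect of -p, -f and -t on a single filename."""
    if prefix:
        # -p: strip a leading match of the prefix expression.
        name = re.sub("^(?:" + prefix + ")", "", name, count=1)
    if from_suffix:
        # -f / -t: replace a trailing match of from-suffix with to-suffix.
        # A function is used as the replacement so that to_suffix is taken
        # literally rather than parsed for backslash escapes.
        return re.sub("(?:" + from_suffix + ")$", lambda _m: to_suffix,
                      name, count=1)
    # No from-suffix (the default): -t simply appends to-suffix.
    return name + to_suffix


# Mirrors "-f stx -t html" from the Makefile example.
print(rewrite_filename("mydoc.stx", from_suffix="stx", to_suffix="html"))
```

Under these assumptions the call prints mydoc.html, which matches the %.html targets that the Makefile builds from the %.stx sources.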