html::parse(3) [suse man page]

HTML::Parse(3)						User Contributed Perl Documentation					    HTML::Parse(3)

NAME
HTML::Parse - Deprecated, a wrapper around HTML::TreeBuilder

SYNOPSIS
See the documentation for HTML::TreeBuilder.

DESCRIPTION
Disclaimer: This module is provided only for backwards compatibility with earlier versions of this library. New code should not use this module; it should use the HTML::Parser and HTML::TreeBuilder modules directly instead.

The "HTML::Parse" module provides functions to parse HTML documents. There are two functions exported by this module:

parse_html($html) or parse_html($html, $obj)
    This function is really just a synonym for $obj->parse($html), and $obj is assumed to be a subclass of "HTML::Parser". Refer to HTML::Parser for more documentation.

    If $obj is not specified, it defaults to an internally created new "HTML::TreeBuilder" object configured with strict_comment() turned on. That class implements a parser that builds (and is) an HTML syntax tree with HTML::Element objects as nodes.

    The return value from parse_html() is $obj.

parse_htmlfile($file, [$obj])
    Same as parse_html(), but pulls the HTML to parse from the named file. Returns "undef" if the file could not be opened, or $obj otherwise.

When an "HTML::TreeBuilder" object is created, the following variables control how parsing takes place:

$HTML::Parse::IMPLICIT_TAGS
    Setting this variable to true instructs the parser to try to deduce implicit elements and implicit end tags. If this variable is false, you get a parse tree that just reflects the text as it stands, which might be useful for quick & dirty parsing. Default is true.

    Implicit elements have the implicit() attribute set.

$HTML::Parse::IGNORE_UNKNOWN
    This variable controls whether unknown tags are ignored rather than represented as elements in the parse tree. Default is true.

$HTML::Parse::IGNORE_TEXT
    Do not represent the text content of elements. This saves space if all you want is to examine the structure of the document. Default is false.

$HTML::Parse::WARN
    Call warn() with an appropriate message for syntax errors. Default is false.

REMEMBER! HTML::TreeBuilder objects should be explicitly destroyed when you're finished with them.
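A minimal sketch of the legacy interface described above, assuming the HTML::Parse and HTML::TreeBuilder modules are installed. The HTML snippet, the temporary file and all variable names are illustrative only; look_down(), as_text(), as_HTML() and delete() are HTML::Element methods, and eof() comes from HTML::Parser. As the disclaimer says, new code should use HTML::TreeBuilder directly instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Parse;    # exports parse_html() and parse_htmlfile()

# The variables are read when the internal HTML::TreeBuilder object
# is created, so set them before calling parse_html()/parse_htmlfile().
$HTML::Parse::IMPLICIT_TAGS  = 1;    # deduce <html>, <head>, <body>, end tags
$HTML::Parse::IGNORE_UNKNOWN = 1;    # drop tags the parser does not know
$HTML::Parse::IGNORE_TEXT    = 0;    # keep text content
$HTML::Parse::WARN           = 0;    # no warn() on syntax errors

# With no second argument, parse_html() creates a new HTML::TreeBuilder,
# feeds it the string via parse(), and returns that object.
my $tree = parse_html('<title>Demo</title><p>Hello, <b>world</b>');
$tree->eof;    # signal end of input so buffered text is flushed

my $title_text = $tree->look_down(_tag => 'title')->as_text;
my $html_out   = $tree->as_HTML;
print "Title: $title_text\n$html_out\n";

# Trees must be destroyed explicitly (see REMEMBER! above).
$tree = $tree->delete;

# parse_htmlfile() reads from a named file (a hypothetical temp file here).
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} '<p>From a file</p>';
close $fh;

my $file_tree = parse_htmlfile($path) or die "cannot open $path";
my $file_text = $file_tree->as_text;
$file_tree = $file_tree->delete;

# A file that cannot be opened yields undef instead of a tree.
my $missing = parse_htmlfile('/no/such/file.html');
print defined $missing ? "unexpected tree\n" : "missing file gave undef\n";
```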
See HTML::TreeBuilder.

SEE ALSO
HTML::Parser, HTML::TreeBuilder, HTML::Element

COPYRIGHT
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy Lester, 2006 Pete Krawczyk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR
Currently maintained by Pete Krawczyk "<petek@cpan.org>"

Original authors: Gisle Aas, Sean Burke and Andy Lester.

perl v5.12.1							    2006-08-06							    HTML::Parse(3)


XML::TreeBuilder(3)					User Contributed Perl Documentation				       XML::TreeBuilder(3)

NAME
XML::TreeBuilder - Parser that builds a tree of XML::Element objects

SYNOPSIS
    use strict;
    use warnings;
    use XML::TreeBuilder;

    foreach my $file_name (@ARGV) {
        my $tree = XML::TreeBuilder->new({ 'NoExpand' => 0, 'ErrorContext' => 0 }); # empty tree
        $tree->parse_file($file_name);
        print "Hey, here's a dump of the parse tree of $file_name:\n";
        $tree->dump; # a method we inherit from XML::Element
        print "And here it is, bizarrely rerendered as XML:\n",
            $tree->as_XML, "\n";

        # Now that we're done with it, we must destroy it.
        $tree = $tree->delete;
    }

DESCRIPTION
This module uses XML::Parser to make XML document trees constructed of XML::Element objects (and XML::Element is a subclass of HTML::Element adapted for XML). XML::TreeBuilder is meant particularly for people who are used to the HTML::TreeBuilder / HTML::Element interface to document trees, and who don't want to learn some other document interface like XML::Twig or XML::DOM.

The way to use this class is to:

    1. start a new (empty) XML::TreeBuilder object,
    2. set any of the "store" options you want,
    3. then parse the document from a source by calling "$x->parsefile(...)" or "$x->parse(...)" (see the XML::Parser docs for the options that these two methods take),
    4. do whatever you need to do with the syntax tree, presumably involving traversing it looking for some bit of information in it,
    5. and finally, when you're done with the tree, call $tree->delete to erase the contents of the tree from memory. This kind of thing usually isn't necessary with most Perl objects, but it's necessary for TreeBuilder objects. See HTML::Element for a more verbose explanation of why this is the case.

METHODS AND ATTRIBUTES
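The five steps listed in DESCRIPTION can be sketched as follows, assuming XML::TreeBuilder and XML::Parser are installed. The XML string and the "doc"/"item"/"id" names are made up for illustration; find_by_tag_name(), attr() and as_text() are assumed to be inherited from HTML::Element, as described in this section, and the hash-reference form of new() follows the SYNOPSIS.

```perl
use strict;
use warnings;
use XML::TreeBuilder;

# 1. Start a new, empty tree (options are handed on to XML::Parser).
my $tree = XML::TreeBuilder->new({ 'NoExpand' => 0, 'ErrorContext' => 0 });

# 2. Set the "store" options: keep comments, skip processing instructions.
$tree->store_comments(1);
$tree->store_pis(0);

# 3. Parse from a string via XML::Parser's parse (illustrative document).
$tree->parse('<doc><!-- note --><item id="a">First</item><item id="b">Second</item></doc>');

# 4. Traverse the tree: collect each item's id attribute and text.
my @items = $tree->find_by_tag_name('item');
my @ids   = map { $_->attr('id') } @items;
my @texts = map { $_->as_text } @items;
print "ids: @ids; texts: @texts\n";

# 5. Destroy the tree explicitly when finished with it.
$tree = $tree->delete;
```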
XML::TreeBuilder is a subclass of XML::Element, which in turn is a subclass of HTML::Element. You should read and understand the documentation for those two modules.

An XML::TreeBuilder object is just a special XML::Element object that allows you to call these additional methods:

$root = XML::TreeBuilder->new()
    Construct a new XML::TreeBuilder object.

    Parameters:

    NoExpand
        Passed to XML::Parser. Do not expand external entities. Default: undef

    ErrorContext
        Passed to XML::Parser. Number of context lines to generate on errors. Default: undef

$root->eof
    Deletes the parser object.

$root->parse(...options...)
    Uses XML::Parser's "parse" method to parse XML from the source(s) specified by the options. See XML::Parser.

$root->parsefile(...options...)
    Uses XML::Parser's "parsefile" method to parse XML from the source(s) specified by the options. See XML::Parser.

$root->parse_file(...options...)
    Simply an alias for "parsefile".

$root->store_comments(value)
    This determines whether TreeBuilder will normally store comments found while parsing content into $root. Currently, this is off by default.

$root->store_declarations(value)
    This determines whether TreeBuilder will normally store markup declarations found while parsing content into $root. Currently, this is off by default.

$root->store_pis(value)
    This determines whether TreeBuilder will normally store processing instructions found while parsing content into $root. Currently, this is off (false) by default.

$root->store_cdata(value)
    This determines whether TreeBuilder will normally store CDATA sections found while parsing content into $root. Adds a ~cdata node. Currently, this is off (false) by default.

SEE ALSO
XML::Parser, XML::Element, HTML::TreeBuilder, HTML::DOMbo. For alternative XML document interfaces, see XML::DOM and XML::Twig.

COPYRIGHT AND DISCLAIMERS
Copyright (c) 2000,2004 Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

AUTHOR
Current Author: Jeff Fearn <jfearn@cpan.org>.

Former Authors: Sean M. Burke, <sburke@cpan.org>

perl v5.16.3							    2014-06-09							  XML::TreeBuilder(3)