HTML::TreeBuilder::LibXML(3pm)    User Contributed Perl Documentation    HTML::TreeBuilder::LibXML(3pm)
NAME
HTML::TreeBuilder::LibXML - HTML::TreeBuilder and XPath compatible interface with libxml
SYNOPSIS
use strict;
use warnings;
use HTML::TreeBuilder::LibXML;

# sample input
my $html  = '<html><body><p id="greeting">Hi</p></body></html>';
my $xpath = '//p';

my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($html);
$tree->eof;

# $tree and $node are compatible with HTML::Element
my @nodes = $tree->findnodes($xpath);   # findnodes returns the matching nodes
for my $node (@nodes) {
    print $node->tag, "\n";
    my %attr = $node->all_external_attr;
}

HTML::TreeBuilder::LibXML->replace_original(); # replace HTML::TreeBuilder::XPath->new
DESCRIPTION
HTML::TreeBuilder::LibXML is a libxml-based interface compatible with HTML::TreeBuilder, which itself can be slow on large documents.
HTML::TreeBuilder::LibXML is a drop-in replacement for HTML::TreeBuilder::XPath.
This module doesn't implement all of the HTML::TreeBuilder and HTML::Element APIs, but it defines enough methods that modules such as Web::Scraper work.
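A minimal sketch of the drop-in replacement, assuming both HTML::TreeBuilder::XPath and HTML::TreeBuilder::LibXML are installed. Rather than loading Web::Scraper, it calls HTML::TreeBuilder::XPath->new directly as a stand-in for any code written against that class; the sample markup is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use HTML::TreeBuilder::LibXML;

# Redefine HTML::TreeBuilder::XPath->new so that it hands back
# an HTML::TreeBuilder::LibXML object instead.
HTML::TreeBuilder::LibXML->replace_original();

# Code that only knows about HTML::TreeBuilder::XPath (Web::Scraper,
# for example) now gets a libxml-backed tree without other changes.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse('<html><body><p class="x">Hello</p></body></html>');
$tree->eof;

# The usual XPath and HTML::Element-style calls still work.
my ($p) = $tree->findnodes('//p[@class="x"]');
print ref($tree), "\n";
print $p->as_text, "\n";
```

Because the swap happens at the constructor, callers never need to mention HTML::TreeBuilder::LibXML themselves.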
BENCHMARK
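The following results were produced by tools/benchmark.pl with these module versions:
Web::Scraper: 0.26
HTML::TreeBuilder::XPath: 0.09
HTML::TreeBuilder::LibXML: 0.01_01

               Rate  no_libxml  use_libxml
no_libxml    5.45/s         --        -94%
use_libxml   94.3/s      1632%          --

In this run the libxml-based builder completed about 17 times as many iterations per second (94.3/s versus 5.45/s).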
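tools/benchmark.pl itself is not reproduced here. The sketch below is a hypothetical comparison in the same spirit: it uses the core Benchmark module's cmpthese, which prints a Rate table in the layout shown above, and reuses the no_libxml/use_libxml labels. The synthetic 500-item page and the extract_items helper are assumptions for illustration, so the rates you get will differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::TreeBuilder::XPath;
use HTML::TreeBuilder::LibXML;

# Hypothetical input: one page containing 500 list items.
my $html = '<html><body><ul>'
    . join('', map { qq{<li class="item">Item $_</li>} } 1 .. 500)
    . '</ul></body></html>';

# Hypothetical helper: build a tree with the given class, run one XPath
# query, and return how many item texts were extracted.
sub extract_items {
    my ($class) = @_;
    my $tree = $class->new;
    $tree->parse($html);
    $tree->eof;
    my @text = map { $_->as_text } $tree->findnodes('//li[@class="item"]');
    # HTML::Element-based trees may hold circular references; free them.
    $tree->delete if $tree->isa('HTML::Element');
    return scalar @text;
}

# Run each builder for at least 2 CPU seconds and print the rate table.
cmpthese(-2, {
    no_libxml  => sub { extract_items('HTML::TreeBuilder::XPath') },
    use_libxml => sub { extract_items('HTML::TreeBuilder::LibXML') },
});
```

Both subs do identical work apart from the tree class, so the difference in rates reflects the parser and XPath engine rather than the surrounding code.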
AUTHOR
Tokuhiro Matsuno <tokuhirom slkjfd gmail.com>
Tatsuhiko Miyagawa <miyagawa@cpan.org>
Masahiro Chiba
THANKS TO
woremacx++ http://d.hatena.ne.jp/woremacx/20080202/1201927162
id:dailyflower
SEE ALSO
HTML::TreeBuilder, HTML::TreeBuilder::XPath
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.14.2 2012-04-02 HTML::TreeBuilder::LibXML(3pm)
WWW::Mechanize::TreeBuilder(3pm)    User Contributed Perl Documentation    WWW::Mechanize::TreeBuilder(3pm)
NAME
WWW::Mechanize::TreeBuilder - Module to optimize WWW::Mechanize and HTML::TreeBuilder use
SYNOPSIS
use Test::More tests => 2;
use Test::WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;
# or
# use WWW::Mechanize;
# or
# use Test::WWW::Mechanize::Catalyst 'MyApp';
my $mech = Test::WWW::Mechanize->new;
# or
#my $mech = Test::WWW::Mechanize::Catalyst->new;
# etc. etc.
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get_ok('/');
is( $mech->look_down(_tag => 'p')->as_trimmed_text, 'Some text', 'It worked' );
DESCRIPTION
This module combines WWW::Mechanize and HTML::TreeBuilder. Why? Because I've seen too much code like the following:
like($mech->content, qr{<p>some text</p>}, "Found the right tag");
Which is just all flavours of wrong - it's akin to processing XML with regexps. Instead, do it like the following:
ok($mech->look_down(_tag => 'p', sub { $_[0]->as_trimmed_text eq 'some text' }), "Found the right tag");
The anon-sub there is a bit icky, but it means that if anyone should happen to add attributes to the "<p>" tag (such as an id or a class), the test will still work and find the right tag.
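To make the point concrete without a Mechanize object, the sketch below runs both checks against plain HTML::TreeBuilder trees, since look_down is an HTML::Element method. The two sample pages are made up: the second simply gains a class attribute, which breaks the regex but not the tree query.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# The same paragraph before and after someone adds a class attribute.
my @pages = (
    '<p>some text</p>',
    '<p class="intro">some text</p>',
);

my (@regex_hits, @tree_hits);
for my $html (@pages) {
    # Brittle: matches only the exact markup, attributes included.
    push @regex_hits, ($html =~ m{<p>some text</p>}) ? 1 : 0;

    # Robust: matches any <p> element whose trimmed text is 'some text'.
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $match = $tree->look_down(
        _tag => 'p',
        sub { $_[0]->as_trimmed_text eq 'some text' },
    );
    push @tree_hits, $match ? 1 : 0;
    $tree->delete;    # release the tree explicitly
}

print "regex: @regex_hits\n";
print "tree:  @tree_hits\n";
```

The regex check passes only for the first page, while look_down finds the paragraph in both.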
All of the methods available on HTML::Element (that aren't 'private' - i.e. that don't begin with an underscore) such as "look_down" or
"find" are automatically delegated to "$mech->tree" through the magic of Moose.
METHODS
Everything in WWW::Mechanize (or whichever subclass you apply the role to), plus all public methods from HTML::Element except those that
WWW::Mechanize also defines. Where both classes define a method, the WWW::Mechanize version is used (so that the existing behaviour of
Mechanize doesn't break).
USING XPATH OR OTHER SUBCLASSES
HTML::TreeBuilder::XPath allows you to use XPath selectors to select elements in the tree. You can use that module by providing
parameters to the Moose role:
with 'WWW::Mechanize::TreeBuilder' => {
tree_class => 'HTML::TreeBuilder::XPath'
};
# or
# NOTE: No hashref using this method
WWW::Mechanize::TreeBuilder->meta->apply($mech,
    tree_class => 'HTML::TreeBuilder::XPath'
);
and the class will be loaded automatically for you. It is used to construct the tree in the following manner:
$tree = $tree_class->new_from_content($req->decoded_content)->elementify;
You can also specify an "element_class" parameter, which is the (HTML::Element sub)class that methods are proxied from. This module provides
defaults for "element_class" when "tree_class" is "HTML::TreeBuilder" or "HTML::TreeBuilder::XPath"; it will warn otherwise.
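A sketch of the XPath variant applied to a plain WWW::Mechanize object, assuming WWW::Mechanize, WWW::Mechanize::TreeBuilder and HTML::TreeBuilder::XPath are installed. To avoid needing a server, it writes a made-up page to a temporary file and fetches it through a file: URL. It also assumes that findvalue, an HTML::TreeBuilder::XPath node method that WWW::Mechanize does not define, is among the methods delegated to the tree.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

# Write a small sample page to a temporary .html file (no network needed).
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} '<html><head><title>Demo</title></head>'
          . '<body><p class="intro">some text</p></body></html>';
close $fh;

my $mech = WWW::Mechanize->new;

# Apply the role to this one object, asking for an XPath-capable tree.
# element_class is left to the module's default for this tree_class.
WWW::Mechanize::TreeBuilder->meta->apply($mech,
    tree_class => 'HTML::TreeBuilder::XPath'
);

$mech->get(URI::file->new_abs($path));

# Assumed to be delegated to the tree built from the fetched page.
my $text = $mech->findvalue('//p[@class="intro"]');
print "$text\n";
```

The same query through Test::WWW::Mechanize would look identical, since the role only adds methods and leaves Mechanize's own behaviour in place.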
AUTHOR
Ash Berlin "<ash@cpan.org>"
LICENSE
Same as Perl 5.8, or at your option any later version of Perl.
perl v5.10.1 2010-12-16 WWW::Mechanize::TreeBuilder(3pm)