RSSLite(3pm) User Contributed Perl Documentation RSSLite(3pm)
NAME
XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser
SYNOPSIS
use XML::RSSLite;
. . .
parseRSS(\%result, \$content);
print "=== Channel ===\n",
      "Title: $result{'title'}\n",
      "Desc: $result{'description'}\n",
      "Link: $result{'link'}\n";
foreach $item (@{$result{'item'}}) {
    print " --- Item ---\n",
          " Title: $item->{'title'}\n",
          " Desc: $item->{'description'}\n",
          " Link: $item->{'link'}\n";
}
DESCRIPTION
This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than
alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a
simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing
a valid RSS file. This means you get the basic title, description, and link for a channel and its items.
This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the
output for best results. The munging includes:
Remove html tags to leave plain text
Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
Remove leading whitespace from URIs
Use <url> tags when <link> is empty
Use misplaced urls in <title> when <link> is empty
Extract links from <a href=...> if required
Limit links to ftp and http(s)
Join relative item urls (beginning with / or #) to the site base
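A few of the steps above can be sketched in plain Perl. This is only an illustration of the kind of munging described, not the module's actual code: the helpers tidy_text and tidy_link are hypothetical names, and returning an empty string for a rejected link is an assumption made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::RSSLite:
# remove html tags to leave plain text.
sub tidy_text {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper, not part of XML::RSSLite:
# clean up a link and restrict it to ftp and http(s).
sub tidy_link {
    my ($link, $base) = @_;
    $link =~ s/^\s+//;                            # remove leading whitespace
    $link = $base . $link if $link =~ m{^[/#]};   # join relative urls to the site base
    # Assumption: a link with any other scheme is replaced by an empty string.
    return $link =~ m{^(?:ftp|https?)://}i ? $link : '';
}

print tidy_text('<b>Hello</b> world'), "\n";
print tidy_link('  /news/1.html', 'http://example.com'), "\n";
print tidy_link('javascript:alert(1)', 'http://example.com') eq ''
    ? "rejected\n"
    : "kept\n";
```

The module applies its own heuristics, so real output may differ from what these simplified helpers produce.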
EXPORT
parseRSS($outHashRef, $inScalarRef)
$inScalarRef is a reference to a scalar containing the document to be parsed; its contents will effectively be destroyed. $outHashRef
is a reference to the hash in which to store the parsed content.
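Because the input scalar is effectively destroyed, a caller that still needs the document afterwards may want to parse a copy. The following is a minimal sketch of that pattern; the feed text and the variable names $original and $scratch are made up for illustration, and the hash keys are those shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use XML::RSSLite;

# Hypothetical feed content, for illustration only.
my $original = <<'RSS';
<rss version="0.91">
<channel>
<title>Example Channel</title>
<link>http://example.com/</link>
<description>An example feed</description>
<item>
<title>First item</title>
<link>http://example.com/1</link>
</item>
</channel>
</rss>
RSS

# parseRSS consumes the scalar it is given, so hand it a copy
# and keep $original intact for later use.
my $scratch = $original;
my %result;
parseRSS(\%result, \$scratch);

print "Title: $result{'title'}\n" if defined $result{'title'};
foreach my $item (@{ $result{'item'} || [] }) {
    print " Item: $item->{'title'}\n" if defined $item->{'title'};
}
```

The defined checks and the "|| []" fallback are defensive choices for this sketch, so that a feed with missing fields or no items does not trigger warnings under strict mode.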
EXPORTABLE
parseXML(\%parsedTree, $parseThis, 'topTag', $comments);
parsedTree - required
Reference to hash to store the parsed document within.
parseThis - required
Reference to scalar containing the document to parse.
topTag - optional
Tag to consider the root node. Leaving this undefined is not recommended.
comments - optional
false will remove comments from parseThis
true will not remove comments from parseThis
an array reference counts as true; comments are stored in it
CAVEATS
This is not a conforming parser. It does not handle the following:
o <foo bar=">">
o <foo><bar> <bar></bar> <bar></bar> </bar></foo>
o <![CDATA[ ]]>
o processing instructions (PI)
It is non-validating; without a DTD, the following cannot be properly addressed:
entities
namespaces
This may or may not be arriving in some future release.
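To see why input such as <foo bar=">"> is troublesome for regular-expression parsing in general, consider the sketch below. The pattern $naive_tag is a deliberately simple, hypothetical one written for this illustration; it is not the expression the module uses.

```perl
use strict;
use warnings;

# Hypothetical naive pattern: a tag runs from '<' to the first '>'.
my $naive_tag = qr/<([^>]*)>/;

my $doc = '<foo bar=">">text</foo>';
my ($inside) = $doc =~ $naive_tag;

# The match stops at the '>' inside the attribute value,
# so the captured tag is cut short.
print "naive tag body: $inside\n";   # prints: naive tag body: foo bar="
```

A conforming parser tracks quoting state while scanning attributes, which is the kind of bookkeeping a heuristic approach trades away for leniency.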
SEE ALSO
perl(1), "XML::RSS", "XML::SAX::PurePerl", "XML::Parser::Lite", "XML::Parser"
AUTHOR
Jerrad Pierce <jpierce@cpan.org>.
Scott Thomason <scott@thomasons.org>
LICENSE
Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can
redistribute it and/or modify it under the same terms as Perl itself.
perl v5.10.1 2010-10-25 RSSLite(3pm)