xml::rsslite(3pm) [debian man page]

RSSLite(3pm)						User Contributed Perl Documentation					      RSSLite(3pm)

NAME
XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser

SYNOPSIS
    use XML::RSSLite;

    . . .

    # $content holds the raw feed text; parseRSS expects a reference to it
    parseRSS(\%result, \$content);

    print "=== Channel ===\n",
          "Title: $result{'title'}\n",
          "Desc: $result{'description'}\n",
          "Link: $result{'link'}\n";

    foreach $item (@{$result{'item'}}) {
        print " --- Item ---\n",
              " Title: $item->{'title'}\n",
              " Desc: $item->{'description'}\n",
              " Link: $item->{'link'}\n";
    }

DESCRIPTION
This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items.

This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the output for best results. The munging includes:

o Remove HTML tags to leave plain text
o Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
o Remove leading whitespace from URIs
o Use <url> tags when <link> is empty
o Use misplaced URLs in <title> when <link> is empty
o Extract links from <a href=...> if required
o Limit links to ftp and http(s)
o Join relative item URLs (beginning with / or #) to the site base

EXPORT
parseRSS($outHashRef, $inScalarRef)
    $inScalarRef is a reference to a scalar containing the document to be parsed; its contents will effectively be destroyed. $outHashRef is a reference to the hash within which to store the parsed content.

EXPORTABLE
parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);

parsedTree - required
    Reference to the hash in which to store the parsed document.

parseThis - required
    Reference to the scalar containing the document to parse.

topTag - optional
    Tag to consider the root node; leaving this undefined is not recommended.

comments - optional
    false will remove comments from parseThis
    true will not remove comments from parseThis
    array reference is true, comments are stored here

CAVEATS
This is not a conforming parser.
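
As an illustration of why regular-expression parsing cannot be fully conforming, the sketch below uses a deliberately naive pattern that assumes ">" only ever ends a tag. The helper naive_attrs is hypothetical and is not part of XML::RSSLite, nor is the pattern taken from the module's source; it only shows the general kind of input that trips up this style of parsing.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::RSSLite): return the attribute text
# of the first <foo ...> start tag, assuming ">" can only close a tag.
sub naive_attrs {
    my ($xml) = @_;
    return $xml =~ /<foo\s+([^>]*)>/ ? $1 : undef;
}

# A plain attribute value is captured whole.
my $simple = naive_attrs('<foo bar="baz">text</foo>');

# A quoted ">" inside the attribute value ends the match too early.
my $tricky = naive_attrs('<foo bar=">">text</foo>');

print "simple: $simple\n";   # prints: simple: bar="baz"
print "tricky: $tricky\n";   # prints: tricky: bar="
```

A conforming parser tracks quoting state while scanning a tag, which a single character class like [^>] cannot do.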
It does not handle the following:

o <foo bar=">">
o <foo><bar> <bar></bar> <bar></bar> </bar></foo>
o <![CDATA[ ]]>
o PI

It's non-validating; without a DTD the following cannot be properly addressed:

o entities
o namespaces

This may or may not be arriving in some future release.

SEE ALSO
perl(1), "XML::RSS", "XML::SAX::PurePerl", "XML::Parser::Lite", "XML::Parser"

AUTHOR
Jerrad Pierce <jpierce@cpan.org>, Scott Thomason <scott@thomasons.org>

LICENSE
Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.10.1						2010-10-25					      RSSLite(3pm)