xml::rsslite(3pm) [debian man page]

RSSLite(3pm)						User Contributed Perl Documentation					      RSSLite(3pm)

NAME

       XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser

SYNOPSIS

	 use XML::RSSLite;

	 . . .

	 # The second argument must be a reference to the scalar holding the document.
	 parseRSS(\%result, \$content);

	 print "=== Channel ===\n",
	       "Title: $result{'title'}\n",
	       "Desc:  $result{'description'}\n",
	       "Link:  $result{'link'}\n\n";

	 foreach my $item (@{$result{'item'}}) {
	     print "  --- Item ---\n",
	           "  Title: $item->{'title'}\n",
	           "  Desc:  $item->{'description'}\n",
	           "  Link:  $item->{'link'}\n\n";
	 }

DESCRIPTION

       This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than
       alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a
       simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing
       a valid RSS file. This means you get the basic title, description, and link for a channel and its items.

       This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the
       output for best results. The munging includes:

       Remove html tags to leave plain text
       Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
       Remove leading whitespace from URIs
       Use <url> tags when <link> is empty
       Use misplaced urls in <title> when <link> is empty
       Extract links from <a href=...> if required
       Limit links to ftp and http(s)
       Join relative item urls (beginning with / or #) to the site base
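
       The link-related steps in the list above can be sketched in plain Perl. This is only an illustration of the heuristics as
       described, not the module's own code: pick_link, the item hash layout, and the exact regular expressions are assumptions made
       for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::RSSLite: choose a usable link for
# one item, applying the fallbacks in the order the list gives them.
sub pick_link {
    my ($item, $base) = @_;

    my $link = $item->{link} // '';
    $link =~ s/^\s+//;                          # remove leading whitespace

    # Use <url> when <link> is empty
    if ($link eq '') {
        $link = $item->{url} // '';
        $link =~ s/^\s+//;
    }
    # Use a URL that ended up in <title>
    if ($link eq '' && ($item->{title} // '') =~ m{((?:https?|ftp)://\S+)}i) {
        $link = $1;
    }
    # Extract the first <a href=...> from the description
    if ($link eq ''
        && ($item->{description} // '') =~ m{<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)}i) {
        $link = $1;
    }

    # Join relative links (beginning with / or #) to the site base
    if ($link =~ m{^[/#]}) {
        (my $root = $base) =~ s{/+$}{};
        $link = $link =~ m{^/} ? $root . $link : "$root/$link";
    }

    # Limit links to ftp and http(s); anything else is rejected
    return $link =~ m{^(?:https?|ftp)://}i ? $link : undef;
}

my $link = pick_link({ link => '', url => '/news/1' }, 'http://example.org/');
print defined $link ? "$link\n" : "(no usable link)\n";
```

       Returning undef for anything that is not ftp or http(s) keeps the caller's check simple; the module itself may handle
       rejected links differently.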

   EXPORT
       parseRSS($outHashRef, $inScalarRef)
	   $inScalarRef is a reference to a scalar containing the document to be parsed; its contents will effectively be destroyed.
	   $outHashRef is a reference to the hash in which to store the parsed content.
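
       Because the input scalar is effectively destroyed, a caller that still needs the original text can parse a copy. The
       following is a minimal sketch; the feed text and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::RSSLite;

# Made-up sample feed; in practice the text would come from a file or a fetch.
my $content = <<'RSS';
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Feed</title>
    <description>A made-up feed for illustration</description>
    <link>http://example.org/</link>
    <item>
      <title>First post</title>
      <description>Hello</description>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second post</title>
      <description>Hello again</description>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>
RSS

# Parse a scratch copy so that $content itself stays intact.
my $scratch = $content;
my %result;
parseRSS(\%result, \$scratch);

print 'Channel: ', ($result{title} // '(no title)'), "\n";
foreach my $item (@{ $result{item} || [] }) {
    print '  Item: ', ($item->{title} // '(no title)'), "\n";
}
```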

   EXPORTABLE
       parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);
	   parsedTree - required
	       Reference to hash to store the parsed document within.

	   parseThis  - required
	       Reference to scalar containing the document to parse.

	   topTag     - optional
	       Tag to consider the root node. Leaving this undefined is not recommended.

	   comments   - optional
	       false will remove comments from parseThis
	       true will not remove comments from parseThis
	       array reference is true, comments are stored here
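
       A call that uses all four arguments might look like the sketch below. The sample document and the tag name "feed" are
       invented, and the layout of the resulting hash is not specified here, so the example dumps it with Data::Dumper rather than
       assuming particular keys.

```perl
use strict;
use warnings;
use XML::RSSLite qw(parseXML);   # exportable, so it has to be requested by name
use Data::Dumper;

# Invented sample document with one comment.
my $doc = <<'XML';
<feed>
  <!-- generated nightly -->
  <entry><name>first</name></entry>
  <entry><name>second</name></entry>
</feed>
XML

my (%tree, @comments);

# 'feed' names the root tag. The array reference passed as the fourth
# argument counts as true and is where the comments are stored.
parseXML(\%tree, \$doc, 'feed', \@comments);

print Dumper(\%tree, \@comments);
```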

   CAVEATS
       This is not a conforming parser. It does not handle the following:

       o   <foo bar=">">

       o   <foo><bar> <bar></bar> <bar></bar> </bar></foo>

       o   <![CDATA[ ]]>

       o   Processing instructions (PI)

       It is non-validating; without a DTD, the following cannot be properly addressed:

       entities
       namespaces
	   This may or may not be arriving in some future release.
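
       One way to act on these caveats is to check a document for some of the unsupported constructs before parsing it. The sketch
       below is an assumption-laden illustration, not part of the module: unsupported_constructs is a hypothetical helper, its
       regular expressions are rough, and it does not try to detect nested elements of the same name.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check, not part of XML::RSSLite: report which of
# the unsupported constructs appear in a document.
sub unsupported_constructs {
    my ($doc) = @_;
    my @found;

    push @found, 'CDATA section' if $doc =~ /<!\[CDATA\[/;

    # Any <?...?> other than the XML declaration itself
    push @found, 'processing instruction' if $doc =~ /<\?(?!xml\s)/i;

    # A quoted attribute value that contains ">"
    push @found, '">" inside an attribute value'
        if $doc =~ /<[A-Za-z][^<>]*=\s*(?:"[^"<]*>[^"]*"|'[^'<]*>[^']*')/;

    return @found;
}

my @problems = unsupported_constructs('<foo bar=">"><![CDATA[ text ]]></foo>');
print "Possible trouble: $_\n" for @problems;
```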

SEE ALSO

       perl(1), "XML::RSS", "XML::SAX::PurePerl", "XML::Parser::Lite", "XML::Parser"

AUTHOR

       Jerrad Pierce <jpierce@cpan.org>.

       Scott Thomason <scott@thomasons.org>

LICENSE

       Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason.  All rights reserved. This program is free software; you can
       redistribute it and/or modify it under the same terms as Perl itself.

perl v5.10.1							    2010-10-25							      RSSLite(3pm)