MKDoc::XML(3pm) User Contributed Perl Documentation MKDoc::XML(3pm)
NAME
MKDoc::XML - The MKDoc XML Toolkit
SYNOPSIS
This is an article, not a module.
SUMMARY
MKDoc is a web content management system written in Perl which focuses on standards compliance, accessibility and usability issues, and
multi-lingual websites.
At MKDoc Ltd we have decided to gradually break up our existing commercial software into a collection of completely independent, well-
documented, well-tested open-source CPAN modules.
Ultimately we want MKDoc code to be a coherent collection of module distributions, yet each distribution should be usable and useful in
itself.
MKDoc::XML is part of this effort.
You could help us and turn some of MKDoc's code into a CPAN module. You can take a look at the existing code at
http://download.mkdoc.org/.
If you are interested in some functionality which you would like to see as a standalone CPAN module, send an email to
<mkdoc-modules@lists.webarch.co.uk>.
DISCLAIMER
MKDoc::XML is a low level XML library.
MKDoc::XML::* modules do not make sure your XML is well-formed.
MKDoc::XML::* modules can be used to work with somewhat broken XML.
MKDoc::XML::* modules should not be used as high-level parsers with general purpose XML unless you know what you're doing.
WHAT'S IN THE BOX
XML tokenizer
MKDoc::XML::Tokenizer splits your XML / XHTML files into a list of MKDoc::XML::Token objects using a single regex.
XML tree builder
MKDoc::XML::TreeBuilder sits on top of MKDoc::XML::Tokenizer and builds parsed trees out of your XML / XHTML data.
XML stripper
MKDoc::XML::Stripper objects remove unwanted markup from your XML / HTML data. This is useful for removing all those nasty presentational tags or
'style' attributes from your XHTML data, for example.
XML tagger
The MKDoc::XML::Tagger module matches expressions in XML / XHTML documents and tags them appropriately. For example, you could automatically
hyperlink certain glossary words or add <abbr> tags based on a dictionary of abbreviations and acronyms.
XML entity decoder
MKDoc::XML::Decode is a pluggable, configurable entity expander module which currently supports HTML entities, numeric entities and basic
XML entities.
XML entity encoder
MKDoc::XML::Encode performs the exact reverse operation of MKDoc::XML::Decode.
XML Dumper
MKDoc::XML::Dumper serializes arbitrarily complex Perl structures into XML strings. It is also capable of the reverse operation, i.e.
deserializing an XML string into a Perl structure.
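The single-regex approach mentioned for MKDoc::XML::Tokenizer above can be illustrated with a minimal sketch. The helper name sketch_tokenize and the pattern below are assumptions made for illustration only; they are not the regex that MKDoc::XML::Tokenizer actually uses, and the sketch returns plain strings rather than MKDoc::XML::Token objects.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the regex used by MKDoc::XML::Tokenizer:
# split a string into markup and text chunks with one global match.
sub sketch_tokenize {
    my ($xml) = @_;
    my @tokens = $xml =~ m{
        (  <!--.*?-->     # comment
         | <\?.*?\?>      # processing instruction
         | <[^>]*>        # declaration, opening, closing or self-closing tag
         | [^<]+          # run of text
        )
    }gsx;
    return \@tokens;
}

my $tokens = sketch_tokenize('<p>Hello <br /><!-- hi -->World</p>');
print "[$_]\n" for @{$tokens};
```

Like the real modules, this sketch makes no attempt to check well-formedness; it only cuts the input at markup boundaries.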
AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.
SEE ALSO
Petal: http://search.cpan.org/dist/Petal/
MKDoc: http://www.mkdoc.com/
Help us open-source MKDoc. Join the mkdoc-modules mailing list:
mkdoc-modules@lists.webarch.co.uk
perl v5.10.1 2005-03-10 MKDoc::XML(3pm)
MKDoc::XML::Token(3pm) User Contributed Perl Documentation MKDoc::XML::Token(3pm)
NAME
MKDoc::XML::Token - XML Token Object
SYNOPSIS
use MKDoc::XML::Tokenizer;

my $tokens = MKDoc::XML::Tokenizer->process_data ($some_xml);
foreach my $token (@{$tokens})
{
    print "'" . $token->as_string() . "' is text\n" if (defined $token->text());
    print "'" . $token->as_string() . "' is a self closing tag\n" if (defined $token->tag_self_close());
    print "'" . $token->as_string() . "' is an opening tag\n" if (defined $token->tag_open());
    print "'" . $token->as_string() . "' is a closing tag\n" if (defined $token->tag_close());
    print "'" . $token->as_string() . "' is a processing instruction\n" if (defined $token->pi());
    print "'" . $token->as_string() . "' is a declaration\n" if (defined $token->declaration());
    print "'" . $token->as_string() . "' is a comment\n" if (defined $token->comment());
    print "'" . $token->as_string() . "' is a tag\n" if (defined $token->tag());
    print "'" . $token->as_string() . "' is a pseudo-tag (NOT text and NOT tag)\n" if (defined $token->pseudotag());
    print "'" . $token->as_string() . "' is a leaf token (NOT opening tag)\n" if (defined $token->leaf());
}
SUMMARY
MKDoc::XML::Token is an object representing an XML token produced by MKDoc::XML::Tokenizer.
It has a set of methods to identify the type of token it is, as well as to help build a parsed tree as in MKDoc::XML::TreeBuilder.
API
my $token = new MKDoc::XML::Token ($string_token);
Constructs a new MKDoc::XML::Token object.
my $string_token = $token->as_string();
Returns the string representation of this token so that:
MKDoc::XML::Token->new ($string_token)->as_string() eq $string_token
is a tautology.
my $node = $token->leaf();
If this token is not an opening tag, this method will return its corresponding node structure as returned by $token->text(),
$token->tag_self_close(), etc.
Returns undef otherwise.
my $node = $token->pseudotag();
If this token is a comment, declaration or processing instruction, this method will return $token->comment(), $token->declaration() or
$token->pi() resp.
Returns undef otherwise.
my $node = $token->tag();
If this token is an opening, closing, or self closing tag, this method will return $token->tag_open(), $token->tag_close() or
$token->tag_self_close() resp.
Returns undef otherwise.
my $node = $token->comment();
If this token object represents a comment, the following structure is returned:
# this is <!-- I like Pie. Pie is good -->
{
_tag => '~comment',
text => ' I like Pie. Pie is good ',
}
Returns undef otherwise.
my $node = $token->declaration();
If this token object represents a declaration, the following structure is returned:
# this is <!DOCTYPE foo>
{
_tag => '~declaration',
text => 'DOCTYPE foo',
}
Returns undef otherwise.
my $node = $token->pi();
If this token object represents a processing instruction, the following structure is returned:
# this is <?xml version="1.0" charset="UTF-8"?>
{
_tag => '~pi',
text => 'xml version="1.0" charset="UTF-8"',
}
Returns undef otherwise.
my $node = $token->tag_open();
If this token object represents an opening tag, the following structure is returned:
# this is <aTag foo="bar" baz="buz">
{
_tag => 'aTag',
_open => 1,
_close => 0,
foo => 'bar',
baz => 'buz',
}
Returns undef otherwise.
my $node = $token->tag_close();
If this token object represents a closing tag, the following structure is returned:
# this is </aTag>
{
_tag => 'aTag',
_open => 0,
_close => 1,
}
Returns undef otherwise.
my $node = $token->tag_self_close();
If this token object represents a self-closing tag, the following structure is returned:
# this is <aTag foo="bar" baz="buz" />
{
_tag => 'aTag',
_open => 1,
_close => 1,
foo => 'bar',
baz => 'buz',
}
Returns undef otherwise.
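To make the three tag structures above concrete, here is a minimal sketch that builds a hash of the same shape from a simple tag string. The helper sketch_tag_node is hypothetical and is not part of MKDoc::XML; it handles only double-quoted attribute values and is not how MKDoc::XML::Token parses tags internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MKDoc::XML: turn a simple tag string
# into a hash shaped like the structures documented above.
sub sketch_tag_node {
    my ($string) = @_;
    my ($slash_before, $name, $attrs, $slash_after) =
        $string =~ m{\A<(/?)([A-Za-z_][\w:.-]*)(.*?)(/?)>\z}s
        or return;

    my $closing      = $slash_before ne '';
    my $self_closing = $slash_after  ne '';
    my %node = (
        _tag   => $name,
        _open  => ($closing ? 0 : 1),
        _close => (($closing || $self_closing) ? 1 : 0),
    );

    # Only double-quoted attribute values are handled in this sketch.
    while ($attrs =~ m{([\w:.-]+)\s*=\s*"([^"]*)"}g) {
        $node{$1} = $2;
    }
    return \%node;
}

my $node = sketch_tag_node('<aTag foo="bar" baz="buz" />');
print "$_ => $node->{$_}\n" for sort keys %{$node};
```

The point to notice is that _open and _close together encode the tag type: 1/0 for an opening tag, 0/1 for a closing tag, and 1/1 for a self-closing tag.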
my $node = $token->text();
If this token object represents a piece of text, then this text is returned. Returns undef otherwise. TRAP! $token->text() returns a false
value if this text happens to be '0' or ''. So really you should use:
if (defined $token->text()) {
... do stuff...
}
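The trap described above can be shown without the module at all. In this sketch, sketch_text is a hypothetical stand-in for $token->text(): it returns the string for anything that does not start with '<' and undef otherwise. It only serves to contrast a truth test with a defined test.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $token->text(): returns the string itself for
# text, and undef for anything that looks like markup.
sub sketch_text {
    my ($string) = @_;
    return $string =~ /\A</ ? undef : $string;
}

for my $string ('0', '', 'hello', '<p>') {
    my $text    = sketch_text($string);
    my $truthy  = $text         ? 'yes' : 'no';
    my $defined = defined $text ? 'yes' : 'no';
    print "'$string': true=$truthy defined=$defined\n";
}
```

For '0' and '' the truth test says "no" while the defined test says "yes", which is why a plain if ($token->text()) would silently skip those text tokens.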
NOTES
MKDoc::XML::Token works with MKDoc::XML::Tokenizer, which can be used when building a full tree is not necessary. If you need to build a
tree, look at MKDoc::XML::TreeBuilder.
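As an illustration of working on the flat token list without building a tree, the sketch below computes the deepest nesting level from a list of tag nodes. The nodes are written by hand in the shape documented for tag_open(), tag_close() and tag_self_close(); in real use they would presumably come from calling $token->tag() on each token. The helper sketch_max_depth is hypothetical.

```perl
use strict;
use warnings;

# Hand-written nodes shaped like the documented tag structures.
my @nodes = (
    { _tag => 'ul', _open => 1, _close => 0 },
    { _tag => 'li', _open => 1, _close => 0 },
    { _tag => 'br', _open => 1, _close => 1 },
    { _tag => 'li', _open => 0, _close => 1 },
    { _tag => 'ul', _open => 0, _close => 1 },
);

# Hypothetical helper: deepest nesting level reached in a flat node list.
sub sketch_max_depth {
    my @list = @_;
    my ($depth, $max) = (0, 0);
    for my $node (@list) {
        if ($node->{_open} && !$node->{_close}) {
            $depth++;
            $max = $depth if $depth > $max;
        }
        elsif ($node->{_close} && !$node->{_open}) {
            $depth--;
        }
        # Self-closing tags (_open and _close both set) leave depth unchanged.
    }
    return $max;
}

my $max = sketch_max_depth(@nodes);
print "$max\n";
```

A single counter is enough here because the question only concerns depth; anything that needs parent and child relationships is better served by MKDoc::XML::TreeBuilder.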
AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.
SEE ALSO
MKDoc::XML::Tokenizer MKDoc::XML::TreeBuilder
perl v5.10.1 2004-10-06 MKDoc::XML::Token(3pm)