ANN: Four modules for RDFa and HTML
Toby Inkster
mail at tobyinkster.co.uk
Thu Dec 3 16:45:22 CET 2009
At lunch time today I released the following modules to CPAN:
- HTML::HTML5::Parser 0.01
- HTML::HTML5::Sanity 0.01
- RDF::RDFa::Parser 0.22
- RDF::RDFa::Parser::Redland 0.22
HTML::HTML5::Parser is a parser for tag-soup HTML which builds an
XML::LibXML::Document object as its output. It uses the HTML5 parsing
algorithm, so the DOM it builds should be more consistent with the DOMs
built by real-world browsers than those built by other tag-soup parsers
on CPAN. It's based on the non-CPAN Whatpm::HTML package.
HTML::HTML5::Sanity corrects certain oddities in the DOM produced by
HTML::HTML5::Parser. For example, the parser is technically "correct" to
treat an attribute like xml:lang="en" as an attribute called "xml:lang"
in the null namespace; however, the more expected behaviour is for it to
be parsed as an attribute called "lang" in the XML namespace. Sanity
makes the DOM less insane.
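To make the xml:lang example concrete, here is a minimal Python sketch of the same kind of fix-up using the standard library's ElementTree. It is not HTML::HTML5::Sanity's code: the function name `fix_xml_attributes` is hypothetical, and the sketch only handles attributes whose literal name starts with "xml:", moving them into the XML namespace (http://www.w3.org/XML/1998/namespace).

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "xml" prefix by the Namespaces in XML spec.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def fix_xml_attributes(root):
    """Hypothetical helper: move attributes literally named "xml:*"
    (i.e. sitting in the null namespace) into the XML namespace."""
    for element in root.iter():
        # Copy the keys first, because attrib is modified inside the loop.
        for name in list(element.attrib):
            if name.startswith("xml:"):
                value = element.attrib.pop(name)
                # ElementTree stores namespaced attributes as "{uri}local".
                # setdefault keeps any value already in the XML namespace.
                element.attrib.setdefault("{%s}%s" % (XML_NS, name[4:]), value)
    return root
```

For example, an element built with `root.set("xml:lang", "en")` ends up with its value stored under `{http://www.w3.org/XML/1998/namespace}lang`, which is where namespace-aware code expects to find it.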
RDF::RDFa::Parser and RDF::RDFa::Parser::Redland should hopefully
already be familiar to you. In 0.22, I've added a few more optional
features, such as the ability to make CURIE prefixes case-insensitive
(useful when dealing with HTML, where attribute names, including the
xmlns:prefix attributes that declare CURIE prefixes, are not
case-sensitive). One interesting new feature is Auto Config: support
for an HTML <meta> tag that lets the page itself say which optional
features should be used.
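As an illustration of what case-insensitive prefix matching means, here is a minimal Python sketch. It is not RDF::RDFa::Parser's code or option name: `expand_curie` and its `case_sensitive` flag are hypothetical, and the sketch ignores RDFa details such as safe CURIEs ([foaf:name]) and default prefixes.

```python
def expand_curie(curie, prefixes, case_sensitive=True):
    """Hypothetical helper: expand a "prefix:reference" CURIE using a
    prefix -> URI mapping. Returns None if there is no colon or the
    prefix is unknown."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        return None
    if case_sensitive:
        base = prefixes.get(prefix)
    else:
        # Fold case on both sides, e.g. because an HTML parser may have
        # lowercased an xmlns:FOAF declaration to xmlns:foaf.
        folded = {name.lower(): uri for name, uri in prefixes.items()}
        base = folded.get(prefix.lower())
    if base is None:
        return None
    return base + reference
```

With a prefix map of `{"foaf": "http://xmlns.com/foaf/0.1/"}`, the CURIE `FOAF:name` fails to resolve in case-sensitive mode but expands to `http://xmlns.com/foaf/0.1/name` when `case_sensitive=False`.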
Auto Config is documented here:
http://search.cpan.org/~tobyink/RDF-RDFa-Parser-0.22/lib/RDF/RDFa/Parser.pm#AUTO_CONFIG
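The general Auto Config idea (the page declares options in a <meta> tag, and the parser merges them into its defaults) can be sketched in Python with the standard library's HTMLParser. The meta name `example:auto_config`, the `key=value&key=value` content format and the option names used here are hypothetical placeholders, not the module's real syntax, which is the one given in the documentation linked above.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl

# Hypothetical <meta> name; the real one is defined in the module docs.
AUTO_CONFIG_NAME = "example:auto_config"


class AutoConfigFinder(HTMLParser):
    """Collect key/value pairs from matching <meta> tags."""

    def __init__(self):
        super().__init__()
        self.options = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if attrs.get("name") == AUTO_CONFIG_NAME and attrs.get("content"):
            # Hypothetical content format: "opt1=1&opt2=0".
            self.options.update(parse_qsl(attrs["content"]))


def auto_config(html, defaults):
    """Return defaults overridden by the page's (known) options."""
    finder = AutoConfigFinder()
    finder.feed(html)
    finder.close()
    options = dict(defaults)
    for key, value in finder.options.items():
        if key in options:  # ignore options the parser does not know
            options[key] = value not in ("0", "", "false")
    return options
```

Only options already present in the defaults are honoured, so a page can toggle known features but cannot inject arbitrary settings.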
An example script using HTML::HTML5::Parser, HTML::HTML5::Sanity and
RDF::RDFa::Parser together:
http://goddamn.co.uk/viewvc/perlmods/rdfa2nt.pl
--
Toby A Inkster
<mailto:mail at tobyinkster.co.uk>
<http://tobyinkster.co.uk>
More information about the Dev mailing list