Performance boost for RDF::RDFa::Parser with xml catalog file
Neubert Joachim
J.Neubert at zbw.eu
Tue Dec 8 23:20:57 CET 2009
Using locally stored copies of the 28 (!) XHTML+RDFa DTD files together with an XML catalog file cuts the time RDF::RDFa::Parser needs to parse a tiny XHTML file from about 13 seconds to less than 1 second.
Step by step:
1) download http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd and all the module files it includes to a local directory, e.g. /usr/share/xml/xhtml-rdfa-1
2) add line
<rewriteURI uriStartString="http://www.w3.org/MarkUp/DTD/" rewritePrefix="file:///usr/share/xml/xhtml-rdfa-1/"/>
to /etc/xml/catalog (CentOS/Red Hat; the location may differ for other distributions, see http://xmlsoft.org/catalog.html)
3) create an XML::LibXML parser instance that uses the catalog, parse the XHTML into a DOM, and feed the DOM into an RDF::RDFa::Parser instance, e.g.
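    # $log, $dir, $rdf, @OUTPUT_LANG and read_file() come from the surrounding script
    my $xml_parser = XML::LibXML->new;
    $xml_parser->validation(0);    # do not validate against the DTD
    $xml_parser->recover(1);       # try to continue on parse errors
    $xml_parser->load_catalog('/etc/xml/catalog');
    $log->debug('XML parser created');

    foreach my $lang (@OUTPUT_LANG) {
        my $fn    = "$dir/about.$lang.html";
        my $xhtml = read_file($fn);
        $log->debug('file read');

        # parse with the catalog-aware parser, so the DTDs are read from local disk
        my $dom = $xml_parser->parse_string($xhtml);
        $log->debug('XML parsed');

        # hand the ready-made DOM (not the raw string) to the RDFa parser
        my $parser = RDF::RDFa::Parser->new( $dom, 'http://dummy.org/', undef, $rdf );
        $log->debug('new parser built');

        $parser->consume();
        $log->debug('rdfa consumed');
    }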
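To check the effect on your own files, something like the following minimal timing sketch may help. It only times the XML::LibXML parse step (where the DTDs are loaded), not RDF::RDFa::Parser itself. The sub name parse_seconds and the command-line handling are made up for illustration; the parser settings are the ones from step 3.

```perl
#!/usr/bin/perl
# Timing sketch (illustrative, not part of RDF::RDFa::Parser):
# parse one XHTML file, optionally with an XML catalog, and report the time.
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use XML::LibXML;

# Hypothetical helper: parse $file and return the elapsed wall-clock seconds.
# $catalog is optional; if given, it is loaded before parsing.
sub parse_seconds {
    my ($file, $catalog) = @_;

    my $xml_parser = XML::LibXML->new;
    $xml_parser->validation(0);
    $xml_parser->recover(1);
    $xml_parser->load_catalog($catalog) if defined $catalog;

    my $t0 = [gettimeofday];
    $xml_parser->parse_file($file);
    return tv_interval($t0);    # seconds since $t0
}

# Command-line use: perl time_parse.pl file.html [catalog]
if (@ARGV) {
    my ($file, $catalog) = @ARGV;
    printf "%s parsed in %.2f s\n", $file, parse_seconds($file, $catalog);
}
```

Run it once without and once with the catalog argument, each in a fresh process, since a loaded catalog stays active for the rest of the process. libxml2 may also pick up the system default catalog on its own, so for a fair baseline take the first measurement before adding the rewriteURI entry.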
Just a faint remembrance and sudden inspiration while washing the dishes ...
Cheers, Joachim