RDF::RDFa::Generator question

Toby Inkster mail at tobyinkster.co.uk
Tue Jun 22 09:45:45 CEST 2010


On Mon, 21 Jun 2010 21:41:20 -0400
Gregory Williams <greg at evilfunhouse.com> wrote:

> I'm not super familiar with the XML::LibXML API. I've tried both
> load_xml and load_html to parse my HTML template (I understand these
> were just introduced in XML::LibXML 1.70, but guess they're
> convenience functions for already existing functionality?). load_xml
> is taking 7 seconds to parse an 800 byte template file (any idea what
> it could be doing?). load_html takes a fraction of a second, and I'd
> like to go with that, but RDF::RDFa::Generator doesn't seem to be
> able to inject content into html that was parsed with load_html.

I'm only using XML::LibXML 1.69, so I'm not familiar with load_html and
load_xml. I would guess that load_xml is so much slower because it's
fetching external DTDs over the network. If that's the case, you ought
to be able to speed it up by using a DTD catalogue (i.e. local copies of
the DTDs).
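Something along these lines might do it. This is a sketch that assumes
XML::LibXML 1.70+ (for load_xml and its parser options). The catalogue
path /etc/xml/catalog is only a guess at a common location, and the
inline XHTML string is a stand-in for your template:

```perl
use strict;
use warnings;
use XML::LibXML;    # load_xml and its parser options need 1.70+

# Hypothetical catalogue location; it should map the XHTML DTD
# identifiers to local files.
my $catalog = '/etc/xml/catalog';

# Stand-in for the real template; note the DOCTYPE pointing at w3.org.
my $template = <<'XHTML';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Template</title></head>
<body><p>Hello</p></body>
</html>
XHTML

my $parser = XML::LibXML->new;

# Resolve the DTD via local copies listed in the catalogue, if one exists.
$parser->load_catalog($catalog) if -r $catalog;

# no_network tells libxml2 not to fetch anything over the network, so a
# DTD missing from the catalogue is skipped (with a warning) rather than
# slowly downloaded.
my $dom = $parser->load_xml(
    string     => $template,
    no_network => 1,
);

print $dom->documentElement->namespaceURI, "\n";
```

If the DTD can't be resolved locally, any entities it defines (such as
&nbsp;) won't be available, so it's worth getting the catalogue right
rather than relying on no_network alone.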

Without being able to see what load_html does, I'm guessing it handles
namespaces differently from how RDF::RDFa::Generator expects them
(libxml2's HTML parser doesn't normally put elements in any namespace).
Assuming that $dom is your DOM document, what does the following output?

	print $dom->documentElement->namespaceURI // 'undef', "\n";

Should be 'http://www.w3.org/1999/xhtml', though I'm guessing it's
something else, perhaps undef.

You could try using HTML::HTML5::Parser to parse the HTML instead. It
follows the HTML5 parsing algorithm, which puts HTML elements into the
XHTML namespace.

	use HTML::HTML5::Parser;
	my $dom = HTML::HTML5::Parser->new->parse_string($html);

Perhaps also of interest is HTML::HTML5::Writer, which serialises a DOM
as HTML. XML::LibXML's built-in toString methods produce XML syntax
(e.g. self-closing tags like <script/>), so they're probably not safe if
you intend to send the data as 'text/html'.
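Putting the parser and writer together, a rough sketch, based on the
modules' documented synopses (check against the versions you have
installed). The inline $html string is a stand-in for your template, and
the RDFa injection step is left as a comment:

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::Writer;

# Stand-in template; in practice read this from your template file.
my $html = '<!DOCTYPE html><title>Template</title><p>Hello<br>world';

# HTML5 parsing puts the elements in the XHTML namespace, which is what
# RDF::RDFa::Generator expects.
my $dom = HTML::HTML5::Parser->new->parse_string($html);
print $dom->documentElement->namespaceURI, "\n";

# ... inject the RDFa with RDF::RDFa::Generator here ...

# Serialise as HTML (e.g. <br> rather than <br/>) instead of using
# $dom->toString, so the result is suitable for text/html.
my $writer = HTML::HTML5::Writer->new(markup => 'html');
print $writer->document($dom);
```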

-- 
Toby A Inkster
<mailto:mail at tobyinkster.co.uk>
<http://tobyinkster.co.uk>



More information about the Dev mailing list