A Conversation for Website Developer's Forum

some basic xml stuff

Post 1

Dogster

I'm trying to redesign a site I'm running using XML; for reference, the site is http://www.fcbob.demon.co.uk/takeissue . What I want is: (1) no duplication of data, and (2) the ability to change the look of the entire site quickly without having to edit each file. On other sites I've made I've used includes or HTML preprocessors to get this, but as an interesting exercise I'd like to have a go at doing it with XML, and I seem to be having some difficulty.

The site is a magazine, and I need (1) a current issue page giving links and brief extracts from each article, (2) a previous issues page giving links to all the articles, (3) a page for each article.

I'm pretty happy with (3): I can write an XML file for each article and have a single XSL stylesheet that works for all of them. Is there any way of writing an XML or XSL file that will extract information from a list of XML files? For example, on the previous issues page I'd like something that goes through all the article XML files and extracts their titles and authors to put into a list.
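I've seen mention of XSLT's document() function, which looks like it might do it - something like this sketch (assuming a hand-maintained list file, and with element names invented) - but I haven't got it working:

```xml
<!-- previous-issues.xsl, applied to a hand-maintained list like:
     <articles><article href="foo.xml"/><article href="bar.xml"/></articles> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/articles">
    <ul>
      <xsl:for-each select="article">
        <!-- document() loads each article file named in the list -->
        <xsl:variable name="doc" select="document(@href)"/>
        <li>
          <a href="{@href}"><xsl:value-of select="$doc/article/title"/></a>
          by <xsl:value-of select="$doc/article/author"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```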

I think XInclude might do what I want, but I'm not sure how to turn an XML file with XIncludes into one without them - which is definitely necessary, because I don't think any browsers support XInclude yet, nor do most parsers (true?). Properly implemented XInclude would be great though: I could automatically extract the first paragraph of an article to be the extract on the front page using, I think, XPointer.
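For the record, the sort of thing I have in mind looks like this (a sketch only - the XInclude namespace URI has changed between drafts, and the element() pointer below just means "first child of the root element"):

```xml
<frontpage xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- pull the first paragraph of the article in as the extract -->
  <xi:include href="articlename.xml" xpointer="element(/1/1)"/>
</frontpage>
```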

The best solution I've been able to come up with so far, which is not particularly nice, is to have an articlename.xml and an articlename-summary.xml. articlename.xml pulls in articlename-summary.xml using external entities declared in the DTD; articlename-summary.xml holds things like the author, title and date. Finally, index.xml includes all of the articlename-summary.xml files the same way.
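Concretely, the entity trick looks something like this (file names as above):

```xml
<?xml version="1.0"?>
<!-- index.xml: each summary file is pulled in as an external entity -->
<!DOCTYPE index [
  <!ENTITY summary1 SYSTEM "articlename1-summary.xml">
  <!ENTITY summary2 SYSTEM "articlename2-summary.xml">
]>
<index>
  &summary1;
  &summary2;
</index>
```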

Any suggestions?



Post 2

Ion the Naysayer

I agree with you that XInclude (or even better - XLink) would do a great job but I'm not currently aware of any working implementations.

My suggestion for extracting specific parts of an XML file would be to use Perl, PHP, etc. to pull out and recombine the pieces you want. XML in Perl is relatively painless as there are LOTS of modules for it. I can't say for other languages because Perl is what I know best at the moment.
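Just to show how little code the "pull out and recombine" step is, here's a sketch in Python (element names invented; the Perl version with one of the XML modules reads much the same):

```python
import xml.etree.ElementTree as ET

def summarise(xml_text):
    """Pull out the bits the index page needs from one article."""
    root = ET.fromstring(xml_text)
    return {"title": root.findtext("title"),
            "author": root.findtext("author")}

# In real use you'd loop over the article files, e.g.:
#   index = [summarise(open(p).read()) for p in glob.glob("articles/*.xml")]
sample = "<article><title>Take Issue</title><author>Dogster</author></article>"
entry = summarise(sample)
```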

If you want to avoid duplication, treat your XML files like tables in a database and normalise them: have an XML file with a list of authors, each with a unique id, and look them up as you need them. You could do the same with titles, but that makes less sense to me than keeping each title in its article and extracting it.
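To make the normalisation idea concrete, a sketch (ids, names and the lookup code all invented for illustration):

```python
import xml.etree.ElementTree as ET

# One shared authors file, referenced by id from every article,
# so an author's details live in exactly one place.
AUTHORS = """<authors>
  <author id="a1"><name>Dogster</name></author>
  <author id="a2"><name>Ion</name></author>
</authors>"""

def author_lookup(xml_text):
    """Build an id -> name table, like a primary key in a database."""
    root = ET.fromstring(xml_text)
    return {a.get("id"): a.findtext("name") for a in root.findall("author")}

# An article then just carries something like <author ref="a1"/>,
# and you resolve the ref against this table when building pages.
names = author_lookup(AUTHORS)
```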

To go off on a tangent, have you thought of using a full-blown database instead? Just about any large site uses a database. Any XML system you build would be performing the same function so why reinvent the wheel? I know there are people who will disagree with me but all those huge sites must be doing it for a reason.

Food for thought, anyway.

I have some experience with XML (and SQL) so if you get stuck further along, just drop me a note. :-)



Post 3

Dogster

Thanks Ion,

I'd kind of hoped to avoid doing any programming. Any suggestions on where to start with XML in Perl? I've got ActivePerl 5.6, which comes with XML::Parser and XML::Parser::Expat, but unfortunately the docs are somewhat obscure.

A database would be fine, but probably overkill for what I'm aiming at. I know almost nothing about databases, so that would be quite a bit of work. It would be useful to know more about them though.



Post 4

Ion the Naysayer

I'm afraid that with XML there isn't much hope of avoiding programming for the moment - the tooling isn't mature enough yet. XInclude and XLink don't have a single full implementation (not that I know of, anyway).

XML::Parser is "the old way" - it has mostly been replaced by XML::SAX (Simple API for XML). Try XML::Simple or XML::Twig to start; if neither does what you need, you'll probably have to take the plunge into SAX. You'll probably also want to find yourself an XML book - if you're going with Perl I'd suggest O'Reilly's "Perl and XML". I found it helpful, though a bit vague at times.

Just remember, inside every small project is a large project struggling to get out. ;-) It's best to think about scalability right at the start. Where is your site hosted? (i.e. how difficult is it going to be to get a DB set up?)



Post 5

Dogster

OK, I'll download those Perl modules and see if the library has a copy of the XML/Perl book. At the moment the site is hosted on Demon's personal homepages thingy, which means no server-side stuff, so everything has to be compiled into HTML files before uploading. However, one of my reasons for wanting to convert everything to XML is that at some point I may move it somewhere better. If I write XML pages for each article, it would presumably be quite simple to move them into a DB later if necessary (don't some DBs actually use XML as a native format?). Actually, one of my main interests is just learning how to do this stuff, rather than this particular site, which will never be particularly massive.



Post 6

Ion the Naysayer

Moving from XML to a database system in Perl is trivial - DBI is your friend.
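In Perl that's DBI; the same move sketched with Python's built-in sqlite3 module, just to show how little there is to it (table and element names invented):

```python
import sqlite3
import xml.etree.ElementTree as ET

sample = "<article><title>Take Issue</title><author>Dogster</author></article>"

# Parse the existing XML once, then push it into an ordinary table.
root = ET.fromstring(sample)
conn = sqlite3.connect(":memory:")  # real code would use a file or a server DB
conn.execute("CREATE TABLE articles (title TEXT, author TEXT)")
conn.execute("INSERT INTO articles VALUES (?, ?)",
             (root.findtext("title"), root.findtext("author")))
row = conn.execute("SELECT title, author FROM articles").fetchone()
```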

I'm not sure that any databases currently use XML as a native format but I'm pretty sure I've seen an XML export feature.

*nod* XML is good to know.

