A Conversation for Website Developer's Forum

xmlspy 5.0 problem

Post 1

xyroth

I thought I ought to start moving to xml, so I installed this from the pcplus march cover disk.

it says that it can easily convert the html into xml with associated xslt's to convert it back to html, but I can't seem to find any good info on how to do it.

anyone got any ideas?


Post 2

Ion the Naysayer

Why not just use XHTML instead?

http://www.w3.org/TR/xhtml1/


Post 3

xyroth

I will be converting the site templates into xhtml, and adding new stuff as xhtml for future-proofing, but I will need to be proficient with xml and the conversion tools for a project that I am working on (not yet though).

as this tool claims to be able to convert html into xml, and give you the tools to convert back again, this struck me as a good opportunity to learn the tricky conversion section. unfortunately, I can't seem to find the method for invoking the conversion in the first place.

one of the projects I am tied up with is trying to create a new distribution of linux, with a different emphasis than the current ones. it would be nice if we could find the information once, store it in xml, grab it where needed, and convert to other formats as required. Xhtml doesn't appear suitable for this task.


Post 4

Ion the Naysayer

This is highly dependent on your XML structure. It's hard to go from a document structure description markup (HTML) to a generic data description markup (XML) unless you know exactly what you want the XML to look like.

It sounds like you're trying to build documentation or reference material in which case you should probably start with XML rather than trying to convert from HTML since HTML doesn't include meta data.

I could help you more if I knew more about the data you're trying to store.


Post 5

xyroth

There are two separate problems I am trying to solve.

first, the website is starting to get quite a bit of generated content in static pages. It would be quite nice if this could be maintained outside the site, and merged into it in the appropriate places without me having to hand-code it into M4 or some other language. XML strikes me as fairly suitable for this.

An example of this generated content is seen on the movies page on my website (see http://www.xyroth-enterprises.co.uk/movies.htm for the example). it contains a list in alphabetical order of all the movies that have a write-up on the site. There are also other pages for individual years which list in alphabetical order the movies copyrighted that year. It would be nice if when I wrote a new review, it copied the entry into the year's list, and from there into the main list on the movies page, thus reducing the amount of maintenance effort.
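A minimal sketch of that regenerative model, assuming the reviews are kept in one small XML file. The element and attribute names (reviews, review, title, year, href) and the sample entries are invented for illustration and are not taken from the actual site. The main list and the per-year lists are both derived from the same data, so a new review only has to be entered once.

```python
import html
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical review data; the schema here is an assumption, not the site's.
REVIEWS_XML = """
<reviews>
  <review title="Metropolis" year="1927" href="metropolis.htm"/>
  <review title="Alien" year="1979" href="alien.htm"/>
  <review title="Mad Max" year="1979" href="madmax.htm"/>
</reviews>
"""

def load_reviews(xml_text):
    """Parse the XML into (title, year, href) tuples."""
    root = ET.fromstring(xml_text)
    return [(r.get("title"), int(r.get("year")), r.get("href"))
            for r in root.findall("review")]

def main_list(reviews):
    """All reviews in alphabetical order, ignoring case."""
    return sorted(reviews, key=lambda r: r[0].lower())

def year_lists(reviews):
    """Map each year to its reviews, each list in alphabetical order."""
    by_year = defaultdict(list)
    for review in main_list(reviews):
        by_year[review[1]].append(review)
    return dict(by_year)

def render_list(reviews):
    """Render reviews as an XHTML unordered list of links."""
    items = "\n".join(
        f'  <li><a href="{html.escape(href, quote=True)}">{html.escape(title)}</a></li>'
        for title, _, href in reviews
    )
    return f"<ul>\n{items}\n</ul>"
```

Calling render_list(main_list(reviews)) gives the fragment for the main movies page, and render_list(year_lists(reviews)[1979]) gives the fragment for one year's page.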

I have other examples in the site which could benefit from the same sort of regenerative model. being able to almost casually add it to the XML model of the page seems possible, but changing it from that into the XHTML version seems very difficult, hence the potential usefulness of this tool (if I can get the blasted thing to work).

having a working XML+XSLT>XHTML system to work from would then help me learn how to create it for the next application in the linux distribution. it is going to need a much better dependency and configuration system than is currently available. The idea is that you should be able to install just enough to get it to boot into a console and then tell it to add some program (rosegarden for example), and it will use its knowledge of the current config and the dependencies from the package to install only the extra packages needed to make it usable, and only them.
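The "install only the extra packages needed" idea can be sketched as a walk over the dependency graph that skips anything already installed. This is an illustration under stated assumptions, not how any real package manager works: the function name, the shape of the depends mapping, and the package names in the usage note are all hypothetical, and an installed package is assumed to have its own dependencies satisfied already.

```python
def packages_to_install(target, depends, installed):
    """Return the packages still needed for `target`, dependencies first.

    depends   -- dict mapping a package name to a list of its direct dependencies
    installed -- set of package names already present on the machine
    """
    order = []
    done = set(installed)   # installed packages are treated as already satisfied
    visiting = set()        # packages on the current path, used to detect cycles

    def visit(pkg):
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError(f"dependency cycle involving {pkg}")
        visiting.add(pkg)
        for dep in depends.get(pkg, []):
            visit(dep)
        visiting.discard(pkg)
        done.add(pkg)
        order.append(pkg)

    visit(target)
    return order
```

For example, with depends = {"app": ["gui-toolkit", "sound"], "gui-toolkit": ["x-server"]} and installed = {"x-server"}, asking for "app" returns gui-toolkit, sound and app in that order, and leaves x-server alone.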

debian allows this, but the dependencies and config are all local, reinvent the wheel, and are just not good enough. It should be possible to detect the modem or mouse, and once working use the same configure information for every program that uses it, on a discover once, use ubiquitously type model.

again XML is supposed to be good for this sort of task.

I hope this clears up what I want to do and why.


Post 6

Ion the Naysayer

Ah. Alright. That's what I was looking for.

As far as your first application goes, I would suggest a database and not XML. XML is slow compared to a database. Converting XML to XHTML using XSLT involves a lot of overhead. Most websites that start having bunches of dynamic content go with a database backend eventually because anything else is too slow. PostgreSQL or MySQL would probably do the job nicely. I'd tend to go for Postgres because MySQL is currently playing catch-up - it doesn't have subselects which is a killer (for me at least).

As for your second application, I don't really understand why you aren't just creating the XML format by hand and then making sure your software writes that format correctly. XML does seem like the right tool for the job in this case but I don't see any need for XSLT in this case.


Post 7

xyroth

I will not be using any form of SQL database for my first application, because knowing a bit of the theory behind it, the best thing I can say about it is that it stinks to high heaven.

as for speed, that is not a problem. currently, quite a bit of the website is dynamically regenerated using bbc basic in win95 on a pentium 75 (last year it was being done on a 486 with windows 3.11). when the development moves onto the 900MHz VIA C3, this should speed it up considerably. I find it unlikely that xslt is actually slower than that.

the idea behind using xml to xhtml generation was to learn how to do it, while at the same time making the site much easier to process. If you are suggesting I should do that without using xslt, then what do you suggest I use instead?

for the second application, there will be uses like you suggest, but it would also be good if the xml about a specific machine's configuration (in a multi machine network for example) could be dynamically regenerated into a html or xhtml summary page simply from the xml.

in this application it would definitely be a discover once, store once, and convert into as many forms as necessary type model of operation.


Post 8

Ion the Naysayer

XML is slow(er) because it's stored on the filesystem and has to be parsed each time. XML has much higher per session overhead than a database. XML doesn't scale very well for webpages if you're relying on the server to perform the transformations. With XSLT you also need a parser capable of processing an XSLT stylesheet. If you start to get concurrent hits, your web server will slow to a crawl. This is why most medium to large sites use a database backend and not an XML one.

If you do use XML and want to do XHTML conversion, XSLT is one option. Perl and other scripting languages can also perform XML conversion. I don't know which would be faster but the scripting solution is more flexible. Regardless, both solutions would require an XML parser. Whatever you do, don't start slicing up XML with regular expressions - 99% of the time that's the wrong thing to do.
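As a rough illustration of the scripting route, here is a sketch in Python rather than Perl, using the standard library's ElementTree parser instead of regular expressions. The source format (page, title, para) is made up for the example; a real document would need its own mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the element names are assumptions.
SOURCE = ("<page><title>Movies &amp; more</title>"
          "<para>First paragraph.</para><para>Second.</para></page>")

def page_to_xhtml(xml_text):
    """Convert the made-up <page> format into a minimal XHTML document string."""
    src = ET.fromstring(xml_text)
    title = src.findtext("title", default="")

    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    # Each <para> in the source becomes a <p> in the output.
    for para in src.findall("para"):
        ET.SubElement(body, "p").text = para.text

    # The serializer escapes special characters such as & for us.
    return ET.tostring(html, encoding="unicode")
```

Because the parser and serializer handle entities and nesting, the ampersand in the title survives the round trip correctly, which is exactly the kind of detail regular expressions tend to get wrong.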

Learning XSLT isn't too hard. I found the following article very helpful. The other articles in the series are also useful. To find the others you can do a Google search for "Taming the XML Beast".

http://www.webreview.com/2000/09_01/developers/09_01_00_3.shtml

If you're going to be converting XML to other formats on a local machine, XSLT is probably the solution you're looking for since the difference in overhead is less significant.


Post 9

xyroth

the aim is most definitely to have it so that when the content is changed locally, the XML page is trivial to update. This should be a fairly rare occurrence for any specific page.

Then, like with make, it should spot that the page has been changed, and ripple the changes through any relevant dependent pages, regenerating the html user copy as it goes.

Ideally, it would be nice to be able to do this automatically.
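The make-style check can be sketched with file modification times: a generated page is stale if it is missing or older than any source it is built from. The function names and the idea of a rules mapping from each output page to its source files are assumptions for illustration. Listing one review's XML file as a source of several pages is what makes a single change ripple through all of them.

```python
import os

def is_stale(target, sources):
    """True if `target` is missing or older than any file in `sources`."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)

def stale_pages(rules):
    """Given {output_page: [source_files]}, list the outputs needing a rebuild."""
    return [target for target, sources in rules.items()
            if is_stale(target, sources)]
```

A small driver script could loop over stale_pages(rules) and rerun the XML-to-HTML conversion for each entry, which gives the automatic regeneration without touching pages whose sources have not changed.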

it is sounding as though XSLT might not be the most appropriate format to use for doing this.


Post 10

Ion the Naysayer

I think you have a misconception about XML... XML marks up the content. When you change the content, you're changing the XML file (can't get more trivial than that).

It sounds like you want static pages that you can update more easily. That would be a good use of XSLT but why not just make the leap right to dynamic content generation? It takes a little more processing power but is much more flexible - if you start to add interactive components, the static solution goes out the window anyway.


Post 11

xyroth

why not make the leap to dynamic content creation?

for a very good reason. my site currently has over 280 pages on it, and all but about half a dozen of them load almost instantly, are viewable in any browser (see http://www.anybrowser.com for an explanation) and are still there next month.

they also stay in the cache in the same way that h2g2 pages tend not to.

moving to dynamic content (as in online databases) would make the entire site very slow, stop the pages from caching properly, and generally cause more trouble than it is worth.

however with that many pages at http://www.xyroth-enterprises.co.uk/ you can understand why I am starting to find it a bit tricky to maintain.

even so, I currently have no broken internal links in the site, and am not aware of any broken links external to the site which have either not been fixed (if trivial) or will not have been fixed at the next publication (hopefully soon).

The only interactive components on my site are the cookies on dedicated affiliate pages (which make sure I get commission if you buy anything) which you don't find unless you are looking for that sort of stuff, and the form filling page (which is required for some of the dynamically updated pages like http://www.xyroth-enterprises.co.uk/h2g2pals.htm and a couple of similar ones.)

I would like to get to grips with something like XSLT to transform pages which would then be created in XML to allow more semantic content to be marked up within the pages, but I can't find anything that makes XSLT look like anything but overly complex and unnecessarily hard to learn.

have a glance at my site (especially the site index at http://www.xyroth-enterprises.co.uk/sindex.htm ) and see what I mean about the usefulness of the current policies.

anything you could suggest which would help make generation and maintenance easier would be welcomed, but I specifically won't touch tools like frontpage express and netscape composer which insist on bloating your html, breaking it with bad assumptions and rewriting it, thus making it into nearly write-only code.

this is why I thought XML might help.


Post 12

Ion the Naysayer

"moving to dynamic content (as in online databases) would make the entire site very slow, stop the pages from cacheing properly, and generally cause more trouble than it is worth."

The biggest and fastest sites on the Internet use database backends so your assertion that it will make the site slow is just plain wrong. The pages will also cache just fine if they're dynamically generated but the content is static so long as you don't set the No-Cache pragma and leave their expiry date alone.
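To make the caching point concrete, here is a sketch of the headers a dynamically generated page could send when its underlying content rarely changes. The function names, the one-day default lifetime, and the use of Cache-Control alongside Expires are assumptions for illustration; the key points are that no no-cache directive is sent and that Last-Modified reflects when the data changed, not when the page was generated.

```python
from email.utils import formatdate, parsedate_to_datetime

def cache_headers(content_mtime, now, max_age=86400):
    """Headers that let browsers and proxies cache a generated page.

    content_mtime -- Unix time the underlying data last changed
    now           -- current Unix time
    max_age       -- assumed cache lifetime in seconds (one day by default)
    """
    return {
        "Last-Modified": formatdate(content_mtime, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
        "Cache-Control": f"max-age={max_age}",
    }

def status_for(if_modified_since, content_mtime):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the full page
    return 304 if int(content_mtime) <= since else 200
```

With these in place a repeat visitor either uses the cached copy outright until it expires, or gets a short 304 response instead of the whole page.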

There are tools for link checking, most notably the W3C Link Checker.

Don't think of XML in terms of pages. XML is for data. Think of XML as a database without SQL. What you put into the XML is irrelevant - it's just a storage format.

I do all my web work (ALL my web work) in Notepad so don't worry about me recommending bloated tools.

I think before you make the attempt to use XML you should do more research. You may want to start with the XML page at the W3C. Read the XML spec. Read that "Taming the XML Beast" article I posted the link to.


Post 13

xyroth

I already know about some of that.

While I might find some use for XML as a complex database, most of my database work I try and keep as flat files. it makes them so much easier to keep consistent and to process properly.

Am I totally wrong in thinking of xml as partially designed as a semantic markup language?


Post 14

Ion the Naysayer

Essentially... Yes.

From about halfway down the page:

"XML itself provides no semantics for its tags."

