The tyranny of broken HTML in RSS

One of the problems with rendering RSS content nicely is broken HTML tag pairs. Some RSS generators are careless when preparing item summaries and often chop through the middle of a link tag when grabbing the first few lines of an article. This isn’t a big deal if you’re displaying a single item, but if you’ve got a whole bunch of them displayed one after another, a single unclosed anchor (link) tag or stray blockquote (indentation) tag can mess up everything that follows. I really don’t want to have to get into HTML parsing, but it looks like I’m not going to have much choice at this rate.
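For what it’s worth, the repair doesn’t need a full HTML parser. Here is a minimal sketch in Python (an assumption, since nothing above says what language the reader is written in) that uses the standard library’s `html.parser` to track which tags are still open at the end of a summary and appends the missing closing tags. The names `OpenTagTracker` and `close_open_tags` are made up for illustration, and the handling of a tag cut off mid-attribute is a simple heuristic, not a robust fix.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Hypothetical helper: records tags that were opened but never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag (and anything nested inside it);
        # a stray closing tag with no matching opener is ignored.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_open_tags(fragment):
    """Return the fragment with closing tags appended for unclosed elements."""
    # Heuristic: if the summary was cut off inside a tag (a "<" with no ">"
    # after it), drop that partial tag before balancing the rest.
    last_lt = fragment.rfind("<")
    if last_lt != -1 and ">" not in fragment[last_lt:]:
        fragment = fragment[:last_lt]

    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()

    # Close the innermost element first.
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.open_tags))
    return fragment + closers


if __name__ == "__main__":
    print(close_open_tags('<blockquote>Some <a href="http://example.com">linked te'))
```

Running each item summary through something like `close_open_tags` before display would at least stop an unclosed anchor or blockquote in one item from leaking into the items after it. It does nothing about stray closing tags that have no opener, which would need separate handling.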

A couple of offenders I spotted today are BoingBoing and AppleMatters. There are many more, though; it’s just down to luck which ones get away with it and which ones don’t.

