
NewsMac Pro 1.1 change summary

I’m hoping to finish 1.1 this week and get it to testers for a public release next week sometime. Here is a summary of what’s new in this release:

New features

  • Adaptive interface – basically the interface configures itself differently depending on the task you’re currently performing. E.g. when you view a headline in the web browser, the browser automatically grows to fill the width of the window, then shrinks back down when you go back to viewing headlines. Irrelevant table columns disappear when not needed (e.g. search rank when you’re looking at your bookmarks). See it in action! [obsolete link removed] (1MB H.264 movie, kinda blurry, early beta.)
  • New headlines item in the source list displays all freshly downloaded headlines.
  • You can now turn off both the unread column (which displays those coloured circles with unread headline counts) and the My Rating column so you can make the channel list really skinny.
  • Podcast player – plays and logs all your podcasts so you don’t have to clutter iTunes.
  • Setup assistant (appears the first time you run NewsMac) gives you the option to turn off the built-in channels, import your existing RSS feeds from OPML or just start with a blank canvas.
  • Option to display channels with small icons.
  • Check for new version option – ’bout time I added this really.

Fixes and Improvements

  • Channels now finish updating much more quickly because downloaded headlines don’t get indexed until after each channel has been downloaded and parsed.
  • NewsMac no longer forgets the positions of each splitter view (yay!).
  • The publication date displayed for each channel is now always the publication date of the last headline. For whatever reason, an awful lot of RSS feeds have screwy dates in the feed-level pubDate/lastBuildDate tags. For Atom feeds I’m still using the publication date specified for the whole feed; in general Atom feeds seem to be a bit less quirky than RSS feeds, but I might change this too at some point just so the behaviour is consistent.
  • NewsMac now correctly identifies and parses Atom 1.0 format feeds. There’s no support for Atom enclosures yet, though. I’ll probably add that when someone moans at me to do so, because everyone seems to be using RSS 2 for podcasting atm.
  • Other stuff I can’t think of right now.
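The channel-date rule from the list above is easy to state in code. This is a minimal sketch in plain C (which compiles unchanged as Objective-C), not NewsMac Pro’s own code: it assumes “the last headline” means the most recently published one, that item dates have already been parsed into `time_t`, and `Headline` and `channel_display_date` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical headline record; only the fields this rule needs. */
typedef struct {
    const char *title;
    time_t pub_date;   /* 0 when the item carries no usable date */
} Headline;

/* Channel date = newest item-level date. The feed-level pubDate and
   lastBuildDate are deliberately never consulted.
   Returns 0 if no item has a date. */
time_t channel_display_date(const Headline *items, size_t count)
{
    assert(items != NULL || count == 0);
    time_t newest = 0;
    for (size_t i = 0; i < count; i++)
        if (items[i].pub_date > newest)
            newest = items[i].pub_date;
    return newest;
}
```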

I think this release covers the majority of the requests I’ve had for new features to date.

Sosumi

I found all the original Mac sounds [obsolete link removed] available as AIFF audio files, so now I can enjoy a Wild Eep every time I mis-click in OS X. Great!

NewsMac Pro banner

If you surf over to ThinkMac today or visit the NewsMac Pro product page, you can also see the cool new product banners inspired by the silhouetted people in the iPod ads.

Performance vs. responsiveness

Striking the balance between performance and application responsiveness isn’t particularly easy. Heck, even engineering a solution that offers some kind of balance isn’t that straightforward. But before I go into that, what’s the difference between responsiveness and performance? Well, responsiveness refers to how quickly the application responds to your input: whether it beachballs and stutters or runs smoothly. Performance, on the other hand, is how quickly the internal engine does its allotted task. Typically, the harder you work the engine, the worse responsiveness becomes, because the CPU time that would be spent handling your mouse clicks and updating the user interface is being eaten up by the engine. Conversely, if you want a super-responsive application you typically have to throttle back the engine so that enough resources are left over for the interface to run smoothly.

In the context of NewsMac Pro, the engine is the part of the program that downloads and parses RSS and Atom feeds. Up until now the engine has been allowed to pretty much kill the application’s responsiveness in order to get things done fast. But this doesn’t really offer the best user experience: while lots of downloads are in progress you more or less have to leave the app alone. This probably doesn’t affect most people that badly; it only really hits when you have lots of scheduled folder updates going or choose to reload a folder with lots of channels.

Still, I see dealing gracefully with large numbers of channels as one of the key goals of NewsMac Pro. You should be able to load 200 channels at once without waiting seconds for your mouse clicks to register. So with that in mind I decided to figure out a way of throttling back the engine: adding a bottleneck somewhere that would give the app a major responsiveness boost under heavy load.

So the first thing I needed to do was identify which part of the whole process was hammering the CPU so badly. I was pretty sure the RSS/Atom parsers were the cause. The way NewsMac Pro works is that up to 10 feeds can be downloading at once, and as soon as any of those 10 processes finishes downloading it starts parsing. This means that up to 10 parsers can be running at any one time. Now, parsing 10 RSS feeds at the same time and indexing their headlines in an object-oriented framework like Cocoa is surprisingly CPU intensive: lots of objects get created, stored, sorted, removed and compared. Running 10 at a time was bringing the app to its knees. So to see how much overhead those 10 downloads generated without the parsing, I simply commented out the bit of code that invoked the parser. No noticeable slowdown. Hmm, well OK then, the solution is fairly straightforward: download everything as quickly as we can, then queue it up and parse one channel at a time. This way downloads can happen really fast and don’t depend on the parser finishing before moving on to the next channel, and there is only ever one CPU-intensive parsing operation going on at a time.

Bingo, responsiveness! The downside, of course, is that batch channel updates take longer to finish processing, but at least you can pleasantly read the headlines from the channels that have already downloaded while you wait. I’m sure as time goes by I’ll be able to refine this further and make it reasonable to have several parsers running together, but for the time being this is a step in the right direction.

[image missing]

The diagram above shows you how the ‘engine’ works in NewsMac Pro.

1. The user initiates a download, e.g. by clicking on a channel.
2. The request is placed into a queue.
3. As soon as one of the 10 download slots becomes free the request is pulled off the queue and the RSS/Atom feed is downloaded.
4. The downloaded raw XML is placed into another queue to await being parsed.
5. When the parser is free from its last task, it pulls another raw XML file off the queue to process.
6. Finally the processed headlines get added to the headline database and indexed for searching. From here they are accessible to the user.
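The numbered steps above map naturally onto a producer/consumer pipeline. Here is a minimal sketch using POSIX threads in plain C (valid as Objective-C too); it is not the actual NewsMac Pro engine. `Engine`, `fake_download` and `run_engine` are hypothetical names, the “download” is a stub that fabricates an XML string, and parsing/indexing (step 6) is reduced to freeing the buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DOWNLOAD_SLOTS 10   /* step 3: at most 10 downloads in flight */
#define MAX_JOBS 256

/* Hypothetical stand-in for the network fetch: fabricates some "raw XML". */
static char *fake_download(const char *url)
{
    size_t n = strlen(url) + 32;
    char *xml = malloc(n);
    if (xml != NULL)
        snprintf(xml, n, "<rss><!-- %s --></rss>", url);
    return xml;
}

typedef struct {
    const char **requests;       /* step 2: queued download requests */
    int request_count, next_request;
    char *raw[MAX_JOBS];         /* step 4: raw XML waiting for the parser */
    int raw_head, raw_tail;
    int downloaders_done;        /* download threads that have run out of work */
    int parsed;                  /* written only by the parser thread */
    pthread_mutex_t lock;
    pthread_cond_t raw_ready;
} Engine;

static void *download_worker(void *arg)
{
    Engine *e = arg;
    for (;;) {
        pthread_mutex_lock(&e->lock);
        if (e->next_request >= e->request_count) {   /* request queue is empty */
            e->downloaders_done++;
            pthread_cond_signal(&e->raw_ready);
            pthread_mutex_unlock(&e->lock);
            return NULL;
        }
        const char *url = e->requests[e->next_request++];
        pthread_mutex_unlock(&e->lock);

        char *xml = fake_download(url);               /* slow I/O, lock not held */

        pthread_mutex_lock(&e->lock);
        e->raw[e->raw_tail++] = xml;                  /* hand off to the parser */
        pthread_cond_signal(&e->raw_ready);
        pthread_mutex_unlock(&e->lock);
    }
}

/* Step 5: one parser thread, so only one CPU-heavy parse runs at a time. */
static void *parser_worker(void *arg)
{
    Engine *e = arg;
    for (;;) {
        pthread_mutex_lock(&e->lock);
        while (e->raw_head == e->raw_tail && e->downloaders_done < DOWNLOAD_SLOTS)
            pthread_cond_wait(&e->raw_ready, &e->lock);
        if (e->raw_head == e->raw_tail) {             /* everything downloaded and parsed */
            pthread_mutex_unlock(&e->lock);
            return NULL;
        }
        char *xml = e->raw[e->raw_head++];
        pthread_mutex_unlock(&e->lock);

        /* Step 6 (parse, add to the headline database, index) would go here. */
        free(xml);
        e->parsed++;
    }
}

/* Runs every URL through the pipeline and returns how many feeds were parsed. */
int run_engine(const char **urls, int count)
{
    assert(count >= 0 && count <= MAX_JOBS);
    Engine e = { .requests = urls, .request_count = count };
    pthread_mutex_init(&e.lock, NULL);
    pthread_cond_init(&e.raw_ready, NULL);

    pthread_t downloaders[DOWNLOAD_SLOTS], parser;
    pthread_create(&parser, NULL, parser_worker, &e);
    for (int i = 0; i < DOWNLOAD_SLOTS; i++)
        pthread_create(&downloaders[i], NULL, download_worker, &e);
    for (int i = 0; i < DOWNLOAD_SLOTS; i++)
        pthread_join(downloaders[i], NULL);
    pthread_join(parser, NULL);

    pthread_mutex_destroy(&e.lock);
    pthread_cond_destroy(&e.raw_ready);
    return e.parsed;
}
```

The key design choice is the asymmetry: downloads mostly wait on the network, so all 10 slots can stay busy cheaply, while the CPU-heavy work is funnelled through a single consumer thread and never competes with itself.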

Removing entities from HTML in Cocoa

To display accented characters and certain symbols in an HTML or XML document you often need to encode them as entities. For example, the copyright symbol © is represented in HTML as &copy;.

Applications like NewsMac Pro need to be able to decode these entities and translate them to the appropriate character. Straightforward, you might think, but actually it isn’t. Characters can be encoded in several ways: with a textual name as above, but also with a decimal or hex value (&#169; and &#xA9; are also ©). In NewsMac Pro I used to use NSAttributedString’s initWithHTML:documentAttributes: method, but for whatever reason it seems to lock up under Tiger, so I had to find an alternative. I thought I’d post the following code to help out other developers, because if you go searching on this topic you will most likely find people telling you to use the NSAttributedString method.

This probably isn’t the most elegant bit of code ever, but it serves its purpose:

+ (NSString *) decodeCharacterEntitiesIn:(NSString *)source
{ 
  if(!source) return nil;
  else if([source rangeOfString: @"&"].location == NSNotFound) return source;
  else
  {
    NSArray *codes = [NSArray arrayWithObjects: 
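      @"&nbsp;", @"&iexcl;", @"&cent;", @"&pound;", @"&curren;", @"&yen;", @"&brvbar;",
      @"&sect;", @"&uml;", @"&copy;", @"&ordf;", @"&laquo;", @"&not;", @"&shy;", @"&reg;",
      @"&macr;", @"&deg;", @"&plusmn;", @"&sup2;", @"&sup3;", @"&acute;", @"&micro;",
      @"&para;", @"&middot;", @"&cedil;", @"&sup1;", @"&ordm;", @"&raquo;", @"&frac14;",
      @"&frac12;", @"&frac34;", @"&iquest;", @"&Agrave;", @"&Aacute;", @"&Acirc;",
      @"&Atilde;", @"&Auml;", @"&Aring;", @"&AElig;", @"&Ccedil;", @"&Egrave;",
      @"&Eacute;", @"&Ecirc;", @"&Euml;", @"&Igrave;", @"&Iacute;", @"&Icirc;", @"&Iuml;",
      @"&ETH;", @"&Ntilde;", @"&Ograve;", @"&Oacute;", @"&Ocirc;", @"&Otilde;", @"&Ouml;",
      @"&times;", @"&Oslash;", @"&Ugrave;", @"&Uacute;", @"&Ucirc;", @"&Uuml;", @"&Yacute;",
      @"&THORN;", @"&szlig;", @"&agrave;", @"&aacute;", @"&acirc;", @"&atilde;", @"&auml;",
      @"&aring;", @"&aelig;", @"&ccedil;", @"&egrave;", @"&eacute;", @"&ecirc;", @"&euml;",
      @"&igrave;", @"&iacute;", @"&icirc;", @"&iuml;", @"&eth;", @"&ntilde;", @"&ograve;",
      @"&oacute;", @"&ocirc;", @"&otilde;", @"&ouml;", @"&divide;", @"&oslash;", @"&ugrave;",
      @"&uacute;", @"&ucirc;", @"&uuml;", @"&yacute;", @"&thorn;", @"&yuml;", nil];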
    
    NSArray *highCodes = [NSArray arrayWithObjects: @"&OElig;",  // 338
                                                    @"&oelig;",  // 339
                                                    @"&Scaron;", // 352
                                                    @"&scaron;", // 353
                                                    @"&Yuml;",   // 376
                                                    @"&circ;",   // 710
                                                    @"&tilde;",  // 732
                                                    @"&ndash;",  // 8211
                                                    @"&mdash;",  // 8212
                                                    @"&lsquo;",  // 8216
                                                    @"&rsquo;",  // 8217
                                                    @"&sbquo;",  // 8218
                                                    @"&ldquo;",  // 8220
                                                    @"&rdquo;",  // 8221
                                                    @"&bdquo;",  // 8222
                                                    @"&dagger;", // 8224
                                                    @"&Dagger;", // 8225
                                                    @"&hellip;", // 8230
                                                    @"&permil;", // 8240
                                                    @"&lsaquo;", // 8249
                                                    @"&rsaquo;", // 8250
                                                    @"&euro;",   // 8364
                                                    nil];
    int highCodeNumbers[22] = { 338, 339, 352, 353, 376, 710, 732, 8211, 8212,
                              8216, 8217, 8218, 8220, 8221, 8222, 8224, 8225,
                              8230, 8240, 8249, 8250, 8364 }; // 22 ints
    
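    // Decode the basic XML entities (&amp;, &lt;, etc.) first.
    // CFXMLCreateStringByUnescapingEntities follows the Create rule, so release the result.
    CFStringRef unescaped = CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)source, NULL);
    if(!unescaped) return source;
    NSMutableString *escaped = [NSMutableString stringWithString: (NSString *)unescaped];
    CFRelease(unescaped);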

    // Html
    int i, count = [codes count];
    for(i = 0; i < count; i++)
    {
      NSRange range = [source rangeOfString: [codes objectAtIndex: i]];
      if(range.location != NSNotFound)
      {
        [escaped replaceOccurrencesOfString: [codes objectAtIndex: i] 
                                 withString: [NSString stringWithFormat: @"%C", 160 + i] 
                                    options: NSLiteralSearch 
                                      range: NSMakeRange(0, [escaped length])];
      }
    }
    
    count = [highCodes count];
    
    // Html high codes
    for(i = 0; i < count; i++)
    {
      NSRange range = [source rangeOfString: [highCodes objectAtIndex: i]];
      if(range.location != NSNotFound)
      {
        [escaped replaceOccurrencesOfString: [highCodes objectAtIndex: i] 
                                 withString: [NSString stringWithFormat: @"%C", highCodeNumbers[i]] 
                                    options: NSLiteralSearch 
                                      range: NSMakeRange(0, [escaped length])];
      }
    }
    
    // Decimal (&#169;) and hex (&#xA9;) character references
    NSUInteger pos = 0;
    while(pos < [escaped length])
    {
      NSRange start = [escaped rangeOfString: @"&#" options: NSLiteralSearch
                                       range: NSMakeRange(pos, [escaped length] - pos)];
      if(start.location == NSNotFound) break;

      // Only accept a ';' within 10 characters of the "&#" (enough for "&#x10FFFF;"),
      // so a stray ';' elsewhere in the text is never mistaken for the terminator.
      NSUInteger window = MIN((NSUInteger)10, [escaped length] - start.location);
      NSRange finish = [escaped rangeOfString: @";" options: NSLiteralSearch
                                        range: NSMakeRange(start.location, window)];
      if(finish.location == NSNotFound)
      {
        pos = start.location + 2;   // not a reference; skip past the "&#"
        continue;
      }

      NSRange entityRange = NSMakeRange(start.location, finish.location - start.location + 1);
      NSString *value = [escaped substringWithRange:
                          NSMakeRange(start.location + 2, finish.location - start.location - 2)];
      unsigned int code = 0;
      if([value hasPrefix: @"x"] || [value hasPrefix: @"X"])
        [[NSScanner scannerWithString: [value substringFromIndex: 1]] scanHexInt: &code];
      else
        code = (unsigned int)[value intValue];

      if(code > 0 && code <= 0xFFFF)
      {
        // %C takes a single 16-bit unichar, so only BMP characters are decoded here
        [escaped replaceCharactersInRange: entityRange
                               withString: [NSString stringWithFormat: @"%C", (unichar)code]];
        pos = start.location + 1;
      }
      else
      {
        pos = finish.location + 1;  // leave invalid or out-of-range references as they are
      }
    }
    return escaped;    
  }
}
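The %C format used by the method above takes a single 16-bit unichar, so it has no way to produce characters beyond U+FFFF, such as emoji written as &#x1F600;. If you are working with UTF-8 C strings, the numeric half of the job can be sketched in plain C (also valid Objective-C) by writing each code point out as UTF-8. `decode_numeric_refs` and `utf8_encode` are hypothetical names, and named entities like &eacute; would still need a lookup table like the one above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8; returns bytes written, 0 if invalid. */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;                                  /* NUL, surrogates, out of range */
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Replace decimal (&#169;) and hex (&#xA9;) references in a UTF-8 string.
   Anything malformed is copied through unchanged. Caller frees the result. */
char *decode_numeric_refs(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src), o = 0;
    char *out = malloc(len + 1);   /* a reference is never shorter than its UTF-8 */
    if (out == NULL) return NULL;

    for (size_t i = 0; i < len; ) {
        if (src[i] == '&' && src[i + 1] == '#') {
            int hex = (src[i + 2] == 'x' || src[i + 2] == 'X');
            size_t j = i + 2 + (size_t)hex;
            unsigned long cp = 0;
            int ndigits = 0;
            while (ndigits < 8) {                  /* 8 digits cannot overflow */
                char c = src[j];
                int d;
                if (c >= '0' && c <= '9') d = c - '0';
                else if (hex && c >= 'a' && c <= 'f') d = c - 'a' + 10;
                else if (hex && c >= 'A' && c <= 'F') d = c - 'A' + 10;
                else break;
                cp = cp * (hex ? 16 : 10) + (unsigned long)d;
                j++;
                ndigits++;
            }
            if (ndigits > 0 && src[j] == ';') {
                size_t n = utf8_encode(cp, out + o);
                if (n > 0) { o += n; i = j + 1; continue; }
            }
        }
        out[o++] = src[i++];                       /* ordinary byte */
    }
    out[o] = '\0';
    return out;
}
```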

The tyranny of broken HTML in RSS

One of the problems with rendering RSS content nicely is broken HTML tag pairs. Certain RSS generators seem very careless when preparing item summaries, often chopping through the middle of link tags when snatching the first few lines of an article. This isn’t such a big deal if you’re just displaying one item, but if you’ve got a whole bunch of these displayed one after another, a single broken anchor (link) tag or stray blockquote (indentation) tag can really mess things up. I really don’t want to get into HTML parsing, but it looks like I’m not going to have much choice at this rate.

A couple of offenders I spotted today are BoingBoing and AppleMatters. There are many more, though; it’s just down to luck which ones get away with it and which ones don’t.
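Short of a full HTML parser, one lightweight middle ground for the broken tag pairs described above is to treat each summary as a stream of tags: drop a tag that was chopped off at the very end, keep a stack of the paired tags that get opened, and append closing tags for whatever is still open. This is a sketch in plain C, not what NewsMac Pro actually does; `balance_html` and the tracked tag list are hypothetical, and it deliberately ignores comments, CDATA and '>' inside attribute values.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 32
#define MAX_NAME 15

/* Assumed list of paired tags worth repairing; void tags like img/br are ignored. */
static const char *tracked[] = { "a", "b", "i", "em", "strong", "blockquote",
                                 "p", "div", "ul", "ol", "li", NULL };

static int is_tracked(const char *name)
{
    for (int k = 0; tracked[k] != NULL; k++)
        if (strcmp(tracked[k], name) == 0) return 1;
    return 0;
}

/* Returns a copy of an HTML summary with a tag chopped off at the very end removed
   and closing tags appended for tracked elements left open. Caller frees. */
char *balance_html(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    const char *lt = strrchr(src, '<');
    if (lt != NULL && strchr(lt, '>') == NULL)
        len = (size_t)(lt - src);                  /* drop a half-written tag like <a href=... */

    char stack[MAX_DEPTH][MAX_NAME + 1];
    int depth = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] != '<') continue;
        size_t j = i + 1;
        int closing = (j < len && src[j] == '/');
        if (closing) j++;
        char name[MAX_NAME + 1];
        size_t n = 0;
        while (j < len && n < MAX_NAME && isalnum((unsigned char)src[j]))
            name[n++] = (char)tolower((unsigned char)src[j++]);
        name[n] = '\0';
        if (n == 0 || (j < len && isalnum((unsigned char)src[j])) || !is_tracked(name))
            continue;                              /* not a tag we care about */
        if (!closing) {
            if (depth < MAX_DEPTH) strcpy(stack[depth++], name);
        } else {
            for (int d = depth - 1; d >= 0; d--)   /* pop back to the matching opener */
                if (strcmp(stack[d], name) == 0) { depth = d; break; }
        }
    }

    char *out = malloc(len + (size_t)depth * (MAX_NAME + 3) + 1);
    if (out == NULL) return NULL;
    memcpy(out, src, len);
    size_t o = len;
    while (depth > 0)
        o += (size_t)sprintf(out + o, "</%s>", stack[--depth]);
    out[o] = '\0';
    return out;
}
```

Because each summary is repaired on its own before being concatenated, a stray open blockquote or anchor in one item can no longer bleed into the items displayed after it.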