Performance vs. responsiveness

Striking the balance between performance and application responsiveness isn’t particularly easy. Heck, even engineering a solution that offers some kind of balance isn’t that straightforward. But before I go into that, what’s the difference between responsiveness and performance? Well, responsiveness refers to how quickly the application responds to your input: whether it beachballs and stutters or whether it runs smoothly. Performance, on the other hand, is how quickly the internal engine does its allotted task. Typically the harder you work the engine, the worse responsiveness becomes. This is because the CPU time that would be spent dealing with your mouse clicks and updating the user interface is being eaten up by the engine. Conversely, if you want a super responsive application you typically have to throttle back the engine to make sure there are enough resources left for the interface to run smoothly.

In the context of NewsMac Pro, the engine is the part of the program that downloads and parses RSS and Atom feeds. Up until now the engine has been allowed to pretty much kill the application’s responsiveness in order to get things done fast. But this doesn’t really offer the best user experience: while lots of downloads are occurring you more or less have to just leave the app alone. This probably doesn’t affect most people that badly; it only really hits when you have lots of scheduled folder updates going or choose to reload a folder with lots of channels.

Still, I see the ability to deal with these large numbers of channels gracefully as one of the key goals of NewsMac Pro. You should be able to load 200 channels at once and not have to wait seconds for your mouse clicks to be registered. So with that in mind I decided to try and figure out a way of throttling back the engine, of adding a bottleneck somewhere that would give the app a major responsiveness boost under heavy load.

So the first thing I needed to do was identify which part of the whole process was hammering the CPU so badly. I was pretty sure the RSS/Atom parsers were the cause. The way NewsMac Pro works is that there can be up to 10 feeds downloading at once, and as soon as any of those 10 processes finishes downloading it starts parsing. This means that up to 10 parsers can be running at any one time. Now, parsing 10 RSS feeds at the same time and indexing their headlines in an object-oriented environment like Cocoa is surprisingly CPU intensive. Lots of objects get created, stored, sorted, removed and compared. Running 10 at a time was bringing the app to its knees.

So to see how much overhead those 10 downloads were generating without the parsing, I simply commented out the bit of code that invoked the parser. No noticeable slowdown. Hmm, well OK, then the solution is fairly straightforward: download everything as quickly as we can, then queue it up and parse one channel at a time. This way downloads can happen really fast and are not dependent on the parser finishing before getting on with the next channel, and there is only one CPU-intensive parsing operation going on at a time.

Bingo, responsiveness! The downside is of course that batch channel updates take longer to finish processing, but at least you can pleasantly read the headlines from those that have downloaded while you wait. I’m sure as time goes by I’ll be able to further refine this and make it reasonable to have several parsers running together, but for the time being this is a step in the right direction.

[image missing]

The diagram above shows you how the ‘engine’ works in NewsMac Pro.

1. The user initiates a download, e.g. by clicking on a channel.
2. The request is placed into a queue.
3. As soon as one of the 10 download slots becomes free the request is pulled off the queue and the RSS/Atom feed is downloaded.
4. The downloaded raw XML is placed into another queue to await being parsed.
5. When the parser is free from its last task it pulls another raw XML file off the queue to process.
6. Finally the processed headlines get added to the headline database and indexed for searching. From here they are accessible to the user.
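
The six steps above can be sketched as a two-stage pipeline. This is only an illustration in Python, not the NewsMac Pro source (which is Cocoa): the names `run_engine`, `fetch` and `parse` are hypothetical, and the dictionary stands in for the headline database. What it shows is the shape of the fix: ten download workers feeding a second queue that a single parser drains.

```python
import queue
import threading

DOWNLOAD_SLOTS = 10  # from the post: up to 10 feeds download at once


def run_engine(urls, fetch, parse):
    """Download feeds concurrently, then parse them one at a time.

    `fetch(url)` and `parse(url, raw_xml)` are caller-supplied stand-ins
    (hypothetical) for the real download and RSS/Atom parsing code.
    """
    requests = queue.Queue()   # step 2: pending download requests
    raw_feeds = queue.Queue()  # step 4: downloaded XML awaiting the parser
    headlines = {}             # step 6: stand-in for the headline database

    def downloader():
        while True:
            url = requests.get()
            if url is None:        # sentinel: no more requests
                return
            raw_feeds.put((url, fetch(url)))   # step 3, then step 4

    def parser():
        while True:
            item = raw_feeds.get()
            if item is None:       # sentinel: every download has finished
                return
            url, raw_xml = item
            headlines[url] = parse(url, raw_xml)   # steps 5 and 6

    for url in urls:
        requests.put(url)
    for _ in range(DOWNLOAD_SLOTS):
        requests.put(None)         # one sentinel per download worker

    workers = [threading.Thread(target=downloader) for _ in range(DOWNLOAD_SLOTS)]
    parser_thread = threading.Thread(target=parser)
    parser_thread.start()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    raw_feeds.put(None)            # tell the single parser it can stop
    parser_thread.join()
    return headlines
```

Because only one thread ever calls `parse`, the CPU-heavy stage is serialised no matter how many downloads complete at the same moment, which is the trade-off described above: total batch time goes up, but the interface keeps its share of the CPU.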

Removing entities from HTML in Cocoa

To display accented characters and certain symbols in an HTML or XML document you need to encode them. For example, the copyright symbol © is represented in HTML as &copy;.

Applications like NewsMac Pro need to be able to decode these entities and translate them to the appropriate character. Straightforward you might think, but actually it isn’t. There are multiple ways in which characters can be encoded: with a textual name as above, but also with a decimal or hex value. In NewsMac Pro I used to use NSAttributedString’s initWithHTML:documentAttributes: method, however for whatever reason this seems to lock up under Tiger, so I had to find an alternative solution. I thought I’d post the following code to help out other developers, because if you go searching on this topic you will most likely get people telling you to use the NSAttributedString method.

This probably isn’t the most elegant bit of code ever, but it serves its purpose:

+ (NSString *) decodeCharacterEntitiesIn:(NSString *)source
{ 
  if(!source) return nil;
  else if([source rangeOfString: @"&"].location == NSNotFound) return source;
  else
  {
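    // Named entities for code points 160-255, in code point order
    NSArray *codes = [NSArray arrayWithObjects: 
      @"&nbsp;", @"&iexcl;", @"&cent;", @"&pound;", @"&curren;", @"&yen;", @"&brvbar;",
      @"&sect;", @"&uml;", @"&copy;", @"&ordf;", @"&laquo;", @"&not;", @"&shy;", @"&reg;",
      @"&macr;", @"&deg;", @"&plusmn;", @"&sup2;", @"&sup3;", @"&acute;", @"&micro;",
      @"&para;", @"&middot;", @"&cedil;", @"&sup1;", @"&ordm;", @"&raquo;", @"&frac14;",
      @"&frac12;", @"&frac34;", @"&iquest;", @"&Agrave;", @"&Aacute;", @"&Acirc;",
      @"&Atilde;", @"&Auml;", @"&Aring;", @"&AElig;", @"&Ccedil;", @"&Egrave;",
      @"&Eacute;", @"&Ecirc;", @"&Euml;", @"&Igrave;", @"&Iacute;", @"&Icirc;", @"&Iuml;",
      @"&ETH;", @"&Ntilde;", @"&Ograve;", @"&Oacute;", @"&Ocirc;", @"&Otilde;", @"&Ouml;",
      @"&times;", @"&Oslash;", @"&Ugrave;", @"&Uacute;", @"&Ucirc;", @"&Uuml;", @"&Yacute;",
      @"&THORN;", @"&szlig;", @"&agrave;", @"&aacute;", @"&acirc;", @"&atilde;", @"&auml;",
      @"&aring;", @"&aelig;", @"&ccedil;", @"&egrave;", @"&eacute;", @"&ecirc;", @"&euml;",
      @"&igrave;", @"&iacute;", @"&icirc;", @"&iuml;", @"&eth;", @"&ntilde;", @"&ograve;",
      @"&oacute;", @"&ocirc;", @"&otilde;", @"&ouml;", @"&divide;", @"&oslash;", @"&ugrave;",
      @"&uacute;", @"&ucirc;", @"&uuml;", @"&yacute;", @"&thorn;", @"&yuml;", nil];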
    
    NSArray *highCodes = [NSArray arrayWithObjects: @"&OElig;",   // 338
                                                    @"&oelig;",   // 339
                                                    @"&Scaron;",  // 352
                                                    @"&scaron;",  // 353
                                                    @"&Yuml;",    // 376
                                                    @"&circ;",    // 710
                                                    @"&tilde;",   // 732
                                                    @"&ndash;",   // 8211
                                                    @"&mdash;",   // 8212
                                                    @"&lsquo;",   // 8216
                                                    @"&rsquo;",   // 8217
                                                    @"&sbquo;",   // 8218
                                                    @"&ldquo;",   // 8220
                                                    @"&rdquo;",   // 8221
                                                    @"&bdquo;",   // 8222
                                                    @"&dagger;",  // 8224
                                                    @"&Dagger;",  // 8225
                                                    @"&hellip;",  // 8230
                                                    @"&permil;",  // 8240
                                                    @"&lsaquo;",  // 8249
                                                    @"&rsaquo;",  // 8250
                                                    @"&euro;",    // 8364
                                                    nil];
    int highCodeNumbers[22] = { 338, 339, 352, 353, 376, 710, 732, 8211, 8212,
                              8216, 8217, 8218, 8220, 8221, 8222, 8224, 8225,
                              8230, 8240, 8249, 8250, 8364 }; // 22 ints
    
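    // decode basic XML entities. The "Create" function hands back a string
    // we own, so release it once it has been copied (otherwise it leaks).
    CFStringRef unescaped = CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)source, NULL);
    if(!unescaped) return source;
    NSMutableString *escaped = [NSMutableString stringWithString: (NSString *)unescaped];
    CFRelease(unescaped);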

    // Html
    int i, count = [codes count];
    for(i = 0; i < count; i++)
    {
      NSRange range = [source rangeOfString: [codes objectAtIndex: i]];
      if(range.location != NSNotFound)
      {
        [escaped replaceOccurrencesOfString: [codes objectAtIndex: i] 
                                 withString: [NSString stringWithFormat: @"%C", 160 + i] 
                                    options: NSLiteralSearch 
                                      range: NSMakeRange(0, [escaped length])];
      }
    }
    
    count = [highCodes count];
    
    // Html high codes
    for(i = 0; i < count; i++)
    {
      NSRange range = [source rangeOfString: [highCodes objectAtIndex: i]];
      if(range.location != NSNotFound)
      {
        [escaped replaceOccurrencesOfString: [highCodes objectAtIndex: i] 
                                 withString: [NSString stringWithFormat: @"%C", highCodeNumbers[i]] 
                                    options: NSLiteralSearch 
                                      range: NSMakeRange(0, [escaped length])];
      }
    }
    
    // Decimal & Hex numeric references, e.g. &#8217; or &#x2019;
    NSRange searchRange = NSMakeRange(0, [escaped length]);
    
    while(searchRange.length > 0)
    {
      NSRange start = [escaped rangeOfString: @"&#" options: NSLiteralSearch range: searchRange];
      if(start.location == NSNotFound) break;
      
      // Look for the closing ";" only after this "&#", so that an earlier
      // stray ";" cannot end the scan prematurely.
      unsigned int next = start.location + 2;
      NSRange finish = [escaped rangeOfString: @";" 
                                      options: NSLiteralSearch 
                                        range: NSMakeRange(next, [escaped length] - next)];
      
      // "&#65535;" and "&#xFFFF;" are the longest references a single
      // unichar (%C) can represent, so anything longer is skipped.
      if(finish.location != NSNotFound && finish.location - start.location <= 7)
      {
        NSRange entityRange = NSMakeRange(start.location, (finish.location - start.location) + 1);
        NSString *entity = [escaped substringWithRange: entityRange];
        // strip the leading "&#" and the trailing ";"
        NSString *value = [entity substringWithRange: NSMakeRange(2, [entity length] - 3)];
        unsigned int code = 0;
        
        if([value hasPrefix: @"x"] || [value hasPrefix: @"X"])
        {
          NSScanner *scanner = [NSScanner scannerWithString: [value substringFromIndex: 1]];
          [scanner scanHexInt: &code];
        }
        else
        {
          code = [value intValue];
        }
        
        // leave empty, malformed or out-of-range references untouched
        if(code > 0 && code <= 0xFFFF)
        {
          [escaped replaceCharactersInRange: entityRange 
                                 withString: [NSString stringWithFormat: @"%C", (unichar)code]];
          next = start.location + 1; // continue just after the decoded character
        }
      }
      searchRange = NSMakeRange(next, [escaped length] - next);
    }
    return escaped;    
  }
}
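
To make the three encodings concrete (named, decimal and hex), here is a small Python sketch of the same decoding idea. It is an illustration only and not a port of the Cocoa method above: `decode_entities` is a hypothetical name, and the name table comes from Python's standard `html.entities` module rather than the hand-written arrays.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references.
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text):
    """Replace named, decimal and hex character references in one pass."""

    def replace(match):
        body = match.group(1)
        try:
            if body[:2] in ("#x", "#X"):
                return chr(int(body[2:], 16))   # hex: &#xA9;
            if body.startswith("#"):
                return chr(int(body[1:]))       # decimal: &#169;
            return chr(name2codepoint[body])    # named: &copy;
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range number: leave the text as it was.
            return match.group(0)

    return _ENTITY.sub(replace, text)
```

All three spellings of the copyright sign decode to the same character, so `decode_entities("&copy; &#169; &#xA9;")` returns `"© © ©"`. Doing the work in a single pass also means text produced by one replacement is never decoded a second time. In real Python code the standard library's `html.unescape` already covers this; the sketch just spells out the steps.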

The tyranny of broken HTML in RSS

One of the problems with rendering RSS content nicely is broken HTML tag pairs. It seems certain RSS generators are very careless when it comes to preparing item summaries, often chopping through the middle of link tags when snatching the first few lines of an article. This isn’t such a big deal if you’re just displaying one item, but if you’ve got a whole bunch of these displayed one after another, a single broken anchor (link) tag or stray blockquote (indentation tag) can really mess up everything that follows it. I really don’t want to have to get into HTML parsing, but it looks like I’m not going to have much choice at this rate.

A couple of offenders I spotted today are BoingBoing and AppleMatters. There are many more though; it’s just down to luck which ones get away with it and which ones don’t.
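
Short of a full HTML parser, one way to contain the damage is to patch up each summary before it is displayed. The Python sketch below is an assumption about how that could look, not something NewsMac Pro does: `close_open_tags` and `VOID_TAGS` are hypothetical names, the tag matching is a deliberately naive regular expression, and stray closing tags with no opener are left alone.

```python
import re

# Tags that never take a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Captures: optional "/" (closing tag), tag name, optional "/" (self-closing).
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>")


def close_open_tags(fragment):
    """Drop a tag cut off mid-way and close any tags left open."""
    # A summary chopped inside a tag ends with e.g. '<a href="http://exa'
    fragment = re.sub(r"<[^>]*$", "", fragment)

    open_tags = []
    for closing, name, self_closing in TAG.findall(fragment):
        name = name.lower()
        if name in VOID_TAGS or self_closing:
            continue
        if not closing:
            open_tags.append(name)
        elif name in open_tags:
            # Pop back to the matching opener; inner tags that were never
            # closed are treated as closed by this tag.
            while open_tags.pop() != name:
                pass

    # Close whatever is still open, innermost first.
    return fragment + "".join("</%s>" % name for name in reversed(open_tags))
```

For example, `close_open_tags('Read <a href="http://example.com">the rest')` returns the same text with `</a>` appended, so a link cut short in one item no longer swallows the items displayed after it.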

NSURLConnection woes

I’ve been trying to improve the speed at which things download in NewsMac Pro, as well as provide support for things like feeds which require authentication. The logical choice seemed to be moving from NSURLHandle and friends to NSURLConnection, which was introduced with WebKit back in OS X 10.2.7.

The first thing that struck me about NSURLConnection was that it was very light on methods – still, I figured that would just make it a bit easier to use. Initially I tried using it synchronously (this means the thread that was doing the download would basically hang until the connection either finished downloading or failed). However the performance wasn’t great, and I read on CocoaDev that this approach also leaked memory. So the other day I decided to do a pretty major overhaul of the download system to use the event-driven delegate methods. That wasn’t too hard and it only took a few hours to have it up and running, but then I discovered a huge caveat that seems to affect a lot of WebKit-related classes – it can’t cope at all well with threads. Now in a networked application threads are essential unless you want the entire app to lock up for the duration of each burst of network activity. NSURLConnection does threading behind the scenes, but makes it very hard to actually run it from a thread of your own – which is more or less necessary if you want to have multiple concurrent downloads happening.

Anyway, I thought I’d solved this and performance was indeed better. Then I clicked on a freshly downloaded channel while others were still downloading, the new headlines popped up, I clicked one to see it displayed in the headline browser (a WebView) and boom, NewsMac crashed inside one of NSURLConnection’s threads. WTF? Clearly the WebView was creating its own NSURLConnection and that was conflicting with the ones I’d created, but I don’t see why it should. I really hope Apple fixes this ASAP because this is just shabby. I’m now left with the choice of going back to the old way of doing things or rewriting around something like CURLHandle, which I’ve just downloaded to estimate how much work it would take to integrate into NewsMac. That essential but broken classes like this make their way into the API of a shipping operating system and then remain unfixed for over a year strikes me as unacceptable, and NSURLConnection isn’t alone. At the very least Apple could provide a warning in the API documentation that the class is still ‘experimental’.

While some of you might be horrified that NewsMac Pro seems this broken, let me reassure you that with an object-oriented language and a modular program like NewsMac, ripping out the engine and sticking a new one in isn’t that big of a deal – it’s just an annoyance because this is time I’d rather be spending on finishing other features.

Mixed metaphors

I’ve noticed that there might be some redundant functionality in NewsMac. The thing is, I’m not sure what the preferable solution to the problem is – what’s the best way of marking things that you use a lot and want quick access to?

Originally this problem was solved by the idea of favourites – you could mark any channel as a favourite and it would appear in the favourites bar and favourites collection. Then I introduced star ratings because I thought people would like to be able to grade the usefulness of a given channel for future reference. But is there really enough difference between, say, a 4 or 5 star rating and having something marked as a favourite for quick access? Surely you’ll want quick access to those sites you’ve rated so highly.

NewsMac Pro further complicates things because you can create any number of folders and drag any sets of channels into them that you desire for quick access. This essentially removes the original purpose of favourites. So I’m left wondering if it’s sufficient to just drop the whole favourites concept completely and just use star ratings, which allow more granular control over likes and dislikes. I can see a situation where you might only occasionally read a certain channel but still give it a high rating because of the quality and therefore not want it in your favourites listing, but even so it seems a bit tenuous.

The other thing is that NewsMac Pro also introduces the concept of bookmarks, and I can foresee there being confusion about the difference between bookmarks and favourites because the terminology is already mixed up by the different web browsers out there. In NewsMac Pro bookmarks apply to headlines – they let you keep a reference to a specific headline and they also override the automatic history removal, so a bookmarked headline will stay around indefinitely until you remove the bookmark. It probably won’t make the 1.0 release, but I want to have the ability to automatically export these bookmarks to Safari too.

Anyway on the topic of NewsMac oddities, the other thing that comes to mind is the way you synchronise things with an iPod or Palm – you have to separately mark it as ‘to be synchronised’. This concept caused confusion and will be removed in NewsMac Pro – instead you can just pick any folder to be the source of synchronised channels (well with the exception of the ‘All’ smart folder because it would exceed the capacity of an iPod or Palm to try and synchronise 100s of channels!).