Sunday, May 05, 2013

Optimizing Tr

Suneido has a string.Tr function, similar to the Unix tr command. Recently, I was looking at the stdlib Base64 code. Decode was using string.Tr to strip any newlines. However, there may not be any newlines. Which made me wonder what Tr did in this case - did it still make a copy of the string? Sure enough, it did.

My first reaction was to add a guard:

if s.Has?('\n')
    s = s.Tr('\n')

Then I started to wonder where else we should be doing this. But that was ugly. It made more sense to build it into Tr.

But implementing it as above would mean doing an extra scan of the source string. Since the code was already scanning, it would be more efficient to just defer copying until found somewhere where you needed to make a change.

I implemented this in the C++ code. It complicated it a little, but not too bad.

Then I tried to implement the same thing in the Java code. Not being able to "cheat" with macros like in the C++ version meant I had to create an instance to share the variables. This defeated part of the purpose (to avoid allocation) and seemed ugly.

Instead I decided to do an initial scan to find the first character to be changed, and then either return or continue from that point. This still avoided redundant scanning, without complicating the main part of the code.

In the process, I discovered there were a few other cases I could short circuit - if the source string is empty, or if the "from" character set is empty then you can just return the source string unchanged. Also, if there are no ranges in the from set or the to set, then you don't need to expand the sets, avoiding more allocation and copying, albeit only on the sets. I also simplified the code a little. (It was a good thing I had tests since I "simplified" a little too aggressively a few times!) See the code.

I also decided to add a cache for expanding sets with ranges. I'm not sure that's justified, but it's similar to what I do with regular expressions. The regular expression code had been using a home-brew LruCache, but I switched to Google Guava's caching facilities. That also allowed time based expiry so I could make the cache bigger without wasting space if it wasn't needed.

Then I went back and revised the C++ tr code using the same approach as the Java code.  For some reason I had originally use std::vector for the sets. I switched to making them gcstring's to avoid making a new set if there are no ranges to expand. I also added a cache, using the existing CacheMap that was being used for regular expressions.

Although it sometimes bugs me to have to update two implementations of Suneido, it does have its benefits. It means if I want to use the same approach I can't take too much advantage of specific language features (e.g. macros). This might mean I can't use some fancy feature, but in many cases that's not the wisest anyway. By the time I've written the code in two different languages, I think the end result is usually better than if I'd just written it once.

Again, I'm guilty of optimizing without real proof of whether it's justified. Tr is fairly heavily used in some areas, and the changes will probably speed up those areas. But whether that makes much difference in the bigger picture is questionable.
An amusing historical aside - I originally ported this code from Software Tools by Kernighan and Plauger, which uses Ratfor (rational Fortran). The code's origins are still visible in some of the names and structure. This book was an inspiration in my early programming days. I still have my original copy, although it's falling apart. It's pretty cool that it's still in print 37 years later.

See also previous posts: String Building Internals and An Amusing Bug

Friday, May 03, 2013

An Amusing Bug

I recently realized that the block form of Suneido's string.Replace was a more efficient way to "map" over strings.

As I mentioned in my recent post on String Building Internals, I also discovered cSuneido's string.Replace didn't handle nul's. I fixed this problem and we sent out the new suneido.exe to our beta customers.

And we started to get support calls that PDF attachments weren't being sent properly. Sure enough it was the new version of Base64.Encode that used string.Replace.

But I had tests, and we had also tested manually and it worked fine. We got one of the problem files from a customer and sure enough it failed.

Digging into it, I could see that it was only encoding part of the file. That seemed a bit like the previous nul problem, which led me in the wrong direction for a while.

More testing revealed it was encoding the first 39996 characters of the file. That seemed like an odd number. My first thought was that it was in the general vicinity of SHORT_MAX or 32767. When I first wrote the C++ version of Suneido I was still trying to use short int's when possible. This has led to a number of issues since SHORT_MAX isn't very big in modern terms. But the relevant code wasn't using any short int's.

But I noticed a magic number of 9999 in the code. Base64 encode outputs groups of 4 characters. 4 x 9999 - 39996. Aha!

string.Replace takes an optional argument for how many replacements to do. Usually, you either want 1 or all. When not specified, the count was defaulting to 9999. For "normal" usage, that's plenty. But when using replace to map large strings, it obviously isn't.

I changed it to INT_MAX and that fixed the problem. Out of curiosity  I went and checked the jSuneido code. (It did not have the same issue.) Frustratingly, it already had INTMAX. I don't think I foresaw this issue when I ported the code, it probably just bugged me to have a magic number.

I'm not sure why this strikes me as amusing. It just seems funny that the "bug" was not something obscure, just a badly chosen limit, that no doubt seemed entirely reasonable at the time.

Friday, April 26, 2013

No auto-update for 64 bit Java on Windows

It's hard to believe after all these years, and all the security issues, that there's still no auto-update for 64 bit Java on Windows.

I have known that Java on my Windows machine wasn't updating properly, but I just assumed it was because I had multiple copies and versions etc. But it's a hassle having to remember to manually download and install updates so finally I decided to try to fix it, only to discover there is no fix.

This was entered as a bug in 2006, and it's currently scheduled to be fixed in Java 8 (!)

Sometimes you really have to wonder about how these things get prioritized. Granted, our customers say the same thing about us, but considering what a security issue this is, I would have thought it would get addressed. Back in 2006 I can see thinking "no one" was running 64 bit, but nowadays that's not a good assumption.

Sunday, March 31, 2013

String Building Internals

Recently we ran into a problem where a service using cSuneido was running out of memory. I confidently switched the service to use jSuneido, and got stack overflows. So much for my confidence!

We tracked it down to calling Base64.Encode on big strings (e.g. 1mb) (This came from sending email with large attachments.)

Base64.Encode was building the result string a character at a time by concatenating. In many languages this would be a bad idea for big strings. (e.g. In Java you would use StringBuilder instead.) But Suneido does not require you to switch to a different approach for big strings, the implementation is intended to handle it automatically.

Up till now, Suneido has handled this by deferring concatenation of larger strings. If the length of the result is greater than a threshold (64 bytes in cSuneido, 256 bytes in jSuneido) then the result is a "Concat" that simply points to the two pieces. This defers allocating a large result string and copying into it until some operation requires the value of the result. Basically, Suneido makes a linked list of the pieces.

For example, if you built a 1000 character string by adding one character at a time, Suneido would only allocate and copy the result once, not 1000 times. (It still has to create 1000 Concat instances but these are small.)

In practice, this has worked well for many years.

I thought I had implemented the same string handling in jSuneido, but actually I'd taken a shortcut, and that was what caused the stack overflows.

The obvious way to convert the tree of Concat's to a regular string is a recursive approach, first copying the left side of each concat, and then copying the right side, each of which could be either a simple string or another Concat. The problem is that the tree is not balanced. The most common case is appending on the end, which leads to a tree that looks like:


But with many more levels. If your tree has a million levels, then recursing on each side (in particular the left side) will cause stack overflow. In cSuneido I had carefully iterated on the left side and recursed on the right. (Basically doing manual tail-call elimination.) But when I ported the code to Java I had unknowingly simplified to recurse on both sides.

Once I knew what the problem was, it was easy enough to fix, although a little trickier due to using StringBuilder.

Thinking about the problem I realized in this case, there was a better way to write Base64.Encode. I could use the form of string.Replace that takes a function to apply to each match, and simply replace every three characters with the corresponding four characters in Base64. Since string.Replace uses a buffer that doubles in size as required (via StringBuilder in jSuneido), this is quite efficient. (In the process I discovered string.Replace in cSuneido choked on nul's due to remnants of nul terminated string handling. Easy enough to fix.)

But that left the problem of why cSuneido ran out of memory. I soon realized that the whole Concat approach was hugely wasteful of memory, especially when concatenating very small  strings. Each added piece requires one memory block for the string, and another for the Concat. Most heaps have a minimum size for a heap block e.g. 32 bytes. So in the case of appending a single character, it was using 64 bytes. So building a 1 mb string a character at a time would use something like 64 mb. Ouch!

In theory, cSuneido could still handle that, but because it uses conservative garbage collection, spurious references into the tree could end up with large amounts of garbage accumulating.

Maybe my whole Concat scheme wasn't ideal. To be fair, it has worked well up till now, and still works well for moderate string sizes. It just wasn't designed for working with strings that are millions of characters long.

I dreamed up various complicated alternative approaches but I wasn't excited about implementing them. It seemed like there had to be a simpler solution.

Finally, I realized that if you only worried about the most common case of appending on the end, you could use a doubling buffer approach (like StringBuilder). The result of a larger concatenation would be a StrBuf (instead of a concat). Because strings are immutable, when you appended a string onto a StrBuf, it would add to the existing buffer (sharing it) if there was room, or else allocate a new buffer (twice as big).

Theoretically, this is slightly slower than the Concat approach because it has to copy the right hand string. But in the common case of repeatedly appending small strings, it's not bad.

This approach doesn't appear to  be noticeably faster or slower, either in large scale (our entire application test suite) or in micro-benchmarks (creating a large string by repeatedly appending one character). But, as expected, it does appear to use significantly less memory (roughly half as much heap space).

One disadvantage of this approach is that it doesn't help with building strings in reverse order. (i.e. by repeatedly adding to the beginning of a string) That's unfortunate, but in practice I don't think it's too common.

A nice side benefit is that this approach is actually simpler to implement than the old approach. There is no tree, no recursion, and no need for manual tail-call elimination.

Unfortunately, in Java (where I've been working on this) you still have to convert the StringBuilder to a string when required, which makes yet another copy. In cSuneido I should be able to simply create a string wrapping the existing buffer.

Thinking about it some more, I wasn't happy with the extra copying - a lot more than the previous approach. I realized I could combine the two approaches - use a doubling array, but storing references to the pieces rather than the actual characters. This eliminates the copying while still reducing memory usage.


Note: Concats and Pieces cannot be combined because Pieces is shared - appending to a Concats will result in a new Concats that (usually) shares the same Pieces. Concats is effectively immutable, Pieces is mutable and synchronized. (It would be nice if Java let you combine a class and an array to make variable sized objects, but it doesn't.)

However, in the worst case of building a large string one character at a time, this would still end up requiring a huge array, plus all the one character strings. But given the array of strings, it's relatively easy to merge small strings and "compact" the array, reducing the size of the array and the overhead of small strings.

I didn't want to compact too often as this would add more processing overhead. I realized that the obvious time to compact was when the array was full. If there were contiguous small strings that could be merged, then you could avoid growing the array.

I was disappointed to discover that this new approach was about 1/2 as fast as the old approach (when building a 1mb string one character at a time). On the positive side, it resulted in a heap about 1/4 the size (~200mb versus ~800mb). It was a classic space/time tradeoff - I could make it faster by doing less merging, but then it used more memory.

Interestingly, the test suite (a more typical workload) ran about 10% faster. I can't explain that, but I'm not complaining!

It still nagged me that the new approach was slower building huge strings. I tried running it longer (more repetitions) to see if garbage collection would have more of an effect, but that didn't make much difference. Nor did building a 100kb string instead of 1mb (and again doing more repetitions). Then I thought that since my new approach was specifically aimed at building huge strings, maybe I should try increasing the size. Sure enough, building a 10mb string was 4 times faster with the new approach. Finally, the results I was hoping for! Of course, in practice, it's rare to build strings that big.

I initially wrote special code for adding to the beginning of a concatenation and for joining two concatenations. But then I added some instrumentation and discovered that (at least in our test suite) these cases are rare enough (~1%) that they're not worth optimizing. So I removed the extra code - simpler is better.

There are a number of "thresholds" in the code. I played around a bit with different settings but performance was not particularly sensitive to their values.

I probably spent more time on this than was really justified. But it was an interesting problem!

As usual, the code is in Mercurial on SourceForge. Or go directly to this version of Concats.

Wednesday, March 27, 2013

Ilford Marketing Abuse

Can you spot the "unsubscribe" link on this email fragment:

It's at the bottom in tiny dark gray text on a black background. Could they have made it any harder to find?

I can just imagine the conversation: "Ok, so we have to have an unsubscribe link. How about if we make it really hard to see."

The reason they got my email address was that it was required so I could download color profiles for their high end inkjet paper. It seems to me, if you want people to buy your paper, then you want to make it as easy as possible for them to get good results. Not make them jump through hoops and get spammed, just for the "privilege" of buying their products.

They didn't even have a way to opt out of emails when you register, as most registration forms do.

Needless to say, it doesn't make me inclined to buy any more of Ilford's products.

Tuesday, March 12, 2013

Geek Overthink

We rented a car when we were in Virginia and we managed to get a Toyota Prius. The radio was playing and I hate radio as much as I hate TV! I had my music on my iPhone but I didn't have a car adapter. But hey, the car supports Bluetooth audio. I poked around in the menus but I couldn't figure out how to connect. I pulled out the owners manual but there was no mention of audio? I found there was a whole separate manual for the audio system!

It was a little hard to follow the manual because it described multiple versions of the audio system, none of which seemed to match our vehicle. I figured it had to be possible since a bunch of iPhones were listed on the Bluetooth menu. It turned out that might have been part of the problem since the manual said you could link up to five devices, and there were already five. No problem, I'll just delete one. Except the options to delete that were described in the manual didn't appear on our car. I wonder if rental cars are locked down somehow so you can't mess with certain settings?

I gave up at this point. But then I remembered seeing something about USB in the manual, and I did have a USB cable (from the charger). Sure enough there was a place to plug in a USB cable and when I did my iPhone showed up and played perfectly.

The moral of the story is - don't get so wrapped up in the wonderful complexities of one approach that you forget that there might be other (simpler) solutions.

Friday, February 22, 2013

Service is Not a Joke

Some companies seem to have the idea that humour equates to great customer service. (e.g. WestJet and Amtrak)

I disagree.

There's nothing wrong with lightening things up and getting people smiling, but you also need to fix the underlying system. Telling someone a joke while they are on hold for an hour doesn't change the fact that they were on hold for a hour.

And not everyone makes a good comedian. Management can dictate "you shall tell jokes" but that doesn't mean it's always going to be funny. Especially the fifth time you hear it.

Unless you also fix the real issues you're just putting lipstick on a pig.

Of course, that leaves the question - if you can't fix the pig are you better off at least dressing it up?

If this was individuals trying to make the best of a situation then ok. But in most cases it seems to be a top down directive. In other words it's coming from exactly the people that could actually fix the system if they really wanted to.