Inaugural CocoaHeads Reston Meeting

Tonight the first CocoaHeads Reston meeting was held at Near Infinity Corporation. Jason Harwig gave an excellent presentation on WebKit, starting with a simple browser created completely in Interface Builder (it consisted of a text field linked to a web view). I’ve recreated it and included a snapshot below.

Next he demonstrated how to interact with the DOM and JavaScript. The JavaScript integration worked in both directions: first he demonstrated calling JavaScript functions from Objective-C, and then calling Objective-C from JavaScript. Note that for JavaScript to call Objective-C, you need to implement the + (BOOL)isSelectorExcludedFromWebScript:(SEL)aSelector; method and return NO for each of the methods you wish to call.

Finally, Jason walked through various uses of WebKit and demonstrated an application-specific, full-screen browser he wrote for a game as well as a del.icio.us Cocoa application he wrote called Delish. Several other WebKit-based applications were also covered, such as Fluid (a freely available site-specific browser creator), PackRat (a commercial site-specific browser for Backpack), and another application which I’ve forgotten (I guess I should’ve taken notes). As an aside, Jason brought up the interesting work that the 280North folks have done for their 280Slides web-based presentation application: Objective-J, an Objective-C-like language built on JavaScript that underlies their Cappuccino framework.

Matt Wizeman stepped up next to demo a WebKit application he’s developing to wrap a troublesome time-tracking web application. It was interesting to see the machinations he had to go through to know when a page had finished loading and to transition between pages, since the application was JavaScript-heavy and used tricks like clickable divs.

I learned quite a bit about WebKit. As soon as I got home I was energized to play with it.

Be sure to come to the next meeting on July 10th (day before iPhone 3G day)!

Yesterday’s Storm

ErinGary has NEXRAD images and commentary of yesterday’s storm here in Northern VA. For about half an hour yesterday, the folks at work milled around in interior offices, hallways, and the stairwell while my family at home hid out in the basement. Thankfully there was not much damage in my neighborhood, though I saw several trees that had fallen on homes during my commute home.

Learn a New Programming Language

As Chad Fowler says, “The best reason to learn a new programming language is to learn to think differently.” Here in Northern VA, we have a new opportunity to think differently with the introduction of the NoVA Languages group. Chris Williams from Iterative Designs posted to the NoVA RUG mailing list this morning and received quite a bit of interest. Here’s what he has to say about the new group:

Thoughts on the makeup of the group include obtaining (however you want) a book, working through the book 1 chapter per week on one night of that week with a group of like minded individuals.

To start, the group will work through Joe Armstrong’s Programming Erlang book starting on Monday, June 16th (hopefully I’ll be able to work out a schedule with my wife so I can attend). This is an excellent choice on several levels: every developer should know a functional programming language; single-core processors are increasingly rare, and the number of cores in commodity hardware should only increase in the coming years; and I already own the book.

If Erlang doesn’t pique your interest, the NSCoderNight DC group is going to work through the 3rd Edition of Aaron Hillegass’s Cocoa Programming for Mac OS X book on Tuesday nights. For several months now, I’ve been playing around with Cocoa, and while I’m getting used to the syntax of Objective-C, the Xcode IDE and Interface Builder still seem foreign to me (I never liked IDEs, having been weaned on Emacs).

It’s a great time to be a programmer in Northern Virginia!

Update: It turns out that there’s another Cocoa group right around the corner from me in Reston: CocoaHeads. They meet the second Thursday of each month.

Startups 101


Tonight I had the opportunity to attend a special Refresh DC meeting on the challenges of starting your own business. The format was a panel of folks from the DC startup scene moderated by Jackson Wilkinson of Viget Labs. The panel consisted of Brian Williams of Viget Labs, Andrew Lee of Publi.us, Eric Rupert from Odeo, Eddie Frederick of Hungry Machine, and Sean Greene of LaunchBox Digital.

These were the main points I took away:

  1. Be passionate. The most important quality for startup founders is to be passionate about their product or service. Starting a company can be exhilarating but the setbacks can be really difficult. If you’re not passionate about what you’re doing each of these roadblocks will be an excuse for you to quit. Figure out what you’re passionate about and work on that.
  2. Focus on your product. Don’t focus on the pie-in-the-sky potential valuations of your company or worry about leasing office space, hiring an attorney, finding an accountant, etc. Instead you should focus on the product or service you’re going to sell. Get something out there quickly — pick the most important feature and get it out there in front of your customers quickly.
  3. Have your customers influence your product. If you follow the previous advice and get your product out there quickly, you can use customer feedback to iterate on your product. You may not know all of the different ways customers will use your product, so this feedback is critical in charting your product roadmap.
  4. Equity is control, don’t surrender it easily. One of the attendees asked about using equity to pay for services if money was tight. Brian said to avoid giving out equity as you make that person a partner in your business. While they may not have a controlling interest, they are still an owner and have some influence.
  5. Don’t overfund. All of the panelists warned of the dangers of taking too much money as it does more than dilute the founders’ ownership. Venture capitalists are looking to make a large return on their capital for themselves and their limited partners. If you need to pay a vendor or contractor, find another way: defer payment, use credit, etc.
  6. Hire slow, fire fast. Early employees can make or break your company. You will be working long hours alongside these folks, so you must ensure they’re a good fit. If you make a bad hire, you need to resolve the situation quickly; don’t let emotion get in the way.

Some of the books recommended were: The Art of the Start, Getting to Yes, and Founders at Work. Andrew also recommended Startup School run by Y Combinator (YC). I’ve been reading Paul Graham’s essays for around seven years now and following YC’s investments. I’m glad to see an early-stage investor like YC in the DC area (LaunchBox Digital) and hope that the startup scene in DC and its suburbs becomes more vibrant.

Thanks to Strategic Analysis for hosting this. It was a great venue and I hope they’ll offer to host Refresh DC again.

Performance Tuning Network Applications

Recently at work I spent a few weeks tuning a network service across three platforms (Solaris, Linux, and AIX) to get within 10% of the theoretical maximum throughput. In this short article, I’ll walk through the various tools I used to improve the performance of the application.

This application is very specialized in that the two machines are connected directly through an Ethernet switch. This means that the MTU could easily be determined from each end of the link, and the extra work to discover the path MTU across a transit network (see RFC 1191) was unnecessary. This also made it very easy to watch the traffic between the two hosts as well as the system calls they were using to send and receive the data.

Before I get into the steps I took to tune the service, I’d like to introduce the tools used:

  • Truss: a tracing utility which displays system calls, dynamically loaded user level function calls, received signals, and incurred machine faults. This is available for many platforms, but I use it most on AIX.
  • DTrace/DTruss: a dynamic tracing compiler and tracing utility. This is an amazingly powerful tool from Sun, originally for Solaris but slowly spreading to other platforms. See Sun’s How To Guide.
  • strace: a dynamic tracing utility which displays system calls and received signals under Linux.
  • mpstat: collects and displays performance statistics for all logical CPUs in the system.
  • prstat: iteratively examines all active processes on the system and reports statistics based on the selected output mode and sort order.
  • tcpdump: a utility for capturing network traffic.
  • Wireshark: a network protocol analyzer. It replaces the venerable Ethereal tool and allows you to either capture network traffic on demand or load a captured session for analysis. Find out more here.
  • gprof: a tool for profiling your code to determine where the performance bottlenecks are. See the manual for more information.
  • c++filt: a tool for demangling C++ method names. It is part of the GNU binutils package.

Since I already had the service up and running, I simply ran the two components and captured the traffic between them using tcpdump. While the processes were running, I also used dtruss, truss, or strace (depending on the platform) to capture the system calls being made. Since this is a network service, I focused on calls to select, send, and recv.

13455/15:   2143177    2994      4 pollsys(0xFFFFFD7EBADDB910, 0x1, 0xFFFFFD7EBADDBA30) = 1 0
13455/15:   2143180       5      0 pollsys(0xFFFFFD7EBADDB8D0, 0x1, 0xFFFFFD7EBADDB9F0) = 1 0
13455/15:   2143185       8      4 recvfrom(0x11, 0xB384A0, 0x10000)                    = 1416 0
13455/15:   2143253       5      0 pollsys(0xFFFFFD7EBADDB8D0, 0x1, 0xFFFFFD7EBADDB9F0) = 0 0
13455/15:   2143262      12      8 send(0x11, 0xB084D0, 0x14B8)                         = 5304 0
13455/15:   2143268     365      4 pollsys(0xFFFFFD7EBADDB910, 0x1, 0xFFFFFD7EBADDBA30) = 1 0
13455/15:   2143270       4      0 pollsys(0xFFFFFD7EBADDB8D0, 0x1, 0xFFFFFD7EBADDB9F0) = 1 0
13455/15:   2143275       8      4 recvfrom(0x11, 0xB384A0, 0x10000)                    = 1416 0
13455/15:   2143343       5      0 pollsys(0xFFFFFD7EBADDB8D0, 0x1, 0xFFFFFD7EBADDB9F0) = 0 0
13455/15:   2143348       9      4 send(0x11, 0xB084D0, 0x14B8)                         = 5304 0
13455/15:   2143353    1000      4 pollsys(0xFFFFFD7EBADDB910, 0x1, 0xFFFFFD7EBADDBA30) = 1 0

Looking at the results above you can see that select (pollsys) is being called each time we need to send or receive data over the network. Since the socket is non-blocking we can rely on the immediate return when the outgoing socket buffer is full as well as when there is no data available to read. By selecting at the very top of the receive loop we can bundle multiple receive calls together, increasing the application’s throughput. Now the output looks like this:

16712/9:     16202    1560      6 pollsys(0xFFFFFD7EBB9DB940, 0x1, 0xFFFFFD7EBB9DBA30) = 1 0
16712/9:     16217      10      6 recv(0xB, 0x8A6450, 0x10000)                         = 1416 0
16712/9:     16246       9      5 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16267       7      3 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16285       5      1 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16680      10      5 recv(0xB, 0x8A6450, 0x10000)                         = 1416 0
16712/9:     16712      11      7 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16733       7      3 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16753       6      2 send(0xB, 0x876480, 0x540)                           = 1344 0
16712/9:     16768       4      0 recv(0xB, 0x8A6450, 0x10000)                         = -1 Err#11

You’ll notice that now we are able to process two requests and send out six responses in the time that it previously took to call select and receive a single request. When there is nothing left to read, the call to recv fails with errno 11 (EAGAIN). This change made the single biggest performance impact on the code. I also changed the calls from recvfrom to recv since the application did not make use of the foreign address.
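As a rough sketch of that pattern (not the service’s actual code), the loop below waits in select once and then calls recv until it would block. It assumes a non-blocking UDP socket; the names set_nonblocking and drain_socket are made up for illustration, and a real service would build and send its responses where the counter is incremented.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: switch a descriptor to non-blocking mode.
// Returns 0 on success, -1 on error.
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Hypothetical helper: idle in select() until the socket is readable, then
// call recv() repeatedly until it would block.
// Returns the number of datagrams read, or -1 on a real error.
int drain_socket(int fd, char *buf, std::size_t len)
{
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    // One select() call per batch instead of one per datagram.
    if (select(fd + 1, &readable, nullptr, nullptr, nullptr) < 0)
        return -1;

    int count = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) {
            ++count;    // a real service would process the request here
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;      // nothing left to read; go back to select()
        if (errno == EINTR)
            continue;   // interrupted by a signal; retry
        return -1;
    }
    return count;
}
```

The caller would invoke drain_socket in its main loop, so the process only sleeps in select when the receive queue is empty. The loop treats a zero-length read as a valid datagram, which is only appropriate for datagram sockets.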

At this point the performance was much better but I noticed that under heavy load the sending socket would block as the ratio of requests to responses was 1:3. As this was a UDP application, having the sending buffers fill up seemed strange as we assumed that additional packets would simply be dropped on the floor.

On the server, I checked the UDP socket buffer size using ndd. (This was under Solaris; for AIX the command is no, and for Linux it is sysctl.)

The following code (minus the error handling) was added to the socket initialization to ensure that the socket buffers were large enough.

int size = 1024 * 1024; // 1MB; SO_SNDBUF and SO_RCVBUF take an int
// Enlarge the send buffer, then the receive buffer.
int ret = setsockopt(desc, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
ret = setsockopt(desc, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
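Since the kernel may clamp or reject a request that exceeds the system-wide limit, it can be worth reading the value back. The sketch below is one way to do that with getsockopt; set_socket_buffer is a hypothetical helper name, not something from the original application.

```cpp
#include <sys/socket.h>

// Hypothetical helper: request a size for SO_SNDBUF or SO_RCVBUF and report
// what the kernel actually granted.
// Returns the granted size in bytes, or -1 on error.
int set_socket_buffer(int fd, int option, int requested)
{
    if (setsockopt(fd, SOL_SOCKET, option, &requested, sizeof(requested)) < 0)
        return -1;

    int granted = 0;
    socklen_t len = sizeof(granted);
    if (getsockopt(fd, SOL_SOCKET, option, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Comparing the returned value against the requested one at startup shows whether the system limit needs raising. On Linux the reported value is typically double the request, because the kernel reserves space for its own bookkeeping, so the two numbers will not match exactly there.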

Now that the application was performing acceptably I decided to run it under the profiler. This turned up the function which was adding responses to the in-memory packet. It turned out that as responses were being added to the packet, the headers were being recalculated each time. I removed this unnecessary work and only made the calculations right before the packet was sent. This improved performance a few percentage points more.
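To illustrate the idea, here is a minimal sketch using an invented packet layout (a 4-byte header holding a response count and a payload length, both big-endian). The real header format is not described in this article, and ResponseBundle is a hypothetical name. Adding a response only appends bytes; the header is computed once in finalize, just before sending.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical response bundle with an invented 4-byte header:
// bytes 0-1 = response count, bytes 2-3 = payload length (big-endian).
class ResponseBundle {
public:
    // Appending a response copies its bytes and bumps a counter.
    // No header work happens here.
    void add(const void *data, std::size_t len)
    {
        const auto *bytes = static_cast<const std::uint8_t *>(data);
        payload_.insert(payload_.end(), bytes, bytes + len);
        ++count_;
    }

    // Compute the header once, immediately before the packet is sent.
    std::vector<std::uint8_t> finalize() const
    {
        const auto length = static_cast<std::uint16_t>(payload_.size());
        std::vector<std::uint8_t> packet(4 + payload_.size());
        packet[0] = static_cast<std::uint8_t>(count_ >> 8);
        packet[1] = static_cast<std::uint8_t>(count_ & 0xFF);
        packet[2] = static_cast<std::uint8_t>(length >> 8);
        packet[3] = static_cast<std::uint8_t>(length & 0xFF);
        std::copy(payload_.begin(), payload_.end(), packet.begin() + 4);
        return packet;
    }

private:
    std::vector<std::uint8_t> payload_;
    std::uint16_t count_ = 0;
};
```

With this shape, the cost of the header is paid once per packet rather than once per response.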

By binding the network interrupts to a particular core and keeping the sending thread off of that core, we were able to eke out additional performance from the application. To accomplish this, the application allows the operator to specify which core(s) it should bind to using sched_setaffinity (Linux) and processor_bind (Solaris). You can also accomplish this using taskset (Linux) and pbind (Solaris) if you don’t wish to modify your application.
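For the Linux side, a minimal sketch of pinning the calling thread to a single core with sched_setaffinity might look like the following. The helper name pin_to_core is hypothetical, and the Solaris processor_bind path is not shown.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for cpu_set_t and the CPU_* macros in C builds
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to one core (Linux only).
// Returns 0 on success, -1 on error or if the core number is out of range.
int pin_to_core(int core)
{
    if (core < 0 || core >= CPU_SETSIZE)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set);
}
```

The core number would come from the operator’s configuration, chosen so that the sending thread does not share a core with the one handling network interrupts.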

Looking at the network traffic with tcpdump, I saw that I could fit an additional response in the response bundle packet if I reduced or removed some of the items in the packet header. At this point the analysis and tuning had gone on for a few weeks and we had a schedule to meet. Since the performance was where we needed it, the application was wrapped up and sent to quality assurance.

The single most important lesson I learned from this exercise was to use non-blocking sockets to their fullest by continually calling recv/send until the call would block and then using select to idle the process until there is work to do.