Google has announced the final evaluation results and the integration work is underway, so my GSoC 2011 journey with Jitsi is approaching its end. Now is the right time to look back on this unique experience.
I can’t forget the moment when I received the email from Google saying that I had been accepted by Jitsi as a GSoC 2011 student. The official announcement time was 19:00 UTC on April 25, which is 03:00 the next morning here in Beijing. I spent the whole night waiting for that moment with butterflies in my stomach. When the announcement came, I was too excited to fall asleep.
After a week immersed in the excitement, the coding period started.
The project was to implement support for wideband codecs, specifically SILK, in Jitsi. So why wideband codecs? Thanks to advances in Internet access technology, the available bandwidth has increased a lot, and the main concern of VoIP developers has shifted from using less bandwidth to delivering better quality. Imagine calls with the quality of music rather than of telephone speech. That’s cool. And why SILK? Well, to be honest, Skype has done pretty well in the VoIP industry, and they have shared their main audio codec, SILK, with the open source community. We were eager to have it in Jitsi.
The main work was to port SILK from C to Java. Sounds straightforward, right? It is easy to start, but hard to make it work well. We decided to translate the C code by hand rather than wrap the native library with JNI. There are some important points to take care of in the translation process.
1) Pointers. The C code uses pointers intensively, but Java has no pointers, so you have to be careful whenever you encounter one. Generally speaking, when the pointer refers to a complex type such as a struct, you can translate it into an object reference in Java. When the pointer refers into an array, you need an array reference plus a separate offset into that array.
2) Callbacks. C supports function pointers, which make callbacks flexible and powerful. Again, Java has no pointers, so callbacks have to be designed carefully around interfaces.
3) Unsigned data types. Java has no unsigned integer types (apart from char), so you have to be careful whenever unsigned data is involved in an operation.
4) Shift operations. Java has separate operators for the arithmetic right shift (>>) and the logical right shift (>>>), whereas in C the behaviour of >> depends on whether the operand is signed or unsigned.
5) Endianness. Java's data streams and byte buffers read multi-byte values as big-endian by default, while the input data is usually little-endian.
6) Floating-point arithmetic. Subtle problems arise when floating-point operations are involved.
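The translation notes above can be sketched in Java roughly as follows. This is only an illustration, not code from the actual SILK port: the class name `PortingSketch`, the `SampleOp` interface, and all method names are hypothetical, and the lambda syntax assumes Java 8 or later (with an older Java you would write an anonymous class instead).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper class; none of these names come from SILK or Jitsi.
final class PortingSketch {

    /** Callback type standing in for a C function pointer such as int (*op)(int). */
    interface SampleOp {
        int apply(int sample);
    }

    /**
     * Pointer into an array: a C call like process(frame + 2, 3) becomes
     * an array reference plus an explicit offset and length.
     */
    static void applyInPlace(short[] buf, int offset, int len, SampleOp op) {
        for (int i = 0; i < len; i++) {
            buf[offset + i] = (short) op.apply(buf[offset + i]);
        }
    }

    /** Reads a Java byte the way C reads an unsigned char (0..255). */
    static int unsignedByte(byte b) {
        return b & 0xFF;
    }

    /** Reads a Java int the way C reads an unsigned 32-bit int, widened to long. */
    static long unsignedInt(int v) {
        return v & 0xFFFFFFFFL;
    }

    /** Decodes little-endian 16-bit PCM bytes; ByteBuffer is big-endian unless told otherwise. */
    static short[] decodeLittleEndianPcm(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        short[] out = new short[raw.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = bb.getShort();
        }
        return out;
    }

    public static void main(String[] args) {
        short[] frame = {1, 2, 3, 4, 5};
        applyInPlace(frame, 2, 3, s -> s * 2);          // doubles the last three samples
        System.out.println(Arrays.toString(frame));     // [1, 2, 6, 8, 10]

        System.out.println(unsignedByte((byte) 0xF0));  // 240, not -16
        System.out.println(unsignedInt(-1));            // 4294967295

        System.out.println(-16 >> 2);                   // arithmetic shift keeps the sign: -4
        System.out.println(-16 >>> 28);                 // logical shift fills with zeros: 15

        short[] pcm = decodeLittleEndianPcm(new byte[] {0x34, 0x12});
        System.out.println(pcm[0]);                     // 0x1234 = 4660
    }
}
```

Passing the offset explicitly is more verbose than C pointer arithmetic, but it avoids copying sub-arrays on every call, which matters in a codec's inner loops.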
The coding work was finished around the mid-term, and then came the testing. I have to say that testing really takes time. Testing and debugging can easily take longer than you expect, so you should always reserve more time for them than you predict. In the first phase, it was not difficult to clear the bugs and make the program run. But although it gave no error messages, the result was not correct. For example, when I played music that had been encoded and decoded by the codec, the last part of the audio was not clear at all. At first, I tried to find the bug from the top down: debugging step by step and comparing each intermediate result with that of the C version. But I found it really difficult to decide whether they matched once floating-point arithmetic was involved. Since floating-point results are not exactly the same in Java as in C, when an intermediate result doesn’t match, it is hard to judge whether that is because of a bug or just because of the floating-point arithmetic itself. After a discussion with my mentor Lyubomir, I decided to use another approach, a bottom-up one: take an intermediate result from the C version and hard-code it into the Java version to test whether there are bugs in the stages that follow. With this approach, I finally located the bug. It was in the resampler, which explains why only part of the encoded/decoded audio was incorrect.
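The idea of feeding a hard-coded intermediate result from C into a later Java stage can be sketched like this. Everything here is an assumption made for illustration: `downstreamStage` is a toy two-tap averaging filter standing in for a real codec stage, the arrays are made-up values rather than data dumped from SILK, and the tolerance of 1e-6 is arbitrary. The point is only that the comparison uses a tolerance instead of exact equality, so small floating-point differences are not mistaken for bugs.

```java
// Hypothetical test harness; the stage and the data are illustrative only.
final class StageCheck {

    /** Toy stand-in for the Java stage under test: y[n] = 0.5*x[n] + 0.5*x[n-1]. */
    static float[] downstreamStage(float[] in) {
        float[] out = new float[in.length];
        float prev = 0.0f;
        for (int i = 0; i < in.length; i++) {
            out[i] = 0.5f * in[i] + 0.5f * prev;
            prev = in[i];
        }
        return out;
    }

    /**
     * Returns the index of the first element that differs by more than tol
     * (NaN counts as a mismatch), the shorter length if the lengths differ
     * but the common prefix matches, or -1 if everything matches.
     */
    static int firstMismatch(float[] expected, float[] actual, float tol) {
        int n = Math.min(expected.length, actual.length);
        for (int i = 0; i < n; i++) {
            if (!(Math.abs(expected[i] - actual[i]) <= tol)) {
                return i;
            }
        }
        return expected.length == actual.length ? -1 : n;
    }

    public static void main(String[] args) {
        // In practice these would be dumped from the C build at the stage boundary.
        float[] inputFromC = {1.0f, 0.5f, -0.25f, 0.0f};
        float[] outputFromC = {0.5f, 0.75f, 0.125f, -0.125f};

        int bad = firstMismatch(outputFromC, downstreamStage(inputFromC), 1e-6f);
        System.out.println(bad < 0
                ? "stage matches the reference"
                : "first mismatch at index " + bad);
    }
}
```

Because the stage receives the C input rather than the output of the earlier Java stages, a mismatch here points at this stage alone, and a match lets you move on to the next one.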
Finally, how do you test a codec? I think there are basically two methods. The first is to write a script that compares the output with a reference signal to see whether they match. The second is by listening: let users test it. You play the audio that has gone through the encoder and decoder and compare it with the original sample. If it is clear enough, I think you can say the codec works correctly.
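The first method, comparing against a reference signal, could look something like the sketch below. It computes a plain signal-to-noise ratio in decibels between a reference and the decoded output. The class and method names are hypothetical, the sample values are made up, and it assumes both signals are already time-aligned and have the same length. For a lossy codec, waveform SNR is only a rough check, so any pass/fail threshold would have to be chosen by experiment.

```java
// Hypothetical comparison helper; not part of SILK or Jitsi.
final class CodecCompare {

    /**
     * Signal-to-noise ratio in dB of decoded against reference:
     * 10 * log10(sum(ref^2) / sum((ref - decoded)^2)).
     * Returns positive infinity when the two signals are identical.
     */
    static double snrDb(short[] reference, short[] decoded) {
        if (reference.length != decoded.length) {
            throw new IllegalArgumentException("signals must have the same length");
        }
        double signal = 0.0;
        double noise = 0.0;
        for (int i = 0; i < reference.length; i++) {
            double r = reference[i];
            double e = r - decoded[i];
            signal += r * r;
            noise += e * e;
        }
        if (noise == 0.0) {
            return Double.POSITIVE_INFINITY;
        }
        return 10.0 * Math.log10(signal / noise);
    }

    public static void main(String[] args) {
        short[] reference = {100, -100, 100, -100};
        short[] decoded = {90, -90, 90, -90};
        // Signal energy 40000, error energy 400, so the ratio is 100, i.e. 20 dB.
        System.out.println(snrDb(reference, decoded) + " dB");
    }
}
```

A number like this is handy for catching regressions automatically, while listening remains the better judge of whether the audio actually sounds right.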
The three months passed quickly, but the memories will last. Thanks to Google, thanks to Jitsi, and special thanks to my mentor Lyubomir. I hope I can continue to work with the community in the future.