The Power of Performance


Slow software costs you money, time, battery life and, often, your customers. Most of the software that powers the world wastes countless computer cycles (time and energy). I have focused on efficient software for more than 30 years, and for the last ten years I have dedicated all of my time to optimizing code. The results of my optimization services include longer battery life, improved usability, lower costs such as cloud service bills, and new possibilities opened up by improved speed.

Why People Use Inefficient Software:

  • They are unaware that the software is inefficient

  • It is part of an open source library or tool

  • They do not have on-site expertise to optimize the code

  • They think optimizing would be too costly or risky

  • They think optimization involves shortcuts or loss of quality


Kraken is a SaaS company that optimizes images. These images are usually for web pages, and reducing their size is important for web page responsiveness. I approached them because I thought I could help with performance and possibly offer additional value due to my experience with imaging. The timing of my contact was good because they host their own servers and were about to buy more to handle a growing list of customers. I discovered that they were using several open source tools to do the majority of the work on their servers. These generalized imaging tools were inefficient due to their size (slower to load) and execution speed. I wrote replacements for their most used command line utilities, which allowed their service to operate 10x faster for the most common jobs. This speedup allowed them to avoid buying new server machines and dramatically improved the responsiveness of their service for their customers.


A friend connected me to the Astropad team and suggested I might be able to help them with their next software release. They were looking to make some big changes, including improving performance. Their product mirrors the Mac's display on an iPad Pro so that the Apple Pencil can be used as a drawing device for a macOS paint program (among other uses). Their program essentially copies and compresses the display memory on the Mac, then decompresses and displays it on the iPad, while capturing the input on the iPad and simulating it on the Mac. I profiled their code and rewrote the time-critical sections using SIMD instructions (x86 + ARM64). We then brainstormed some ideas and I helped redesign their data compression scheme. The newest release of their product includes all of these changes and is dramatically faster.


Occipital creates 3D camera hardware and mapping software. Their imaging pipeline does mundane tasks such as de-Bayering and color conversion as well as complex algorithms that need to analyze images in real time. They needed a faster way to convert color images from the camera (Bayer pattern) into RGB and YUV format. The chosen algorithm employed multiple decision paths for determining each pixel color to create the highest quality output. The compiler couldn't turn this algorithm into efficient code due to its multiple branches and byte-by-byte processing of the pixels. I streamlined the C code and added SIMD instructions which were able to process multiple pixels in parallel. The final result executes 6x faster than the original.



In order to write code effectively, you must balance many variables and assumptions in your head. Often these assumptions cause you to look past simpler solutions or outright errors in the logic. Things like early exiting from loops, reduction of excess math precision or unnecessary floating point operations, or just making better use of the target platform capabilities can yield substantial performance improvements.
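As a minimal sketch of two of these ideas (illustrative only, not code from any client project), the C below replaces a floating-point luma calculation with integer math and shows a loop that exits at the first mismatch. The function names are hypothetical, and the 77/150/29 weights are the common BT.601 luma coefficients scaled by 256, chosen here as an assumption for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference version: BT.601 luma in floating point, rounded to nearest. */
uint8_t luma_float(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
}

/* Integer version: the same weights scaled by 256 (77 + 150 + 29 = 256).
   The sum is an 8.8 fixed-point value; adding 128 rounds it before the
   shift. The result can differ from the float version by at most 1. */
uint8_t luma_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((77u * r + 150u * g + 29u * b + 128u) >> 8);
}

/* Returns 1 if every byte in buf equals value. The loop stops at the
   first mismatch instead of scanning the rest of the buffer. */
int all_equal(const uint8_t *buf, size_t n, uint8_t value)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != value)
            return 0; /* early exit */
    }
    return 1;
}
```

Whether the integer version is acceptable depends on how much precision the output really needs, and the actual gain from either change depends on the target platform, so both should be measured rather than assumed.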


SIMD (Single Instruction Multiple Data)


Compiler vendors give the false impression that turning on maximum optimization or auto-vectorization is sufficient to make your C or C++ code run as fast as possible; this is rarely the case. I'm able to craft efficient SIMD code in cases where the compiler has no clue how to do it (e.g. anything with a conditional statement in the main loop). This can net you a 2-16x speedup of your critical functions. Today's compilers are very good at instruction scheduling, but not at algorithm interpretation. You know your data better than the computer does, but the C language doesn't always allow you to express that knowledge in a way the compiler can use. By writing the SIMD code with intrinsics, I overcome the limitations of the auto-vectorizer, yet still take advantage of the instruction scheduler.
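To make the conditional-in-the-loop case concrete, here is a small sketch (a hypothetical example, not taken from a client project) of a per-byte "keep a[i] if it is at least t, otherwise take b[i]" loop written with SSE2 and NEON intrinsics. The function names are made up for illustration, and the code assumes the compiler defines `__SSE2__` or `__ARM_NEON` when those instruction sets are enabled; when neither is defined it falls back to the plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: one conditional per byte. */
void select_ge_scalar(const uint8_t *a, const uint8_t *b, uint8_t t,
                      uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] >= t) ? a[i] : b[i];
}

/* Same result, 16 bytes per iteration, with the branch turned into a mask. */
void select_ge_simd(const uint8_t *a, const uint8_t *b, uint8_t t,
                    uint8_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vt = _mm_set1_epi8((char)t);
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* SSE2 has no unsigned byte compare: max(a, t) == a means a >= t. */
        __m128i mask = _mm_cmpeq_epi8(_mm_max_epu8(va, vt), va);
        /* (mask & a) | (~mask & b) picks a or b per byte. */
        __m128i r = _mm_or_si128(_mm_and_si128(mask, va),
                                 _mm_andnot_si128(mask, vb));
        _mm_storeu_si128((__m128i *)(out + i), r);
    }
#elif defined(__ARM_NEON)
    const uint8x16_t vt = vdupq_n_u8(t);
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        uint8x16_t mask = vcgeq_u8(va, vt);        /* 0xFF where a >= t */
        vst1q_u8(out + i, vbslq_u8(mask, va, vb)); /* per-byte select */
    }
#endif
    /* Scalar tail; also the whole loop when no SIMD path is compiled in. */
    for (; i < n; i++)
        out[i] = (a[i] >= t) ? a[i] : b[i];
}
```

Because the intrinsics are ordinary C expressions, the compiler still handles register allocation and instruction scheduling. A loop this simple may be vectorized automatically by some compilers; the pattern matters more as the per-pixel decision logic grows.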


I've been writing my own imaging codecs since 1989. My experience with data compression, pixel format conversion and optimized memory access patterns for images can often provide new, non-obvious ways of improving your imaging pipeline. I have also written my own set of codecs for popular image formats, free of licensing issues, that can outperform open source libraries. These have helped solve performance issues for multiple clients over the years.

In an initial free consultation, I can quickly determine how much I can optimize your company's software. After our first consultation, you can decide what area to focus on and we can get started immediately. Positive results can be seen in as little as a few hours. The type of code that I normally optimize is native code (C, C++, ASM). Feel free to contact me via email or Skype.