Jack Bresenham invented a low computation line drawing algorithm many years ago. The benefit of his algorithm is that it uses only integer variables and only addition and subtraction operations to calculate the pixel at each point along an arbitrary line. It's a very efficient method and has become the 'standard' way of drawing a line on computers.
On a display device like the SSD1306 OLED, the image is changed by sending data (and commands) serially over the SPI or I2C bus. When the Bresenham algorithm is written for the SSD1306, the traditional way to implement it is to transmit each pixel (set the memory position and write the byte containing the pixel data) one at a time to the display. Essentially, to change a single bit of the display memory, at least 8 bytes need to be transmitted to the display. They are:
0x00 (0xb0 + y) - set y offset 0x00 (0x00 + (x & 0xf)) - set low 4 bits of x offset 0x00 (0x10 + (x >> 4)) - set high 4 bits of x offset 0x40 (image byte) - write actual byte that changed
A simple horizontal line across the width of the display (128 pixels) will require 1024 bytes to be sent to the display controller. There's additional overhead in initiating and terminating the individual command and data messages. My point in providing these details is that it's going to run rather slowly. The purpose of this experiment is to optimize the process to minimize the number of memory accesses and bus transfers. As always, I take optimization to the extreme.
The first step is to conceptualize the memory layout of the SSD1306 (image borrowed from SparkFun website):
The memory is arranged in rows of 128 bytes (0 <= x < 128) where the least significant bit is the topmost pixel (e.g. y = 0). The 8th pixel row will be the least significant bit of the next row of 128 bytes and so on down the display. When writing to the memory, the destination address is automatically incremented horizontally. This means that after setting the starting position, bytes can be written from left to right across 8 pixel rows with a single bus access. With this information, I can plan the best way to optimize line drawing. Bresenham's algorithm breaks the task into two situations - lines where delta Y is greater than delta X and the reverse. It makes sense to use these two halves of the algorithm to optimize for each direction. If a line is vertical or nearly vertical, then it makes sense to combine multiple pixels into a single byte before transmitting it. If a line is horizontal or nearly horizontal, then it makes sense to work from left to right and write as many sequential bytes as possible (which contain 8 pixels/rows each).
The way the Bresenham algorithm works is that the 'major axis' increments by 1 each iteration and the minor axis is incremented when an error term passes 0. This error condition can do double-duty to know when to write the changed pixels to the display. In the vertical direction, we can set or clear individual bits in a byte as Y is incremented. We can also compare the new value of the byte to the old and skip transmitting it if nothing changed (e.g. drawing a pixel on top of an existing pixel). Here's the prelude to the Y-major loop:
In the code above, I'm setting up a pointer to the backing buffer that I keep (a copy of the display memory), so that I can change single pixels without disturbing the other pixels in each byte. The bOld and bNew variables are so that I can determine if I've changed pixels in the current byte to avoid transmitting it if nothing changed. The mask variable holds a single bit rotated to the correct position for the current row. This will serve a second purpose. When the bit has been shifted out, mask will have a value of 0 and I can use this to detect when to switch to the next byte row. Here's the Y-major loop:
As long as the line is being drawn vertically in the current byte, only a local (register) variable is affected. If the x changes, or we pass into the next byte row, the byte is compared to its original value and if different, gets written to the display. At the end of the loop, I also need to check for any changed pixels that still need to be written. This code is obviously more complicated than the original Bresenham algorithm, but avoiding data writes saves a lot more time than the extra cycles used by the complexity of this code. In the best case of a vertical line, this will write 1/8th as much data to the display compared to drawing it the traditional way. As a possible further optimization for this code, the SSD1306 does support a vertical addressing mode which increments the destination pointer by 128 after each byte is written. This would really only be useful for a perfectly vertical line since any deviation left or right would require the write position to be explicitly set again.
Now let's look at the X-major code:
In this case, I need to take a different approach since we're working horizontally from left to right. I can't really test if setting each pixel changed the value of each byte because every step through the loop we advance to the next byte. If some bytes are changed and others aren't, it would be more costly to set the position of the write pointer multiple times. In this case, I keep track of where the pointer to my internal buffers starts its work and keep going until the Y value crosses into the next byte row (when the mask value becomes 0). When this occurs, I transmit all of the bytes I've changed in a single shot. For example, if the slope of the line is such that for every 8 horizontal pixels, the vertical pixel changes by one, I will potentially be able to write 64 pixels in a row before I have to send them to the display. After the loop finishes, I need to check for any unwritten pixels as well.
I've updated the code from my previous blog post to include this line function and demo it in this video:
Here's the Github repo for it:
Please share any thoughts in the comments if you have an idea that can speed it up further.