What follows is a guide to help emulator authors get the best
performance from their emulators. I will follow the steps I took
to squeeze the most speed out of Galaga. Galaga is a good example
for this tutorial since it has a reasonably high level of
complexity. Galaga contains 3 Z80 microprocessors running in
parallel at 3.125 MHz. It uses display hardware with 3 layers:
Stars, Sprites, and Characters.
After you have optimized the basic design of your emulator, the
next step is to measure the maximum possible gain from each
component. Ideally every piece of code would be as optimized as
possible, but most of the effort should go into the code which
takes the most time. I disabled each module one at a time (such
as sprite drawing) to see what the maximum attainable speed would
be if that module took no time at all; this way I was able to
concentrate my efforts on the areas which needed it most. Besides
following my basic code optimization rules, here are the 8 main
components of an emulator and how to optimize each one specifically:
CPU Emulation
In the case of Galaga, this takes up a large portion of the emulation
time. It is possible to use a multithreaded approach to simulate
micros running in parallel, but since my CPU emulator code depends
on static variables for speed, I chose a round-robin approach.
The downside to this approach is the time wasted entering and exiting
each micro (context switching). Galaga requires a high level
of synchronization between the micros for the sound to work, so
this is an area that eats lots of cycles doing context switches.
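A minimal sketch of the round-robin scheme, assuming a hypothetical `cpu_execute()` core interface and about 52,083 cycles per frame (3.125 MHz divided by an assumed 60 frames per second). The stub core only consumes its slice. What the sketch shows is the interleaving itself, and how the slice size determines the number of context switches.

```c
/* Hypothetical per-CPU state; a real Z80 core would hold registers here. */
typedef struct {
    int id;
    long cycles_run;   /* total cycles executed so far */
} Cpu;

/* Stand-in for a real Z80 core: runs up to `cycles` cycles and returns
   the number actually executed. This stub consumes the whole slice. */
static int cpu_execute(Cpu *cpu, int cycles)
{
    cpu->cycles_run += cycles;
    return cycles;
}

/* Run one video frame by interleaving the CPUs in fixed time slices.
   Smaller slices keep the CPUs more tightly synchronized but cost more
   context switches (entering and leaving each core).
   Returns the number of context switches performed. */
static int run_frame_round_robin(Cpu *cpus, int ncpus,
                                 int cycles_per_frame, int slice)
{
    int switches = 0;
    for (int done = 0; done < cycles_per_frame; done += slice) {
        int left = cycles_per_frame - done;
        int this_slice = left < slice ? left : slice;
        for (int i = 0; i < ncpus; i++) {
            cpu_execute(&cpus[i], this_slice);
            switches++;
        }
    }
    return switches;
}
```

With three CPUs, a 1000-cycle slice costs 159 switches per frame. A 500-cycle slice costs 315. That is the trade-off between synchronization and overhead described above.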
A technique which improved performance dramatically was busy-loop
removal. A busy loop is code which sits in a loop waiting
for an event such as an interrupt. Almost every CPU in almost
every video game I have debugged uses a busy loop to wait for the
next frame or event to begin (signalled by an interrupt).
Emulating these loops wastes valuable time doing nothing.
I have a check in the inner loop of my CPU emulators (not
published code) which compares the program counter against the
busy-loop address and exits immediately when it matches. This alone
increased the speed of Galaga by about 25-30%. I assume that I
don't need to state the obvious: USE ASSEMBLY LANGUAGE WHEN POSSIBLE;
the difference between CPU emulation in C and in assembly can be 2-5X.
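Busy-loop removal can be sketched with a toy CPU that has only two instructions, NOP and JP. They use the Z80 opcodes 0x00 and 0xC3 and take 4 and 10 cycles respectively. The `ToyCpu` type, `step()`, `execute()`, and the per-game `busy_loop_addr` are hypothetical and are not the author's unpublished code. When the program counter reaches the busy-loop address, the rest of the slice is reported as used without emulating the spin.

```c
#include <stdint.h>

/* Toy CPU state; a real core would hold all the Z80 registers. */
typedef struct {
    uint16_t pc;
    uint16_t busy_loop_addr;  /* found by hand in a debugger, per game */
    const uint8_t *mem;       /* program memory */
} ToyCpu;

/* Toy instruction set: 0xC3 lo hi = JP addr (10 cycles), anything else
   is treated as NOP (4 cycles). Returns cycles used. */
static int step(ToyCpu *c)
{
    uint8_t op = c->mem[c->pc];
    if (op == 0xC3) {
        c->pc = (uint16_t)(c->mem[c->pc + 1] | (c->mem[c->pc + 2] << 8));
        return 10;
    }
    c->pc++;
    return 4;
}

/* Execute for `cycles` cycles and return the cycles consumed. If the PC
   hits the busy-loop address, stop emulating and report the whole slice
   as used: the loop would only spin until the next interrupt anyway. */
static int execute(ToyCpu *c, int cycles, int *instructions)
{
    int left = cycles;
    *instructions = 0;
    while (left > 0) {
        if (c->pc == c->busy_loop_addr)
            return cycles;          /* skip the idle spin entirely */
        left -= step(c);
        (*instructions)++;
    }
    return cycles - left;
}
```

The CPU stays parked at the busy-loop address. Later slices therefore also return immediately until the emulated interrupt moves the program counter elsewhere.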
User Input
This is more of a Windows issue than a DOS one. I found some wasted
time in my keyboard message handler, which was calling GetAsyncKeyState()
instead of just watching for keypress messages. Every call
to a Windows function has significant overhead just getting there
and back, so avoid or reduce Windows function calls. I also call
Sleep() in my timing loop when there is enough time to spare, to
reduce CPU utilization.
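Here is a portable sketch of the message-driven approach. The keyboard handler records key state in a table. The emulated input port reads that table without any OS call, and the frame loop works out how long it can sleep. The key codes, `KeyEvent`, the active-low port bits, and the 16 ms frame budget are all assumptions for illustration. On Windows the events would come from `WM_KEYDOWN`/`WM_KEYUP` messages, and the spare time would be passed to `Sleep()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key codes and event type. */
enum { KEY_LEFT, KEY_RIGHT, KEY_FIRE, KEY_COUNT };
typedef struct { int key; bool down; } KeyEvent;

static bool key_down[KEY_COUNT];

/* Called from the message handler: just record the new state. */
static void on_key_event(KeyEvent ev)
{
    if (ev.key >= 0 && ev.key < KEY_COUNT)
        key_down[ev.key] = ev.down;
}

/* Called by the emulated input port: a table read, no OS call.
   Bits are active low here (an assumption; many arcade boards do this). */
static uint8_t read_input_port(void)
{
    uint8_t v = 0xFF;
    if (key_down[KEY_LEFT])  v &= (uint8_t)~0x01;
    if (key_down[KEY_RIGHT]) v &= (uint8_t)~0x02;
    if (key_down[KEY_FIRE])  v &= (uint8_t)~0x04;
    return v;
}

/* Milliseconds the frame loop can give back to the OS after emulating
   one frame; 0 when the frame ran late. */
static int spare_ms(int frame_budget_ms, int elapsed_ms)
{
    int spare = frame_budget_ms - elapsed_ms;
    return spare > 0 ? spare : 0;
}
```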
Emulated I/O
This applies to all function calls made from within the CPU
emulation. In my emulator design, I have a set of flags which
mark addresses for normal or 'special' use (special meaning a
function pointer to a handler routine). Galaga has a memory area
shared between the three processors that required a handler
routine. I had CPUs #2 and #3 share memory from #1's memory map,
and it was calling the SharedRead() routine unnecessarily in
CPU #1's context. My point is that there can be lots of
fat to trim in the handler routines as well.
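The following sketches a per-CPU memory map in that spirit. Each 256-byte page either points at real memory, which is read directly, or is flagged as special and dispatched through a function pointer. The types, addresses, and `io_read()` handler are hypothetical. Mapping one shared buffer into each CPU's map lets every CPU read shared RAM on the fast path with no handler call.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*ReadHandler)(uint16_t addr);

/* Hypothetical per-CPU memory map with 256-byte pages. A non-NULL page
   pointer means plain memory; NULL means the page is special and must
   have a handler installed. */
typedef struct {
    uint8_t    *page[256];
    ReadHandler handler[256];
} MemMap;

static uint8_t mem_read(const MemMap *m, uint16_t addr)
{
    uint8_t *p = m->page[addr >> 8];
    if (p != NULL)
        return p[addr & 0xFF];            /* fast path: no call */
    return m->handler[addr >> 8](addr);   /* slow path: function pointer */
}

/* Map a buffer into a CPU's address space (caller keeps it below 64K).
   Mapping the same buffer into several CPUs shares it without a handler. */
static void map_ram(MemMap *m, uint16_t start, uint8_t *buf, size_t len)
{
    for (size_t off = 0; off < len; off += 256)
        m->page[(start + off) >> 8] = buf + off;
}

/* Example special handler, counting calls to show when the slow path runs. */
static int io_calls;
static uint8_t io_read(uint16_t addr)
{
    io_calls++;
    return (uint8_t)addr;
}
```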
Sprite Drawing
This is one area of the code which I have completely rewritten
at least 5 times. Besides creating the most efficient methods
of drawing and erasing sprites with transparency, there's the
not-so-obvious point of only drawing sprites which need to be drawn.
Using MAME as an example of what not to do, there are many drivers
(especially for NAMCO games) which test 'flags' which don't exist
and end up drawing every possible sprite when only a few are
actually enabled. Look at the sprite memory map during various
phases of gameplay and you will see what a disabled sprite looks
like. Sometimes the color is set to 0, other times the X coordinate
is placed outside the visible area, and still other times there is
an actual flag bit indicating the sprite is off. A little debugging
will sort this out and can gain you valuable speed. Another issue
is to use as few memory planes as possible. For example, some
MAME drivers draw a character plane, then a sprite plane, and
later combine them. As my optimization rule #5 states, "The
less memory you touch, the faster you go". Even if the
code becomes a bit more complicated, try to keep all of the drawing
in one memory area (a temp bitmap).
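The sketch below first rejects disabled sprites and then draws directly into the single temp bitmap, skipping transparent pixels. Several details are assumptions: the `Sprite` layout, the 224x288 screen, 16x16 sprites, and the three "off" tests (color 0, flag bit 7, position outside the screen). For each real game, the correct test has to come from watching its sprite RAM. Pixel values are written unchanged. A real driver would map them through the sprite's color.

```c
#include <stdint.h>

#define SCREEN_W 224
#define SCREEN_H 288
#define SPR_SIZE 16

/* Hypothetical decoded sprite entry. */
typedef struct {
    int x, y;
    uint8_t color;       /* 0 means "off" in this example */
    uint8_t flags;       /* bit 7 means "off" in this example */
    const uint8_t *gfx;  /* SPR_SIZE*SPR_SIZE pixels, 0 = transparent */
} Sprite;

static int sprite_visible(const Sprite *s)
{
    if (s->color == 0) return 0;
    if (s->flags & 0x80) return 0;
    if (s->x <= -SPR_SIZE || s->x >= SCREEN_W) return 0;
    if (s->y <= -SPR_SIZE || s->y >= SCREEN_H) return 0;
    return 1;
}

/* Draw every visible sprite into the temp bitmap that already holds the
   characters. Returns how many sprites were actually drawn. */
static int draw_sprites(uint8_t bitmap[SCREEN_H][SCREEN_W],
                        const Sprite *spr, int count)
{
    int drawn = 0;
    for (int i = 0; i < count; i++) {
        const Sprite *s = &spr[i];
        if (!sprite_visible(s))
            continue;                       /* disabled: skip entirely */
        for (int row = 0; row < SPR_SIZE; row++) {
            int y = s->y + row;
            if (y < 0 || y >= SCREEN_H)
                continue;
            for (int col = 0; col < SPR_SIZE; col++) {
                int x = s->x + col;
                uint8_t p = s->gfx[row * SPR_SIZE + col];
                if (p != 0 && x >= 0 && x < SCREEN_W)
                    bitmap[y][x] = p;       /* transparent pixels skipped */
            }
        }
        drawn++;
    }
    return drawn;
}
```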
Character Drawing
In most character/sprite games I've looked at, only a few
characters change each frame. I found that the most efficient way
to handle this is to keep a set of flags indicating which characters
changed each frame and draw only those. Some games with scrolling
regions or multiple layers may appear to need every character
drawn every frame, but this is almost never necessary.
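A minimal sketch of per-character dirty flags follows. It assumes the tile map is written through a video RAM write handler. The 28x36 grid, `video_ram_write()`, and the `draw_tile()` stub are hypothetical.

```c
#include <stdint.h>

#define COLS 28
#define ROWS 36

/* Hypothetical tile map state: video RAM plus one dirty flag per cell. */
static uint8_t video_ram[ROWS * COLS];
static uint8_t dirty[ROWS * COLS];

/* Installed as the write handler for video RAM: only a write that
   actually changes the value marks the cell for redrawing. */
static void video_ram_write(int offset, uint8_t value)
{
    if (video_ram[offset] != value) {
        video_ram[offset] = value;
        dirty[offset] = 1;
    }
}

/* Stand-in for the real 8x8 tile blitter into the temp bitmap. */
static int tiles_drawn;
static void draw_tile(int row, int col, uint8_t code)
{
    (void)row; (void)col; (void)code;
    tiles_drawn++;
}

/* Once per frame: redraw only the cells that changed, then clear flags. */
static void draw_changed_characters(void)
{
    for (int i = 0; i < ROWS * COLS; i++) {
        if (dirty[i]) {
            draw_tile(i / COLS, i % COLS, video_ram[i]);
            dirty[i] = 0;
        }
    }
}
```

In practice, every flag would be set once at startup to force a full first draw. Cells covered by the previous frame's sprites also need to be marked dirty, so that those sprites get erased.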
Color Optimization
This is an important issue that is sometimes overlooked.
Many games have a palette ROM and a color ROM which look like they
will require 256 or more color combinations, and thus a table
lookup for every sprite/char drawn. Often a careful analysis of
gameplay will show that the game only uses a subset of the colors
and that they do fit in a 256-color palette. This can increase
speed measurably, but not dramatically, so I would leave this step
for last since it promises the least benefit for the effort involved.
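The palette check can be sketched as follows. The input is a hypothetical list of RGB values for the color combinations the game actually uses, gathered through the kind of gameplay analysis described above. From it, the function builds a palette of distinct colors plus a remap table. If the result fits in 256 entries, the graphics can be converted to palette indices once up front, which removes the per-draw lookup.

```c
#include <stdint.h>

/* Build a palette of the distinct RGB values in `rgb` and a remap table
   from each input entry to its palette index. Returns the palette size,
   or -1 if more than 256 distinct colors are needed. Runs once at
   startup, so the simple linear search is acceptable. */
static int build_palette(const uint32_t *rgb, int n,
                         uint32_t palette[256], uint8_t *remap)
{
    int used = 0;
    for (int i = 0; i < n; i++) {
        int j = 0;
        while (j < used && palette[j] != rgb[i])
            j++;
        if (j == used) {              /* new color */
            if (used == 256)
                return -1;            /* does not fit in 8 bits */
            palette[used++] = rgb[i];
        }
        remap[i] = (uint8_t)j;
    }
    return used;
}
```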
Sound Emulation
This is basically just common sense. I found that in the case
of Galaga I only need to update the sound 60 times per second for
it to sound good. This makes sense, since most sound effects and
music would not have any notes shorter than a 60th of a second.
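Updating the sound once per video frame means generating one frame's worth of samples at a time. The sketch assumes a 22050 Hz output rate, which is an example value. At that rate, 22050 / 60 = 367.5 samples per update. The remainder is carried forward, so frames alternate between 367 and 368 samples and the long-run rate stays exact.

```c
/* Tracks how many samples each sound update should generate. */
typedef struct {
    int sample_rate;      /* e.g. 22050 (example value) */
    int updates_per_sec;  /* e.g. 60, one update per video frame */
    int remainder;        /* leftover from earlier updates */
} SoundClock;

/* Number of samples to generate for this update. */
static int samples_this_update(SoundClock *c)
{
    int total = c->sample_rate + c->remainder;
    int n = total / c->updates_per_sec;
    c->remainder = total % c->updates_per_sec;
    return n;
}
```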
Video Access
This is probably the slowest part of your emulation code
(at least in Windows). Video memory is considerably slower
than main memory, so you should limit how much of the screen
is updated each frame. In HiVE I use a simple dirty-rectangle
technique which divides the screen into 32 horizontal bars. By
copying only the bars of the display which changed, considerable
time can be saved. Some games make this difficult, such as those
with star fields or scrolling regions. Galaga, for example, needs
nearly the entire screen painted each frame because of the stars.
Other games such as PacMan can be highly optimized with this technique
since only a small portion of the display changes each frame.
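The 32-bar dirty-rectangle scheme can be sketched with a 224x288 8-bit offscreen bitmap, which gives 9 rows per bar. In this sketch, a plain array stands in for video memory. On a real system, `present()` would blit each rectangle to the display surface instead of calling `memcpy()`. Adjacent dirty bars are merged, so each run of them costs a single copy.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_W 224
#define SCREEN_H 288
#define NBARS    32
#define BAR_H    (SCREEN_H / NBARS)   /* 9 rows per bar */

static uint8_t offscreen[SCREEN_H][SCREEN_W];  /* temp bitmap in main memory */
static uint8_t video[SCREEN_H][SCREEN_W];      /* stand-in for video memory */
static uint32_t dirty_bars;                    /* bit n set = bar n changed */

/* Call whenever rows y0..y1 of the offscreen bitmap are drawn to. */
static void mark_rows(int y0, int y1)
{
    if (y0 < 0) y0 = 0;
    if (y1 >= SCREEN_H) y1 = SCREEN_H - 1;
    for (int bar = y0 / BAR_H; bar <= y1 / BAR_H; bar++)
        dirty_bars |= 1u << bar;
}

/* Copy only the changed bars, merging runs of adjacent bars into one
   copy. Returns the number of copy operations issued. */
static int present(void)
{
    int copies = 0;
    int bar = 0;
    while (bar < NBARS) {
        if (!(dirty_bars & (1u << bar))) {
            bar++;
            continue;
        }
        int start = bar;
        while (bar < NBARS && (dirty_bars & (1u << bar)))
            bar++;
        memcpy(video[start * BAR_H], offscreen[start * BAR_H],
               (size_t)(bar - start) * BAR_H * SCREEN_W);
        copies++;
    }
    dirty_bars = 0;
    return copies;
}
```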
Some may be inclined to draw directly into video memory instead of
an offscreen buffer to save time. Very few games would work well
this way because of flicker. One game where this is possible is
Space Invaders: since it has a bitmapped display and only a small
portion changes in any one frame, there would be no noticeable
flicker. A game with sprites usually would not work well, because
the sprites would need to be erased and redrawn each frame, which
leads to flicker.
HanaHo Games, Inc. Copyright © 2002