The following is a set of tips for
MAME programmers to help squeeze the most speed out of their game
drivers. These tips can also be useful to authors of other
emulators, since the same principles apply to game emulation
in general. They are not guaranteed to double the speed
of a game, but in many instances they can provide a measurable benefit.
No single area covered here will affect the speed of emulation greatly,
but taken as a whole, these tips can be useful for improving existing
drivers as well as helping authors of new drivers avoid problems
in the first place.
MAME's programming environment has many well-designed
features that make writing game drivers easier; however,
it is also easy to overlook how the use of that environment
affects a game's overall performance. The following general
areas are meant to help the MAME developer avoid wasting CPU time
unnecessarily.
Watchdog timers
Most game machines have a watchdog timer which needs
to be read or written frequently or the machine will reset itself.
In many MAME drivers, the watchdog timer address
is mapped to a watchdog_reset_r routine. Once you have
debugged the game and everything works correctly, the watchdog timer
will never reset the machine, since the smooth, proper operation
of the game prevents this from happening. Leaving the handler in
the final game driver just slows things down, since each read or
write must pass through a special handler routine. Certainly leave
a comment in the driver to document the watchdog address and behavior,
but remove the handler from the final version of the driver. Dan Boris
has pointed out that this is true for about 99% of the games, but
a few actually need the watchdog reset to be active in order to
function properly.
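
As a rough illustration of the cost being described, here is a minimal sketch of a table-driven write dispatcher. It is a simplified model, not MAME's actual memory system: the `write_entry` type, the `cpu_write` function, the `watchdog_w` stand-in and the 0x5000 watchdog address are all hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified write map; not MAME's real structures. */
typedef void (*write_handler)(uint16_t addr, uint8_t data);

typedef struct {
    uint16_t start, end;
    write_handler handler;   /* NULL means plain RAM: store directly */
} write_entry;

static uint8_t ram[0x10000];
static unsigned handler_calls;

/* Stand-in for a watchdog reset handler; here it only counts calls. */
static void watchdog_w(uint16_t addr, uint8_t data)
{
    (void)addr;
    (void)data;
    handler_calls++;
}

/* Debug map: writes to 0x5000 (assumed address) go through the handler. */
static const write_entry debug_map[] = {
    { 0x5000, 0x5000, watchdog_w },
    { 0x0000, 0xffff, NULL },
};

/* Final map: 0x5000 was the watchdog (documented here), now plain RAM. */
static const write_entry final_map[] = {
    { 0x0000, 0xffff, NULL },
};

/* Dispatch one write through the first matching map entry. */
static void cpu_write(const write_entry *map, size_t n,
                      uint16_t addr, uint8_t data)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= map[i].start && addr <= map[i].end) {
            if (map[i].handler)
                map[i].handler(addr, data);
            else
                ram[addr] = data;
            return;
        }
    }
}
```

With `final_map`, a write to the old watchdog address is an ordinary store and no handler call is made. For the few games that depend on the watchdog reset, the handler entry would have to stay.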
Shared memory regions
Many multi-CPU games have some sort of shared memory
region between two or more CPUs, used both for communication between
the CPUs and for access to shared hardware. If possible,
determine which CPU reads and writes this area the most and
have the shared memory region reside in that CPU's memory space.
In other words, one of the CPUs does not need
a special handler routine for reading or writing this shared memory
space, because it already resides in that CPU's memory. This
can save a considerable amount of time on some games. For
example, reads from shared video memory normally don't need to do
anything special. For a specific case, see \src\drivers\galaga.c
line 105: since the shared memory region resides in CPU #1's memory
space, and reads don't do anything special, reads in this area do not
have to pass through a handler routine.
The time wasted by an unnecessary handler has a measurable impact on game performance.
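
The idea can be sketched as follows. This is a simplified model rather than the Galaga driver's code: the region address and size, the two RAM arrays, and the `cpu1_read`, `cpu2_read` and `shared_r` functions are hypothetical. CPU #1 owns the shared region, so its reads are plain array accesses; only CPU #2 pays for a handler call.

```c
#include <stdint.h>

#define SHARED_BASE 0x8000u   /* assumed address of the shared region */
#define SHARED_SIZE 0x0400u   /* assumed size of the shared region */

static uint8_t cpu1_ram[0x10000];   /* CPU #1 owns the shared region */
static uint8_t cpu2_ram[0x10000];
static unsigned handler_calls;

/* CPU #1: the region is part of its own memory, so no handler is needed. */
static uint8_t cpu1_read(uint16_t addr)
{
    return cpu1_ram[addr];
}

/* Handler used only by CPU #2 to reach the region inside CPU #1's memory. */
static uint8_t shared_r(uint16_t offset)
{
    handler_calls++;
    return cpu1_ram[SHARED_BASE + offset];
}

/* CPU #2: accesses to the shared range are redirected through the handler. */
static uint8_t cpu2_read(uint16_t addr)
{
    if (addr >= SHARED_BASE && addr < SHARED_BASE + SHARED_SIZE)
        return shared_r((uint16_t)(addr - SHARED_BASE));
    return cpu2_ram[addr];
}
```

Placing the region in the memory of whichever CPU touches it most keeps the handler on the less frequent path.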
Custom I/O chips
Several games incorporate custom I/O chips (especially
from Namco) which have a read/write memory area. Very often,
writes to this memory area have no action associated with them and
only reads need to be sent to a handler. In this case, there
should not be a write handler routine; the custom chip's memory
can reside in the normal RAM memory map to avoid calling a routine
every time a write occurs. An example of this is in \src\machine\mappy.c
line 124. Having these writes pass through a handler routine
wastes time unnecessarily.
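
A minimal sketch of this asymmetric arrangement is shown below. It is not the code from mappy.c: the chip address range, the `input_port` value and the `customio_r`, `cpu_read` and `cpu_write` functions are hypothetical. Reads in the chip's range go through a handler, while writes are plain RAM stores.

```c
#include <stdint.h>

#define IO_BASE 0x4800u   /* assumed address of the custom chip's area */
#define IO_SIZE 0x0010u   /* assumed size of that area */

static uint8_t ram[0x10000];
static uint8_t input_port = 0x3f;   /* hypothetical input state */
static unsigned read_handler_calls;

/* Read handler: offset 0 reports the inputs, other offsets return
   whatever the game last wrote to the area. */
static uint8_t customio_r(uint16_t offset)
{
    read_handler_calls++;
    if (offset == 0)
        return input_port;
    return ram[IO_BASE + offset];
}

/* Reads inside the chip's range need the handler. */
static uint8_t cpu_read(uint16_t addr)
{
    if (addr >= IO_BASE && addr < IO_BASE + IO_SIZE)
        return customio_r((uint16_t)(addr - IO_BASE));
    return ram[addr];
}

/* Writes trigger no action, so the area is treated as ordinary RAM. */
static void cpu_write(uint16_t addr, uint8_t data)
{
    ram[addr] = data;
}
```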
CPU speed and interleave
Most game hardware is documented well enough to
know the precise speed of the CPUs. This is especially true
of 6809 processors, since the processor is typically run at its rated
speed. If the original game used a CPU clock of 1.536 MHz,
then that's what should be emulated; anything faster will just waste
time in a busy loop.
The flip side of this concerns multi-CPU games.
There typically needs to be some interleave between the CPUs
to allow any shared memory variables to be processed at the proper
time. A higher interleave means more switching between the emulated
CPUs, and each switch costs time. Once the game is running properly,
reduce this interleave value to determine what is really needed.
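
To get a feel for the numbers, the arithmetic can be sketched as below. This assumes the interleave is expressed as a number of time slices per video frame and that the game runs at 60 frames per second; the frame rate, the slice counts and the helper names are assumptions for illustration, not values from any particular driver.

```c
#include <stdint.h>

/* Cycles one CPU executes during a single video frame. */
static uint32_t cycles_per_frame(uint32_t clock_hz, uint32_t fps)
{
    return clock_hz / fps;
}

/* Cycles one CPU executes before control passes to the next CPU,
   when each frame is divided into `slices` interleave slices. */
static uint32_t cycles_per_slice(uint32_t clock_hz, uint32_t fps,
                                 uint32_t slices)
{
    return clock_hz / fps / slices;
}

/* Total switches between CPUs per second of emulated time. */
static uint32_t switches_per_second(uint32_t fps, uint32_t slices,
                                    uint32_t cpus)
{
    return fps * slices * cpus;
}
```

For a 1.536 MHz CPU at an assumed 60 frames per second, a frame is 25600 cycles. With 100 slices per frame each CPU runs only 256 cycles at a time, and a three-CPU game switches 18000 times per second. With 10 slices that drops to 2560 cycles per slice and 1800 switches per second.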
Character Drawing
There are two optimizations that can be done to speed up character
drawing. The first is to optimize the palette usage. Many
older games did not use all 64*4 combinations of palette colors.
A good example is Pac-Man. Since only 128 color combinations
are used, all possible character color combinations can be mapped
directly into the working palette, thereby avoiding a color lookup
for each pixel. Many of the newer, more complicated games
don't allow this optimization, but always investigate the palette
contents before making any assumptions.
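
One way to read the palette optimization is sketched below: resolve every color combination to its final pixel value once, so the drawing loop does a single table lookup per pixel instead of two. The table sizes (32 color codes of 4 pens each, giving 128 combinations, and a 16-entry palette) and all names are assumptions for the example, not the layout of any actual driver.

```c
#include <stdint.h>

#define NUM_COLOR_CODES 32   /* assumed: 32 codes x 4 pens = 128 combinations */
#define PENS_PER_CODE   4
#define NUM_COMBOS      (NUM_COLOR_CODES * PENS_PER_CODE)
#define PALETTE_SIZE    16   /* assumed palette size */

static uint8_t  colortable[NUM_COMBOS];  /* combination -> palette index */
static uint16_t palette[PALETTE_SIZE];   /* palette index -> host pixel value */
static uint16_t resolved[NUM_COMBOS];    /* combination -> host pixel value */

/* Run once after the palette is set up. */
static void resolve_colors(void)
{
    for (int i = 0; i < NUM_COMBOS; i++)
        resolved[i] = palette[colortable[i] & (PALETTE_SIZE - 1)];
}

/* Two lookups per pixel: color table first, then palette. */
static uint16_t pixel_slow(int code, int pen)
{
    return palette[colortable[code * PENS_PER_CODE + pen] & (PALETTE_SIZE - 1)];
}

/* One lookup per pixel using the pre-resolved table. */
static uint16_t pixel_fast(int code, int pen)
{
    return resolved[code * PENS_PER_CODE + pen];
}
```

This only works when the number of combinations is small enough to resolve up front, and `resolve_colors` must be run again if the game changes its palette.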
The second is the screen update loop. In
many MAME drivers, for each character to be drawn, the
video address is converted into a display x,y through a complicated
series of calculations. These calculations actually waste a
measurable amount of time (especially with Intel's slow integer
multiply and divide instructions). A better way to handle
this is to create a lookup table which maps the video address to
an x,y coordinate or a video buffer address (whichever is more convenient).
Something else to note related to divides: never use the modulo
"%" operator when working with powers of 2, since you can't
assume that the compiler will turn it into a bitwise AND instead of
the much slower divide (e.g. for non-negative i, x = i % 32 vs. x = i & 31).
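
A minimal sketch of the lookup-table approach follows. The video layout is hypothetical (a 32x32 map of 8x8 characters stored column by column, with the columns running right to left), as are the `tile_pos` type and function names. The table is built once, using a mask and a shift instead of "%" and "/", and the update loop then just indexes it.

```c
#include <stdint.h>

#define VIDEO_RAM_SIZE 1024   /* assumed 32x32 character map */

typedef struct {
    uint16_t x, y;   /* top-left pixel of the character on screen */
} tile_pos;

static tile_pos pos_table[VIDEO_RAM_SIZE];

/* Build the table once, for example when the video hardware is started. */
static void build_pos_table(void)
{
    for (int offs = 0; offs < VIDEO_RAM_SIZE; offs++) {
        int within = offs & 31;   /* same as offs % 32 for offs >= 0 */
        int column = offs >> 5;   /* same as offs / 32 for offs >= 0 */

        pos_table[offs].x = (uint16_t)(8 * (31 - column));
        pos_table[offs].y = (uint16_t)(8 * within);
    }
}

/* In the update loop, the position is a single table read. */
static tile_pos char_position(int offs)
{
    return pos_table[offs];
}
```

The table could equally hold a pointer or offset into the video buffer if that is more convenient for the drawing routine.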
Memory Usage
Many games have multiple layers of graphics which must be combined
to form the final image. The thing to consider when writing
the video render code is that the less memory is changed, the faster
the code will run. In other words, use as few
buffers as needed to get the job done, since the more memory is involved,
the more cache misses will occur and the slower the code will run.
HanaHo Games, Inc. Copyright © 2002