Retro means old but cool.

I grew up with the Commodore C64 but was never able to master the machine. I was young, I wanted to play the latest games and let other people do the pioneer work on exploring this incredible hardware. Today I have better skills to catch up on what it takes to code the C64. I will share what I learn along the way. Enjoy the trip to the past!

Hardware Basics Part 2 - A Complicated Relationship

ϕ - the ruler of the system clock 

We have learned that various signals are generated somewhere in the Commodore C64, namely the Color Clock, Dot Clock and System Clock. We also learned that we have exactly one period of the system clock - a so-called CPU cycle - to draw 8 pixels to the screen.

This already implies that the VIC-II must be involved in every cycle - it is our video chip after all. It actually has a much bigger role than you might expect. In fact, the last article did not explain how the Dot Clock becomes the System Clock, other than saying that it is divided by 8 somewhere along the way.

You may be surprised to learn that the actual origin of the system clock signal is not the CPU 6510 but the VIC-II. The VIC-II generates all clock frequencies for accessing the data bus, both for itself and for the CPU. Regulated by our crystal Y1, the VIC-II receives the Dot Clock signal, divides it down to the system clock rate and sends the result to the CPU 6510. The 6510 passes this signal on unmodified to set the system clock rate for all other components in the system.

To get a better picture, let's look at the VIC-II and the 6510, and by that I mean the actual integrated circuits sitting on your C64 main board. A few pins on each of the two chips are connected with each other.

It all comes down to the symbol ϕ, the Greek letter Phi, pronounced just like the English word "Fee".

The VIC-II has three related pins labeled ϕ0, ϕColor and ϕIN, and the CPU 6510 has two pins labeled ϕ1 and ϕ2 - the interplay between those signals results in the system-wide clock frequency.

ϕIN and ϕColor are input lines: the ϕColor pin receives the Color Clock signal and the ϕIN pin receives the Dot Clock signal. The ϕ0 pin is an output line that sends out the Dot Clock signal divided by 8. That pin is connected to the clock input of the CPU 6510, the pin called ϕ1 above (some pinouts label it ϕ0 IN). With the signal received there, the CPU finally propagates the unchanged system clock rate of about 985.25 kHz on PAL or 1022.7 kHz on NTSC machines to all other components via the output pin ϕ2. But it does not end here - we now get to the delicate part of the VIC-II / CPU 6510 relationship.

Dynamic Logic and the two phases

From a system-wide perspective the ϕ2 signal from the CPU 6510 - also referred to as Phase 2 - is of major importance for timing anything in the system.

The duration of a single period of ϕ2 is further divided into two phases - a phase where ϕ2 is low and a phase where ϕ2 is high. In the first phase, the low one, the VIC-II is allowed to read from the data bus, and in the second phase it is the CPU's turn to access the bus. Unlike the VIC-II, the CPU may both read and write. The two chips cannot access the bus at the same time, so the VIC-II and the CPU have to take turns. This strange-sounding principle is in fact an innovation called Dynamic Logic invented by MOS at a time when accessing the data bus on other systems was done using so-called Static Logic patterns. When used right, Dynamic Logic could effectively be twice as fast as Static Logic implementations.
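
To get a feeling for how short those two phases are, here is a small Python sketch. It uses the rounded system clock rates quoted in this article and assumes that the low and the high phase are equally long (a 50% duty cycle). That split is an assumption made for illustration, not a figure taken from a datasheet, and the function name is made up.

```python
# Rounded system clock rates as quoted in this article, in Hz.
SYSTEM_CLOCK_HZ = {"PAL": 985_000, "NTSC": 1_023_000}


def phase_durations_ns(clock_hz, duty_high=0.5):
    """Return (period, low phase, high phase) of one cycle in nanoseconds.

    duty_high is the assumed fraction of the period in which the clock
    is high (the CPU's turn). 0.5 is an illustrative assumption.
    """
    period_ns = 1e9 / clock_hz
    high_ns = period_ns * duty_high
    low_ns = period_ns - high_ns
    return period_ns, low_ns, high_ns


pal_period, pal_low, pal_high = phase_durations_ns(SYSTEM_CLOCK_HZ["PAL"])
ntsc_period, ntsc_low, ntsc_high = phase_durations_ns(SYSTEM_CLOCK_HZ["NTSC"])

for name, values in (("PAL", (pal_period, pal_low, pal_high)),
                     ("NTSC", (ntsc_period, ntsc_low, ntsc_high))):
    print(f"{name}: period {values[0]:.0f} ns, "
          f"VIC-II phase {values[1]:.0f} ns, CPU phase {values[2]:.0f} ns")
```

Under these assumptions each chip gets roughly half a microsecond per cycle for its bus access.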

You might now think that the VIC-II and the CPU 6510 simply share the phases of ϕ2 and everybody is happy, but this is not entirely true.

There are two occasions where the VIC-II actually needs more time than a low phase of ϕ2 provides:

  1. At the start of every 8th line, starting with the first visible line on the screen, the VIC-II needs 40 additional cycles to fetch character pointers.
  2. When Sprites are involved, the VIC-II needs 2 additional cycles per activated Sprite.

Let's think about this. If the VIC-II steals 40 CPU Cycles on every 8th line, we only have 23 Cycles left on PAL (63 - 40) or 25 Cycles left on NTSC machines (65 - 40) for CPU instructions on each of those lines. The VIC-II's work is not affected of course - after all it just greedily takes all the Cycles it needs to complete the Raster Line. Since fewer Cycles are left for CPU work, those lines are called Bad Lines. While they are the origin of lots of trouble, they also open up interesting ways to exploit how the VIC-II outputs to the screen.
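
The cycle budget can be written down as a few lines of Python. This is only a sketch of the simple subtraction used in this article (40 cycles for a Bad Line, 2 cycles per active Sprite). The limit of 8 Sprites is the VIC-II's sprite count; real hardware adds some overhead on top of these numbers, so treat the results as an upper bound. The names are made up for this example.

```python
CYCLES_PER_LINE = {"PAL": 63, "NTSC": 65}
BAD_LINE_COST = 40   # cycles for fetching the character pointers
SPRITE_COST = 2      # cycles per active sprite, as stated in this article
MAX_SPRITES = 8      # the VIC-II offers 8 hardware sprites


def cpu_cycles_left(system, bad_line=False, active_sprites=0):
    """Cycles left for the CPU on one raster line (simplified model)."""
    if not 0 <= active_sprites <= MAX_SPRITES:
        raise ValueError("active_sprites must be between 0 and 8")
    stolen = SPRITE_COST * active_sprites
    if bad_line:
        stolen += BAD_LINE_COST
    return CYCLES_PER_LINE[system] - stolen


print(cpu_cycles_left("PAL", bad_line=True))    # 23
print(cpu_cycles_left("NTSC", bad_line=True))   # 25
print(cpu_cycles_left("PAL", bad_line=True, active_sprites=8))
```

The last call shows why Bad Lines with many Sprites are feared: in this model barely a handful of cycles remain for the CPU.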

How does the VIC-II prevent the CPU from taking its turn?

Let's look at this diagram illustrating the way of the signals and the lines responsible to manage the ϕ2 phases - note that the pins below are not accurately drawn in respect to their real position at either chip.

Flow of the various signals to generate the clock rate and to control the Dynamic Logic architecture for memory access.

Pins involved in the Frequency Flow 

  1. The Y1 Crystal generates the initial frequency of 14.31818MHz (NTSC) or 17.734475MHz (PAL)
  2. Then the color subcarrier frequency of 3.58MHz (NTSC) or 4.43MHz (PAL) is sent to the ϕCOLOR pin of the VIC-II to generate the appropriate video output. In a separate process, the Dot Clock Rate of 8.18MHz (NTSC) or 7.88MHz (PAL) is provided to the ϕIN line of the VIC-II.
  3. The Dot Clock signal gets divided by 8 within the VIC-II and moves on via the Output Line ϕ0 to the CPU 6510
  4. The 6510 receives the System Clock Rate at ϕ1 and directs it for output to ϕ2, so finally the unmodified System Clock Rate of 1.023MHz (NTSC) or 0.985MHz (PAL) is propagated to all other integrated circuits via ϕ2
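
The four steps above can be followed with a short Python sketch. The crystal frequencies and the divisions by 4 and by 8 come from this article. The ratios 4/7 (NTSC) and 4/9 (PAL) between crystal and Dot Clock are not stated here; they are an assumption that happens to reproduce the quoted Dot Clock rates, so take them with a grain of salt.

```python
from fractions import Fraction

CRYSTAL_HZ = {"NTSC": 14_318_180, "PAL": 17_734_475}

# Assumption: ratio between crystal frequency and Dot Clock.
# Not taken from this article, only chosen to match its numbers.
DOT_CLOCK_RATIO = {"NTSC": Fraction(4, 7), "PAL": Fraction(4, 9)}


def clock_chain(system):
    """Return the clock rates of the signal flow in Hz."""
    crystal = Fraction(CRYSTAL_HZ[system])
    color_subcarrier = crystal / 4           # step 2: goes to the color pin
    dot_clock = crystal * DOT_CLOCK_RATIO[system]   # step 2: goes to the IN pin
    system_clock = dot_clock / 8             # step 3: divided inside the VIC-II
    return {
        "crystal": float(crystal),
        "color_subcarrier": float(color_subcarrier),
        "dot_clock": float(dot_clock),
        "system_clock": float(system_clock),  # step 4: passed on unmodified
    }


pal = clock_chain("PAL")
ntsc = clock_chain("NTSC")
for name, chain in (("PAL", pal), ("NTSC", ntsc)):
    print(name, {key: round(value / 1e6, 3) for key, value in chain.items()})
```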

Dynamic Logic supporting Pins

  • A - the BA pin of the VIC-II is wired to the RDY pin of the CPU 6510. If BA is high we have normal operations and the CPU knows it can do its read/write access to the data bus during a high ϕ2 phase. If the VIC-II needs more cycles, for instance because it is fetching Character Pointers on every 8th line, BA is set low and with it RDY on the CPU side becomes low as well. When this happens, the CPU finishes any remaining write operations and then stops accessing the data bus.
  • B - the AEC line signals the current phase within ϕ2. It is low when the VIC-II accesses the bus and high when it is the turn of the CPU. When the BA line is set low, so is the AEC line.


How the VIC-II defrauds the CPU 6510

From the diagram above you can see that the VIC-II has control over the CPU access to the data bus. It can simply pause the 6510 activity by setting BA low. When the VIC-II has finished its extra work it returns BA to high indicating normal operations.

Let's recap the order of events during the interplay of VIC-II and CPU 6510: 

  1. Normal Operation: VIC-II and CPU happily share access to the data bus. When ϕ2 is low, it is the VIC-II's turn; when ϕ2 is high, it is the CPU 6510's turn. As long as the BA line of the VIC-II is high, this is what happens.
  2. VIC-II Stun: The VIC-II needs more time to fetch Character Pointers or take care of Sprites. It sets BA low, effectively telling the CPU that it is not supposed to do anything on the data bus for some time. The CPU finishes any remaining write operation, which can take between 1-3 Cycles.
  3. VIC-II TakeOver: After the CPU has finished writing, the VIC-II is finally in control. It can now read from the data bus as long as required. For fetching Character Pointers that would be 40 Cycles.
  4. VIC-II Release: Now that the Character Pointers are fetched, the VIC-II sets BA back to high and the CPU is informed that we are back to normal operations.
  5. Back to Normal Operation: VIC-II and CPU again take turns accessing the data bus within the two phases of ϕ2.
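
The five steps can be played through with a toy model in Python. It follows the simplified description above and nothing more: BA goes low at some cycle, the CPU gets 1-3 more cycles to finish writing, then the VIC-II owns the high phase of ϕ2 for 40 cycles. The exact timing on real hardware is more subtle, and the function and parameter names are invented for this sketch.

```python
def high_phase_owners(cycles_per_line, ba_low_at, pending_writes,
                      takeover_cycles=40):
    """Return who owns the high phase of each cycle on one raster line.

    ba_low_at      -- cycle in which the VIC-II pulls BA low (step 2)
    pending_writes -- cycles the CPU still needs to finish writing (1-3)
    The low phase always belongs to the VIC-II and is not listed.
    """
    if not 1 <= pending_writes <= 3:
        raise ValueError("pending_writes must be between 1 and 3")
    takeover_start = ba_low_at + pending_writes        # step 3 begins
    takeover_end = takeover_start + takeover_cycles    # step 4: BA high again
    if takeover_end > cycles_per_line:
        raise ValueError("takeover does not fit into the raster line")
    owners = []
    for cycle in range(cycles_per_line):
        if takeover_start <= cycle < takeover_end:
            owners.append("VIC")
        else:
            owners.append("CPU")   # steps 1, 2 and 5
    return owners


owners = high_phase_owners(cycles_per_line=63, ba_low_at=10, pending_writes=2)
print("".join("V" if owner == "VIC" else "c" for owner in owners))
```

The printed line shows the CPU cycles as `c` and the stolen ones as `V`. Changing `pending_writes` shifts the block of stolen cycles, which is exactly the uncertainty discussed in the next section.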


What is the big deal?

This relationship between the VIC-II and the CPU 6510 causes two problems. First of all, the CPU will finish any write access before it goes into "sleep mode". Write operations take up 1-3 Cycles depending on the instruction. This is an uncertainty we eventually need to handle. The other problem is that by stealing a number of cycles for VIC-II operations on a Raster Line, we have correspondingly fewer cycles left for CPU routines.

Often this is not an issue, but when you reach for the stars and want to build great VIC-II effects which need to be cycle-exact, then you eventually need to take Bad Lines into account and figure out how to work around or even with them.


Hardware Basics Part 1 - Tick Tock, know your Clock

I see clocks everywhere - what is all this?

Have you ever wondered why NTSC and PAL Commodore C64 machines have a different total number of screen lines, or why the NTSC model has a faster system clock rate? Do you know what a clock rate actually is? If not, then the next two articles are a great opportunity for you to shine in the next discussion on the architecture of 8-Bit computers.

Y1  - the heartbeat of the Commodore C64

It all starts with Y1. Y1 is a crystal oscillator that uses the mechanical resonance of a vibrating crystal to create a signal with a very precise frequency. Our Y1 Crystal in the Commodore C64 delivers a frequency signal of 14.31818MHz (NTSC) or 17.734475MHz (PAL). This signal is also called the Color Clock.

The Y1 Crystal on NTSC Machines

Why Color Clock? Why did Commodore use crystals which generate exactly either of those two frequencies? When you divide 14.31818 by 4 you get 3.579545, and when you divide 17.734475 by 4 the result is 4.433619. Do those numbers ring a bell?

3.58MHz is the color subcarrier frequency for the NTSC system, while 4.43MHz is the color subcarrier frequency for PAL. Those frequencies determine how colors are displayed on the respective TV systems. So that's the Color Clock.

An integrated circuit in the C64 called a Dual Voltage Controlled Oscillator then generates a 7.88MHz (PAL) or 8.18MHz (NTSC) Dot Clock from this signal. Of course you wonder what a Dot Clock is.

A Dot Clock defines how many pixels can be drawn on the screen per second. It is just simple math: the number of raster lines multiplied by the refresh rate multiplied by the number of pixels per line. The properties of the C64 screen and the respective color standards lead to the required Dot Clock Rate which the Oscillator will generate for us.

PAL: 50.125Hz (refresh rate) * 504 Pixels * 312 Lines = 7.88MHz
NTSC: 59.826Hz (refresh rate) * 520 Pixels * 263 Lines = 8.18MHz
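
If you want to check these two products yourself, the following Python lines do the multiplication. All numbers are the ones from the two formulas above; only the names are made up for this example.

```python
VIDEO_TIMING = {
    "PAL": {"refresh_hz": 50.125, "pixels_per_line": 504, "lines": 312},
    "NTSC": {"refresh_hz": 59.826, "pixels_per_line": 520, "lines": 263},
}


def dot_clock_hz(refresh_hz, pixels_per_line, lines):
    """Pixels per second: refresh rate * pixels per line * lines."""
    return refresh_hz * pixels_per_line * lines


pal_dot_clock = dot_clock_hz(**VIDEO_TIMING["PAL"])
ntsc_dot_clock = dot_clock_hz(**VIDEO_TIMING["NTSC"])

print(f"PAL:  {pal_dot_clock / 1e6:.2f} MHz")
print(f"NTSC: {ntsc_dot_clock / 1e6:.2f} MHz")
```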

This difference between PAL and NTSC systems leads to the problem that certain routines written for a PAL C64 may run 20% faster on an NTSC system, while NTSC routines seem to be 16% slower on PAL. That is especially noticeable with SID tunes composed on one machine but played back on the other, e.g. in games or intros. Interestingly enough, while you would assume that this is a problem of the past, I want to remind you that many videogame console titles from Japan or the US were often not adapted satisfactorily for the PAL market. Many PAL games, e.g. on Nintendo 64, ran 16% slower than the NTSC version of the same game on a US or Japanese Console - the reason is sloppy work on adapting to the PAL standard.

Back to the C64. Luckily many demos and a few games make sure they detect the underlying system correctly and change routines and timing in the code accordingly. The common approach for commercial games was to create two versions of a product for NTSC and PAL markets though. This explains why there were so many PAL-Fixes for games across systems in the old days. PAL system owners wanted to play the latest US-Games when they were released and not wait for adapted versions released 6+ months later. Nowadays it is not a problem anymore as HDMI and 60Hz are standard for TVs and supported by all current generation consoles.

So we have come from the Color Clock to the Dot Clock, and the final destination is the System Clock which defines a CPU Cycle. The C64 is an 8-Bit Machine. This limits it to displaying 8 Pixels per CPU cycle. One CPU cycle therefore lasts 8 periods of the Dot Clock, which means the System Clock runs at 1/8 of the Dot Clock rate. Let's confirm this.
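
Where do those percentages come from? A quick Python sketch, assuming a routine that is called exactly once per screen refresh (as many music players are). With the refresh rates used in this article the speed-up comes out at roughly 19%, which rounds to the 20% mentioned above. The function name is invented for this example.

```python
PAL_REFRESH_HZ = 50.125
NTSC_REFRESH_HZ = 59.826


def speed_change_percent(source_hz, target_hz):
    """Speed change of a once-per-frame routine moved to another system."""
    return (target_hz / source_hz - 1) * 100


pal_on_ntsc = speed_change_percent(PAL_REFRESH_HZ, NTSC_REFRESH_HZ)
ntsc_on_pal = speed_change_percent(NTSC_REFRESH_HZ, PAL_REFRESH_HZ)

print(f"PAL routine on NTSC: {pal_on_ntsc:+.1f}%")
print(f"NTSC routine on PAL: {ntsc_on_pal:+.1f}%")
```

Note that the two percentages are not symmetric, because each one is measured relative to a different starting speed.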

PAL: 7.88MHz / 8 = 0.985MHz
NTSC: 8.18MHz / 8 = 1.023MHz

This seems to be right with what we read about the system clock rates on Commodore C64 NTSC and PAL machines.

Congratulations! You are now able to explain why the C64 has a certain system clock rate and that one period of this rate corresponds to a CPU Cycle, which is the time required to put 8 pixels on the screen. Let's do one more calculation to complete the picture!

How many CPU cycles do we need to generate one line of Pixels, and how many CPU cycles are therefore theoretically required per screen refresh? We know that 8 Pixels can be drawn per Cycle and we know the number of pixels per line - so this is all easy now.

PAL: 504 Pixels / 8 = 63 Cycles per Line * 312 Lines = 19656 total CPU Cycles 
NTSC: 520 Pixels / 8 = 65 Cycles per Line * 263 Lines = 17095 total CPU Cycles 
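
The same calculation as a few lines of Python, using only the numbers from above. The function names are made up for this sketch.

```python
PIXELS_PER_CYCLE = 8


def cycles_per_line(pixels_per_line):
    """CPU cycles needed to draw one raster line."""
    cycles, remainder = divmod(pixels_per_line, PIXELS_PER_CYCLE)
    if remainder:
        raise ValueError("pixels per line must be a multiple of 8")
    return cycles


def cycles_per_frame(pixels_per_line, lines):
    """CPU cycles per screen refresh."""
    return cycles_per_line(pixels_per_line) * lines


pal_line, pal_frame = cycles_per_line(504), cycles_per_frame(504, 312)
ntsc_line, ntsc_frame = cycles_per_line(520), cycles_per_frame(520, 263)

print(f"PAL:  {pal_line} cycles per line, {pal_frame} cycles per frame")
print(f"NTSC: {ntsc_line} cycles per line, {ntsc_frame} cycles per frame")
```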

Those are also numbers you might have encountered already somewhere - even on this site.

Why is all this important in the first place?

It is first of all a good prelude to the next part, which will further dissect the system clock rate and its origin in a delicate relationship between the VIC-II and the CPU 6510. We will soon learn that the 63 cycles available per raster line on PAL, or 65 cycles on NTSC machines, turn out to be insufficient for certain VIC-II tasks. This results in side effects you should be aware of - side effects that are often noticeable in demos and games.


RAM under ROM - a brief look into C64 memory

64k is enough for anyone

Imagine the C64 world to be a very long street with exactly 65536 properties. You usually have bungalows with one floor or apartment buildings with two floors, each occupying a piece of land.

Each building has a house number or even a range of house numbers, starting from 0 up to 65535 ($0000 - $FFFF).

If we translate this admittedly weak metaphor to software engineering, we have first of all 65536 memory locations in your Commodore C64 and each can hold one single byte - this is the C64 RAM. On top of this RAM there are a few areas with read only memory (ROM) which hold the BASIC interpreter, the KERNAL Routines and the Character Generator ROM. Last but not least there are some I/O mapped areas, which basically means that memory locations are mapped to features in the I/O chips, e.g. in the SID and VIC-II. This overlapping memory design is often referred to as "RAM under ROM", and addressing chips via memory addresses is called memory-mapped I/O.

That also means that as a C64 coder you for the most part are concerned about reading or writing to some location in memory. 

The Memory Layout when the C64 is turned on

The initial memory configuration when turning on the C64 is targeted at BASIC programmers, though there are 4 Kbyte of RAM from $C000 to $CFFF which BASIC does not use and which are free for machine language. For games and demo programming, this is obviously not the best configuration.

The C64 Memory Configuration when turning on the machine. 

To read from addresses which are located under ROM you have to briefly switch out the portion of ROM overlapping that area. If you do not, any read access to those locations will return values from ROM, while writing to the same addresses will put the data into the respective RAM under ROM.
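
This read/write behavior is easy to model. The following Python class is a toy sketch, not an emulator: reads come from a ROM if one is switched in at that address, writes always land in RAM. The class, its method names and the filler value $AA are made up; the 8 Kbyte BASIC ROM starting at $A000 serves as the example overlay.

```python
class BankedMemory:
    """64 Kbyte of RAM with optional ROM overlays (toy model)."""

    def __init__(self):
        self.ram = bytearray(0x10000)
        self.roms = {}  # name -> [start address, ROM bytes, visible flag]

    def add_rom(self, name, start, data):
        self.roms[name] = [start, bytes(data), True]

    def set_visible(self, name, visible):
        self.roms[name][2] = visible

    def read(self, address):
        # A visible ROM hides the RAM at the same address.
        for start, data, visible in self.roms.values():
            if visible and start <= address < start + len(data):
                return data[address - start]
        return self.ram[address]

    def write(self, address, value):
        # Writes always go to RAM, even if a ROM is visible there.
        self.ram[address] = value & 0xFF


memory = BankedMemory()
memory.add_rom("BASIC", 0xA000, bytes([0xAA]) * 0x2000)  # dummy ROM content

memory.write(0xA000, 0x42)              # lands in the RAM under the ROM
value_with_rom = memory.read(0xA000)    # still reads the ROM byte

memory.set_visible("BASIC", False)      # switch the ROM out
value_without_rom = memory.read(0xA000) # now the RAM byte shows up

print(hex(value_with_rom), hex(value_without_rom))
```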

As opposed to other 8-Bit computers using the 6502 processor, the C64 has a slightly modified version of the CPU called 6510. The most significant difference from the standard 6502 is a built-in I/O port which makes it possible to choose between different memory layouts. That makes the Commodore C64 very flexible. This is the feature you use to switch out ROM overlapping RAM as required.
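
To make the idea of selectable memory layouts more concrete, here is a hedged Python sketch of how the lowest three bits of the 6510 port at address $0001 are commonly described. The bit names LORAM, HIRAM and CHAREN and the power-on value $37 are not taken from this article, and the sketch assumes that no cartridge is plugged in, so please double-check against a memory map reference before relying on it.

```python
def visible_banks(port_value):
    """Decode bits 0-2 of the 6510 port at $0001 (no cartridge assumed)."""
    loram = bool(port_value & 0b001)    # bit 0
    hiram = bool(port_value & 0b010)    # bit 1
    charen = bool(port_value & 0b100)   # bit 2

    basic_area = "BASIC ROM" if (loram and hiram) else "RAM"
    kernal_area = "KERNAL ROM" if hiram else "RAM"
    if not (loram or hiram):
        d000_area = "RAM"               # everything switched out
    elif charen:
        d000_area = "I/O"
    else:
        d000_area = "Character ROM"

    return {
        "$A000-$BFFF": basic_area,
        "$D000-$DFFF": d000_area,
        "$E000-$FFFF": kernal_area,
    }


power_on = visible_banks(0x37)   # assumed default after switching on
all_ram = visible_banks(0x34)
io_only = visible_banks(0x35)

for label, layout in (("$37", power_on), ("$35", io_only), ("$34", all_ram)):
    print(label, layout)
```

The value $35 is a layout often seen in games and demos: all ROMs are switched out, but the I/O chips stay reachable.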

Mapping to the Chipset

There is a special block of ROM overlapping the RAM from $D000 to $DFFF. First of all, there is the 4K large Character Generator ROM. The name sounds more complicated than it actually is. This area simply carries all available characters and symbols you can put on your screen using your C64 keyboard. Actually it's four different character sets, each 1K in size. Since one single character is made of an 8x8 bit matrix, which takes 8 bytes, every 1K-set holds 128 single chars. Not all chars are alphanumeric though, as there are also tons of different symbols available to print to the C64 screen. Additionally there are different versions of each character stored in the Character Generator ROM: shifted, unshifted and reversed, where reversed means that background color and character color are swapped.
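
The character arithmetic can be sketched in Python. The sizes come from the paragraph above. The glyph bytes in the example are an invented pattern and not copied from the real ROM, and the helper names are made up. The sketch assumes that the highest bit of each byte is the leftmost pixel.

```python
CHAR_ROM_SIZE = 4 * 1024
SET_SIZE = 1024
BYTES_PER_CHAR = 8          # 8 rows of 8 pixels, one byte per row
CHARS_PER_SET = SET_SIZE // BYTES_PER_CHAR
NUMBER_OF_SETS = CHAR_ROM_SIZE // SET_SIZE


def glyph_offset(set_index, char_index):
    """Offset of a character's first byte inside the Character ROM."""
    if not 0 <= set_index < NUMBER_OF_SETS:
        raise ValueError("set_index out of range")
    if not 0 <= char_index < CHARS_PER_SET:
        raise ValueError("char_index out of range")
    return set_index * SET_SIZE + char_index * BYTES_PER_CHAR


def render_glyph(rows):
    """Turn 8 glyph bytes into 8 text rows ('#' = set pixel)."""
    return ["".join("#" if byte & (0x80 >> bit) else "." for bit in range(8))
            for byte in rows]


def reversed_glyph(rows):
    """Swap foreground and background by inverting every bit."""
    return bytes(byte ^ 0xFF for byte in rows)


example = bytes([0b00011000] * 8)       # invented pattern: a vertical bar
normal_rows = render_glyph(example)
reversed_rows = render_glyph(reversed_glyph(example))

print(CHARS_PER_SET, "characters per set")
print("\n".join(normal_rows))
```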

In the same address space starting at $D000 the I/O mapping to the C64 chips takes place. The Video Controller VIC-II, the SID and also both CIAs are simply accessed by reading and writing to memory addresses in that area. The 1000 locations of Color RAM map to the positions in Screen RAM 1:1. This means you can define the color at any position of the 40x25 screen matrix. Only the lower nibble of each Color RAM location actually exists, so there is no upper nibble that could be used to store something else.

The I/O Map - The memory area from $D000 to $DFFF is pretty crowded
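
The 1:1 mapping between screen positions and Color RAM can be sketched in Python. The 40x25 matrix is from this article. The base addresses $0400 for the default Screen RAM and $D800 for the Color RAM are not mentioned here and are assumptions of this example, as are the function names. In line with the update note at the end of this article, only four bits per cell are kept.

```python
SCREEN_COLUMNS = 40
SCREEN_ROWS = 25
SCREEN_RAM_BASE = 0x0400   # assumption: default screen location
COLOR_RAM_BASE = 0xD800    # assumption: start of Color RAM


def cell_addresses(column, row):
    """Return (Screen RAM address, Color RAM address) of a screen cell."""
    if not (0 <= column < SCREEN_COLUMNS and 0 <= row < SCREEN_ROWS):
        raise ValueError("position is outside the 40x25 screen matrix")
    index = row * SCREEN_COLUMNS + column
    return SCREEN_RAM_BASE + index, COLOR_RAM_BASE + index


def stored_color(value):
    """Color RAM keeps only 4 bits per cell, so mask the rest away."""
    return value & 0x0F


top_left = cell_addresses(0, 0)
bottom_right = cell_addresses(39, 24)

print([hex(address) for address in top_left])
print([hex(address) for address in bottom_right])
print(stored_color(0xF7))
```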



  • There is 64 Kbyte RAM...
  • ... plus 20 Kbyte ROM (Read Only) for BASIC, KERNAL and Character Generator ROM sharing addresses with RAM
  • To read any byte in RAM which is "under ROM" you have to switch out the section of ROM which overlaps that particular RAM area. This ROM switching concept is the major difference between the 6510 CPU in the C64 and the 6502 CPU used in other 8-Bit computers. The concept also allows many different memory configurations which can be optimized for whatever use case needs to be taken care of.
  • The I/O chips are memory-mapped. You access features of SID, VIC-II etc by reading and writing to memory addresses.

Update June 14, 2014: In an earlier version I mistakenly assumed that there might be an additional 512 spare bytes available due to the fact that Color RAM only uses one Nibble to store color information. As it turned out (thanks Csaba!), the Color RAM is organized as 1024x4 Bit Static RAM, so there is no High Nibble to access and hence no spare RAM here.