Explanation of how to create an Ultima-style map view turns into an adventure in C64 BASIC optimisation
Games like Ultima have a classic overhead camera view rather than the moving character view that I have been showing in my retro roguelike. How is that implemented?
Jay in the Commodore 64 Ultimate Development & Modifications Facebook group asked:

Here is my answer (which on reflection wasn’t as helpful as it could have been):
The way to do it with c64 characters is the map defines the whole potential area and the “camera view” is a slice of that starting at x, y of the map.
So if the map is 100,100 you need x to x+11, for y to y+11 rows.
I’m sure someone has code already if not I can come back with some
Rather than leave that as it was, I felt I needed to offer a better solution, plus it is a good challenge to walk through in a blog post, so here we are.
As I mention briefly in my response, the main mental split is between the “world map” and the visible portion. We are simulating a viewport or portal into the whole, and solutions will involve taking the correct slices out of the bigger version and pasting them onto the game screen.
In other words, we need to keep track of:
- The world map, where everything has X and Y coordinates that represent its horizontal (left) and vertical (top) position in the world.
- The camera, an (x, y) position within the whole.
- The visible window, x to x+10 across and y to y+10 down (i.e. 11 tiles each way for an 11×11 window).
Another wrinkle, of course, is that however we draw it, we also want to centre the gameplay on the player character. So there also needs to be an additional offset from the top-left, so that the sprite (or whatever represents the player) sits in the middle vertically and horizontally rather than always at the top-left of our game screen.
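Here is a minimal sketch of that split. It is written in Python rather than BASIC so the logic can be run and checked anywhere. It assumes the same sizes as the BASIC listings: a 40×24 world and an 11×11 window drawn at screen column 14, row 6. camera_for and world_to_screen are hypothetical helper names. The camera is centred on the player and then clamped to the map edges, which is also what the BASIC code does.

```python
# Hypothetical helpers modelling the world/camera/screen split.
# Sizes mirror the BASIC listings: MW x MH world, VW x VH window at (OX, OY).
MW, MH = 40, 24            # world map size in tiles
VW, VH = 11, 11            # visible window size in tiles
HX, HY = VW // 2, VH // 2  # half-window offsets (5, 5) used to centre the player
OX, OY = 14, 6             # screen column/row of the window's top-left corner


def camera_for(px: int, py: int) -> tuple[int, int]:
    """World tile at the window's top-left: centred on the player,
    then clamped so the window never runs off the map."""
    cx = min(max(px - HX, 0), MW - VW)
    cy = min(max(py - HY, 0), MH - VH)
    return cx, cy


def world_to_screen(wx: int, wy: int, cx: int, cy: int) -> tuple[int, int]:
    """Screen (column, row) of a world tile, given the camera position."""
    return OX + (wx - cx), OY + (wy - cy)


cx, cy = camera_for(20, 12)
print(cx, cy)                           # 15 7
print(world_to_screen(20, 12, cx, cy))  # (19, 11): the middle of the window
print(camera_for(1, 1))                 # (0, 0): clamped at the top-left edge
```

Near the edges the player is no longer in the exact middle of the window. That is the trade-off of clamping. world_to_screen still places the player correctly because it always works from the clamped camera.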

You can now follow the tutorials and edit the code right in your web browser with the Online Retro IDE
– No downloads, configuration, etc. necessary, and it is free!
Our dirty first draft, completely unoptimised, will be barely one step away from pseudocode:
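1. Store the world in a 2D array M(x, y) sized to the full world dimensions.
2. Track the player’s world position (PX, PY) separately from their screen position.
3. Work out the camera’s top-left corner: CX = PX - 5, CY = PY - 5 (half the 11×11 viewport), clamped so the window never runs off the map.
4. For each (I, J) in the 11×11 viewport, copy M(CX+I, CY+J) to screen RAM at (OX+I, OY+J).
5. When a key moves the player, update (PX, PY) and redraw.
Of course this is really, really slow, but as a proof of concept it helps us nail down the general shape.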
→Get your own editable copy of the final code and see it run in the online editor.
10 REM ---- LARGE MAP / SMALL CAMERA DEMO ----
20 REM JAY'S QUESTION: KEEP PLAYER CENTRED,
30 REM MOVE THE MAP AROUND THEM (ULTIMA STYLE)
40 REM LEVEL 1: HORRENDOUSLY SLOW UNOPTIMISED FIRST DRAFT
50 MW=40 : MH=24
60 VW=11 : VH=11
70 HX=INT(VW/2) : HY=INT(VH/2)
80 OX=14 : OY=6
90 SC=1024
100 DIM M(MW-1,MH-1)
110 GOSUB 600 : REM BUILD MAP
120 PX=20 : PY=12
130 PRINT CHR$(147)
140 GOSUB 300 : REM DRAW VIEWPORT
150 GOSUB 500 : REM DRAW PLAYER
160 GET K$ : IF K$="" THEN 160
170 DX=0 : DY=0
180 IF K$="W" THEN DY=-1
190 IF K$="S" THEN DY=1
200 IF K$="A" THEN DX=-1
210 IF K$="D" THEN DX=1
220 IF K$="Q" THEN PRINT CHR$(147) : END
230 NX=PX+DX : NY=PY+DY
240 IF NX<0 OR NX>MW-1 OR NY<0 OR NY>MH-1 THEN 160
250 PX=NX : PY=NY
260 GOSUB 300 : GOSUB 500 : GOTO 160
270 REM
300 REM ---- DRAW VIEWPORT (NAIVE) ----
310 CX=PX-HX : CY=PY-HY
320 IF CX<0 THEN CX=0
330 IF CY<0 THEN CY=0
340 IF CX>MW-VW THEN CX=MW-VW
350 IF CY>MH-VH THEN CY=MH-VH
360 FOR J=0 TO VH-1
370 FOR I=0 TO VW-1
380 POKE SC+(OY+J)*40+(OX+I),M(CX+I,CY+J)
390 NEXT I
400 NEXT J
410 RETURN
420 REM
500 REM ---- DRAW PLAYER (NAIVE) ----
510 SX=OX+(PX-CX) : SY=OY+(PY-CY)
520 POKE SC+SY*40+SX,81
530 RETURN
540 REM
600 REM ---- BUILD TEST MAP ----
610 FOR Y=0 TO MH-1
620 FOR X=0 TO MW-1
630 T=46
640 IF X=0 OR Y=0 OR X=MW-1 OR Y=MH-1 THEN T=160
650 IF (X=10 AND Y>4 AND Y<15) THEN T=160
660 IF (Y=8 AND X>14 AND X<25) THEN T=87
670 M(X,Y)=T
680 NEXT X
690 NEXT Y
700 RETURN
We could leave it at the above, but it animates like a slideshow rather than a game. You might also be forgiven for thinking it has crashed, given the extremely slow startup. Let’s tweak the display logic first.
The biggest move we can make at this stage is to replace the expensive multiplication (OY+J)*40 with a precomputed lookup table. Our 8-bit 6510 has no multiply instruction, so multiplication is always done in software. BASIC makes it slower still by doing all its arithmetic in floating point. So we add DIM R(24) and fill it once at startup: FOR Y = 0 TO 24 : R(Y) = Y*40 : NEXT Y
The display loop then becomes one lookup, no multiply:
RO = SC + R(OY+J) + OX
POKE RO+I, M(CX+I, CY+J)
That is 121 floating-point multiplications (one per cell of the 11×11 viewport) eliminated on every redraw! ~3–5× faster.
This will unfortunately make the startup even slower.
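To see where the saving comes from, here is a rough Python model (not C64 code) of both address calculations. It uses the BASIC listing’s constants (SC=1024, OX=14, OY=6) and counts the multiplications the naive version performs per redraw. The function names are hypothetical.

```python
# Rough model of the per-redraw address maths, naive vs lookup table.
SC, OX, OY = 1024, 14, 6   # screen RAM base and window offset, as in BASIC
VW, VH = 11, 11


def naive_addresses():
    """Screen address of every viewport cell, multiplying for each cell."""
    addrs, mults = [], 0
    for j in range(VH):
        for i in range(VW):
            addrs.append(SC + (OY + j) * 40 + (OX + i))
            mults += 1             # one (OY+J)*40 per POKE
    return addrs, mults


# Built once at startup, like FOR Y=0 TO 24 : R(Y)=Y*40 : NEXT Y
R = [y * 40 for y in range(25)]


def lut_addresses():
    """Same addresses using one table lookup per row and no multiplies."""
    addrs = []
    for j in range(VH):
        ro = SC + R[OY + j] + OX   # RO: screen base for this row
        for i in range(VW):
            addrs.append(ro + i)
    return addrs


naive, naive_mults = naive_addresses()
print(naive_mults)               # 121
print(naive == lut_addresses())  # True
```

The table costs 25 multiplications once at startup. That is why startup gets slower while each redraw gets faster.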
Why is it still slow? 2D array access in BASIC v2 still costs a hidden multiply per read.
So how about we switch the map to a flat 1D array: DIM M(MW*MH - 1)
To index the flat array without multiplying, we add a map-row lookup table, DIM MR(MH-1), and fill it with MR(Y) = Y * MW at startup.
Our viewport loop is now ONLY additions, which the 6510 is much better at:
RO = SC + R(OY+J) + OX (screen base for the row)
MO = MR(CY+J) + CX (map base for the row)
POKE RO+I, M(MO+I) (zero multiplies)
Again, we trade initialisation time for play speed. We have added ~24 more multiplications at startup, but eliminated 121+ per display frame (one hidden multiply per 2D array read).
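A small Python model of the flat map and its row table may help. M and MR mirror the BASIC arrays, and 46 and 160 are the screen codes the BASIC test map uses for floor and wall. set_tile and viewport_row are hypothetical helpers.

```python
# Model of the flat 1D map plus a map-row lookup table (MR).
MW, MH = 40, 24

M = [46] * (MW * MH)               # DIM M(MW*MH-1), filled with floor tiles
MR = [y * MW for y in range(MH)]   # MR(Y)=Y*MW: 24 multiplies, once


def set_tile(x, y, t):
    M[MR[y] + x] = t               # cell (x, y) lives at MR(y) + x


def viewport_row(cx, cy, j, width=11):
    """Tiles for row j of the window whose top-left is (cx, cy)."""
    mo = MR[cy + j] + cx           # MO: map base for the row
    return [M[mo + i] for i in range(width)]  # additions only


set_tile(10, 7, 160)               # put a wall at x=10, y=7
print(viewport_row(5, 7, 0))       # the wall shows up at index 5 of the row
```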
We started out with a slow startup, but it is now so slow that it is certain to look frozen unless we show that the program is running.
All we need to do is print a dot inside each initialisation loop. Ironically, those prints add even more slowness to the process, because C64 BASIC is not at all quick at printing.
Fortunately in a real game we would encode the LUTs and map as DATA statements and READ them, or even better load from disk.
The last stage of optimisation is to find the next “hot path” and optimise it.
A “hot path” is the section of your program that is executed most frequently. Optimising those sections provides the most noticeable improvements.
In this case our FOR J = 0 TO 10 ... NEXT J runs every redraw with the same constant bounds.
BASIC’s FOR/NEXT per-iteration overhead is significant. Every NEXT has to look up the loop variable, find the matching FOR entry on the stack, add the step, compare against the limit and jump back.
Instead, we can move that calculation into another LUT, DIM VR(10), filled once at startup with VR(J) = SC + R(OY+J) + OX, and “unroll” one of the loops. Rather than two FOR loops, we replace the outer loop with 11 explicit lines that each output one row:
RO = VR(0) : MO = MR(CY) + CX : GOSUB 460
RO = VR(1) : MO = MR(CY+1) + CX : GOSUB 460
Yeah, we still have a FOR, but we have eliminated 10 NEXTs per display frame, which is not too shabby.
It does feel snappier on each WASD press, which is a big deal.
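Modelled in Python, the pre-baked VR table and the unrolled outer loop look roughly like this. The screen bytearray is a hypothetical stand-in for the 1000 bytes of screen RAM at 1024, and draw_row plays the part of the GOSUB 460 routine. The eleven explicit calls are kept to mirror lines 360-370 of the BASIC.

```python
# Model of the pre-baked row bases (VR) and the unrolled viewport draw.
SC, OX, OY = 1024, 14, 6
MW, MH, VW, VH = 40, 24, 11, 11

R = [y * 40 for y in range(25)]                  # R(Y)=Y*40
MR = [y * MW for y in range(MH)]                 # MR(Y)=Y*MW
VR = [SC + R[OY + j] + OX for j in range(VH)]    # VR(J)=SC+R(OY+J)+OX

M = bytearray(b"." * (MW * MH))   # flat map of '.' tiles
screen = bytearray(b" " * 1000)   # stand-in for screen RAM (address - SC)


def draw_row(ro, mo):
    """Like GOSUB 460: eleven POKEs for one row."""
    for i in range(VW):
        screen[ro - SC + i] = M[mo + i]


def draw_viewport(cx, cy):
    """Outer loop unrolled into eleven explicit calls, as in lines 360-370."""
    draw_row(VR[0], MR[cy] + cx)
    draw_row(VR[1], MR[cy + 1] + cx)
    draw_row(VR[2], MR[cy + 2] + cx)
    draw_row(VR[3], MR[cy + 3] + cx)
    draw_row(VR[4], MR[cy + 4] + cx)
    draw_row(VR[5], MR[cy + 5] + cx)
    draw_row(VR[6], MR[cy + 6] + cx)
    draw_row(VR[7], MR[cy + 7] + cx)
    draw_row(VR[8], MR[cy + 8] + cx)
    draw_row(VR[9], MR[cy + 9] + cx)
    draw_row(VR[10], MR[cy + 10] + cx)


draw_viewport(15, 7)
print(VR[0], VR[10])   # 1278 1678
```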
Further optimisation ideas follow, but here is a good place to end with some working but still too slow code.
10 REM ---- LARGE MAP / SMALL CAMERA DEMO ----
20 REM JAY'S QUESTION: KEEP PLAYER CENTRED,
30 REM MOVE THE MAP AROUND THEM (ULTIMA STYLE)
40 REM LEVEL 3: UNROLLED VIEWPORT + PRECOMPUTED
50 MW=40 : MH=24
60 VW=11 : VH=11
70 HX=INT(VW/2) : HY=INT(VH/2)
80 OX=14 : OY=6
90 SC=1024
95 PRINT CHR$(147) : PRINT "LOADING";
100 DIM M(MW*MH-1)
105 DIM R(24)
106 DIM MR(MH-1)
107 DIM VR(10) : REM PRE-BAKED VIEWPORT ROW SCREEN BASES
110 FOR Y=0 TO 24 : R(Y)=Y*40 : PRINT "."; : NEXT Y
111 FOR Y=0 TO MH-1 : MR(Y)=Y*MW : PRINT "."; : NEXT Y
112 FOR J=0 TO 10 : VR(J)=SC+R(OY+J)+OX : NEXT J
113 PRINT : PRINT "BUILDING MAP";
114 GOSUB 600
115 PRINT : PRINT "READY"
120 PX=20 : PY=12
130 PRINT CHR$(147)
140 GOSUB 300
150 GOSUB 500
160 GET K$ : IF K$="" THEN 160
170 DX=0 : DY=0
180 IF K$="W" THEN DY=-1
190 IF K$="S" THEN DY=1
200 IF K$="A" THEN DX=-1
210 IF K$="D" THEN DX=1
220 IF K$="Q" THEN PRINT CHR$(147) : END
230 NX=PX+DX : NY=PY+DY
240 IF NX<0 OR NX>MW-1 OR NY<0 OR NY>MH-1 THEN 160
250 PX=NX : PY=NY
260 GOSUB 300 : GOSUB 500 : GOTO 160
270 REM
300 REM ---- DRAW VIEWPORT (UNROLLED, LUT) ----
310 CX=PX-HX : CY=PY-HY
320 IF CX<0 THEN CX=0
330 IF CY<0 THEN CY=0
340 IF CX>MW-VW THEN CX=MW-VW
350 IF CY>MH-VH THEN CY=MH-VH
360 RO=VR(0) : MO=MR(CY)+CX : GOSUB 460
361 RO=VR(1) : MO=MR(CY+1)+CX : GOSUB 460
362 RO=VR(2) : MO=MR(CY+2)+CX : GOSUB 460
363 RO=VR(3) : MO=MR(CY+3)+CX : GOSUB 460
364 RO=VR(4) : MO=MR(CY+4)+CX : GOSUB 460
365 RO=VR(5) : MO=MR(CY+5)+CX : GOSUB 460
366 RO=VR(6) : MO=MR(CY+6)+CX : GOSUB 460
367 RO=VR(7) : MO=MR(CY+7)+CX : GOSUB 460
368 RO=VR(8) : MO=MR(CY+8)+CX : GOSUB 460
369 RO=VR(9) : MO=MR(CY+9)+CX : GOSUB 460
370 RO=VR(10) : MO=MR(CY+10)+CX : GOSUB 460
410 LX=CX : LY=CY
420 RETURN
430 REM
460 REM ---- POKE ONE VIEWPORT ROW ----
470 FOR I=0 TO 10 : POKE RO+I, M(MO+I) : NEXT I
480 RETURN
490 REM
500 REM ---- DRAW PLAYER ----
510 SX=OX + (PX-LX)
520 SY=OY + (PY-LY)
530 POKE SC+R(SY)+SX, 81
540 RETURN
550 REM
600 REM ---- BUILD TEST MAP ----
620 FOR Y=0 TO MH-1
625 YB=MR(Y) : PRINT ".";
630 FOR X=0 TO MW-1
640 T=46
650 IF X=0 OR Y=0 OR X=MW-1 OR Y=MH-1 THEN T=160
660 IF (X=10 AND Y>4 AND Y<15) THEN T=160
670 IF (Y=8 AND X>14 AND X<25) THEN T=87
675 M(YB+X)=T
680 NEXT X
690 NEXT Y
700 RETURN
So about those future optimisation ideas …
The biggest would be to eliminate the slow pokes (heh). We have seen before that in C64 BASIC with no assembly routines, PRINT is faster than POKE, because POKEing individual characters to screen memory is slow. PRINTing with the cursor positioned via cursor control characters would be the single biggest remaining win in pure BASIC.
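Here is a rough sketch, as a Python model, of what such a PRINT would actually send to the screen. It builds the byte sequence using the standard C64 control codes HOME = CHR$(19), cursor down = CHR$(17) and cursor right = CHR$(29). print_at is a hypothetical name. One thing to watch: PRINT takes PETSCII codes, whereas POKE takes screen codes, so tiles such as the solid wall (screen code 160) would need translating.

```python
# Model of a cursor-positioned PRINT using C64 PETSCII control codes.
HOME, DOWN, RIGHT = 19, 17, 29   # CHR$(19), CHR$(17), CHR$(29)


def print_at(row: int, col: int, text: bytes) -> bytes:
    """Bytes that would be PRINTed to draw `text` starting at (col, row)."""
    return bytes([HOME] + [DOWN] * row + [RIGHT] * col) + text


seq = print_at(6, 14, b"...........")  # one 11-tile row of the window
print(len(seq))                        # 32: 1 HOME + 6 DOWN + 14 RIGHT + 11 tiles
```

In BASIC, the cursor-moving prefix for each of the 11 window rows could itself be built once at startup and kept in a string array (yet another lookup table). Each redraw would then need only 11 PRINTs instead of 121 POKEs.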
Building up the string to PRINT using concatenation would still be slow, so I think the map would instead be stored as a string array of rows. We could then use C64 BASIC string functions such as MID$ to extract just the portions we need.
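Here is a Python sketch of that idea. mid() imitates BASIC’s 1-based MID$(S$, START, LENGTH). The rows list is a hypothetical 40×24 test map with a wall of # characters at x=10, and window_rows is likewise a made-up helper name.

```python
# Model of a map stored as row strings, with the window cut out MID$-style.
MW, MH, VW, VH = 40, 24, 11, 11


def mid(s: str, start: int, length: int) -> str:
    """Like BASIC's MID$(S$, START, LENGTH): START counts from 1."""
    return s[start - 1:start - 1 + length]


# Hypothetical test map: every row has a wall '#' at x = 10.
rows = ["." * 10 + "#" + "." * (MW - 11) for _ in range(MH)]


def window_rows(cx: int, cy: int) -> list[str]:
    """One string per visible row: columns cx..cx+10 of rows cy..cy+10."""
    return [mid(rows[cy + j], cx + 1, VW) for j in range(VH)]


view = window_rows(5, 7)
print(view[0])   # .....#.....
```

Each visible row then becomes one PRINT of one MID$ result, with no per-character loop in BASIC at all.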
Alternatively, or in addition, we could have an assembly routine that does the display using memory copies. This would bypass our BASIC display loops and their character-by-character friction. Given a source address in the map and a destination address on screen, it would copy each row across super quickly.
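The routine itself would be 6510 assembly, but its job can be sketched in Python. Treat memory as a 64K bytearray and copy each 11-byte map row to its screen row in one block, stepping both pointers with simple additions. MAP_BASE ($C000) is an assumed location for a one-byte-per-tile map, and copy_block and draw_viewport are hypothetical names.

```python
# Model of a block-copy viewport routine over a simulated 64K memory map.
MEM = bytearray(65536)     # the C64 address space, all zeros to start
SCREEN = 1024              # default screen RAM ($0400)
MAP_BASE = 0xC000          # assumed home for a 40x24, one-byte-per-tile map
MW, MH, VW, VH, OX, OY = 40, 24, 11, 11, 14, 6


def copy_block(src: int, dst: int, n: int) -> None:
    """Copy n bytes from src to dst (what an LDA/STA loop would do)."""
    MEM[dst:dst + n] = MEM[src:src + n]


def draw_viewport(cx: int, cy: int) -> None:
    src = MAP_BASE + cy * MW + cx      # worked out once per redraw
    dst = SCREEN + OY * 40 + OX
    for _ in range(VH):
        copy_block(src, dst, VW)
        src += MW                      # next map row
        dst += 40                      # next screen row


# Fill the map with floor tiles (46) and add one wall tile (160) at (10, 7).
MEM[MAP_BASE:MAP_BASE + MW * MH] = bytes([46]) * (MW * MH)
MEM[MAP_BASE + 7 * MW + 10] = 160
draw_viewport(5, 7)
print(MEM[SCREEN + OY * 40 + OX + 5])  # 160: the wall, 5 tiles into row 0
```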

The last thought I had was to use meta-tiles. Part of the reason initialisation is so slow is that the map is built character by character. In a game like Ultima or Zelda, though, you might use tiles of 3×3 or 5×5 characters to make a wall corner, part of a house, a bend in a road, and so on. This would make loading or generating the world map a lot quicker. The world could be the same size when displayed but stored at 1/3 of the width and height or smaller (a 3×3 meta-tile map holds 1/9 as many cells).
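Here is a minimal Python sketch of meta-tile expansion. The stored world holds only tile IDs, and each ID unfolds into a 3×3 block of characters when the full-size map is built. The three-tile set here (grass, wall corner, road) is invented purely for illustration.

```python
# Model of 3x3 meta-tiles expanding into full-size character rows.
META = {                           # hypothetical tile set, 3 rows of 3 chars
    0: ["...", "...", "..."],      # grass
    1: ["###", "#..", "#.."],      # wall corner
    2: ["...", "===", "..."],      # road
}


def expand(meta_map: list[list[int]]) -> list[str]:
    """Turn a grid of meta-tile IDs into rows of characters."""
    rows = []
    for meta_row in meta_map:
        for line in range(3):      # each meta row becomes 3 character rows
            rows.append("".join(META[t][line] for t in meta_row))
    return rows


world = [[1, 2], [0, 0]]           # 4 stored cells...
full = expand(world)               # ...become 6x6 = 36 characters
for r in full:
    print(r)
```

Four stored cells standing in for 36 characters is the 1/9 saving of 3×3 meta-tiles. The tile definitions add a small fixed cost on top.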
How else would I improve it?
- Add colour by also writing to colour RAM at $D800 (55296), so walls are grey, water is blue, grass is green, player is yellow …
- Use the VIC-II registers $D016 / $D011 for sub-character pixel scrolling.
The technique that started this discussion applies no matter what platform or language you are using. Decouple your world coordinates from the visible screen coordinates and treat the playable, visible area as a window into a larger ‘world’ buffer. While my ‘Zelda-Like’ demos use push scrolling, the concept is the same.
We quickly went into a side-quest of trying to get CBM BASIC v2 to perform. The cost of multiplications in particular was very visible.
Lookup tables are the single most powerful optimisation tool on retro systems: trade a tiny bit of RAM and up-front calculations for huge time savings at runtime. You can see this technique over and over in the demo scene.
Finally, unroll loops (and anything else you need to do), but don’t optimise until you’ve measured where your hold-ups are (your hot path). As in the linked video from Robin, just because something seems like it should make things faster, doesn’t mean it will!