Ian's trip to Harvard

On Monday 8th May 2000, Ian Griffiths visited the Pusey Library at Harvard to get our first look at the Altair 4K BASIC source code. What follows are the notes he took at the time, all written by hand and typed up later. (We didn't know until he got there that visitors to the archives are allowed to take laptops into the reading room.)


Physical packaging

The document is contained in a cream box with a label mentioning Gates & Allen. This box (c. 15" x 20" x 1.5") contains a single item: a black leather-bound book. Its cover is unmarked except for the spine, which is embossed in gold with "GATES AND ALLEN - 8080 BASIC INTERPRETER - APRIL 1975". A couple of decorative bars bracket the lettering at either end of the spine.

The inside cover and front binding page of the book are both blank except for two pencilled-in archive reference numbers. One is the official number, HUF 300.775, and the other, ACC14093, has no explanation attached.

There are 57 pages inside the binding, ie 114 sides. The back binding page bears the following text:

This book is a preservation photocopy.
It is made in compliance with copyright law
and produced on acid-free archival
60# book weight paper
which meets the requirements of
ANSI/NISO Z39.48-1992 (permanence of paper)

Preservation photocopying and binding
by
Acme Bookbinding
Charlestown, Massachusetts

1999

A pale yellow card sleeve is glued inside the rear cover. It contains two sheets of US Letter paper, printed single-sided, which give every impression of having been typeset in TeX.


Letter from Dean Lewis

The sheets are a letter dated December 29, 1999 from Harry R. Lewis, Gordon McKay Professor of Computer Science and Dean of Harvard College. It is addressed to Mr Harley Holden, University Archives, Pusey Library.

Dean Lewis' address is given as "University Hall 4, Harvard College, Cambridge, MA. 02138", lewis@harvard.edu, tel. (617) 495-1555, fax (617) 496-8268.

In the letter he explains how this document came to be in the archives in the first place: it had been behind his filing cabinet for 20 years [link to other page here]. He then quotes an email he received from BillG about technical details of the development process for this version of BASIC. In particular, the email covers how they developed 8080 assembly code without the aid of an 8080 cross-assembler for the PDP-10.

Dean Lewis writes: "They used the Harvard PDP-10, located in the Aiken Computer Laboratory". He also writes: "I asked Bill Gates, 'Was there an 8080 cross-assembler running on Harv-10?'". The Dean's summary of Gates's reply is: "That is, the regular PDP-10 assembler was enhanced by defining the 8080 instruction names to be PDP-10 'Undefined User Operations', and the Digital Debugging Tool for the PDP-10 was likewise modified. Single 8080 instructions then corresponded to single PDP-10 instructions, though every instruction trapped to be emulated in user space on the PDP-10."
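The trap-and-emulate arrangement Gates describes can be illustrated with a toy model. This is a hedged sketch, not a reconstruction of the Harvard tools. The `Word` encoding, the three-instruction subset (MVI, ADD, HLT) and the handler table are all hypothetical. Each host word carries an 8080 mnemonic in its opcode field, so one 8080 instruction corresponds to one host instruction. Executing any such word traps to a user-space handler that emulates it.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """Hypothetical host word: an 8080 mnemonic in the opcode field plus operands."""
    opcode: str
    operands: tuple = ()


@dataclass
class CPU8080:
    """Minimal emulated 8080 state: the seven 8-bit registers, a PC and a halt flag."""
    regs: dict = field(default_factory=lambda: {r: 0 for r in "ABCDEHL"})
    pc: int = 0
    halted: bool = False


def mvi(cpu, reg, value):
    """MVI r,data: load an 8-bit immediate value into a register."""
    cpu.regs[reg] = value & 0xFF


def add(cpu, reg):
    """ADD r: add a register to the accumulator, wrapping at 8 bits (flags omitted)."""
    cpu.regs["A"] = (cpu.regs["A"] + cpu.regs[reg]) & 0xFF


def hlt(cpu):
    """HLT: stop the emulated processor."""
    cpu.halted = True


# Hypothetical trap table: the handler looks up the emulation routine by opcode.
UUO_HANDLERS = {"MVI": mvi, "ADD": add, "HLT": hlt}


def uuo_trap(cpu, word):
    """Invoked whenever the host 'executes' an 8080 instruction word."""
    UUO_HANDLERS[word.opcode](cpu, *word.operands)


def run(program):
    """Fetch words one at a time; every single one traps to the emulator."""
    cpu = CPU8080()
    while not cpu.halted:
        word = program[cpu.pc]
        cpu.pc += 1
        uuo_trap(cpu, word)
    return cpu


cpu = run([Word("MVI", ("A", 200)), Word("MVI", ("B", 100)),
           Word("ADD", ("B",)), Word("HLT")])
print(cpu.regs["A"])  # → 44
```

The 8-bit wraparound in `add` is why 200 + 100 leaves 44 in A. A real emulator would also have to maintain the 8080's sign, zero, auxiliary-carry, parity and carry flags, which this sketch omits.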

Some interesting facts that emerged from the letter:

The letter mentions some correspondence between Gates and Lewis and reproduces verbatim an email from Gates (billg@MICROSOFT.com). The email is dated "Mon 1 Nov 1999 16:20:55 -0800" and addressed to "Harry Lewis <lewis@deas.harvard.edu>" and "Carl Stork (Exchange) <carls@exchange.microsoft.com>", with the subject "RE: caption". So Bill had some input, but it's not clear what.

Most of the letter is concerned with the history of the document, but the final paragraph says something about the listings themselves. He mentions the 'features listing', which simply enumerates the unique selling points of Gates-Allen BASIC. As Lewis puts it: "Some of these extensions, such as the ability to put two statements on one line, or to use two-letter variable names instead of single letter names, seem risibly modest from today's perspective."

He also points out this is not only the listing for 4K BASIC, but also for 8K BASIC : "... (a conditional assembly switch distinguished the two versions)."

Ian notes that the source code also has a switch for a third version, 12K BASIC. However, the switch doesn't appear to do anything useful (see the notes on file C below). I don't think there ever was a 12K BASIC, but I could be wrong.

The final item in the letter lists who received courtesy copies: "Professor I. Bernard Cohen, Ms Jan Merrill-Oldham, Dean Venkatesh Narayanamurti". The letter names Professor Cohen and Donald Knuth as two people who have made pilgrimages to see the listing on display in the computer science building.


The Listings

The pages appear to be a more or less direct copy of printer output. It is a very high-contrast copy, and Lewis' letter implies that some touching-up was done to improve legibility. The tractor-feed holes are visible down both sides of the pages. The output looks like that of a line printer, because certain columns show persistent printing defects.

The pages are not numbered.

Several files are present, each with a good old line-printer header page. Its most visible feature is the file name printed out BIG. Each letter of the name is drawn pixellated, and each pixel is rendered as a 3x3 block of characters. The character used is whichever letter of the file name that pixel belongs to.
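Here is a small sketch of the header-page lettering just described. Each glyph pixel becomes a 3x3 block made of the letter itself. The 5x5 glyphs, the two-space gap between letters and the `banner` function are hypothetical: the notes don't record the spooler's real font or layout.

```python
# Hypothetical 5x5 glyphs; the real spooler font is not recorded in the notes.
GLYPHS = {
    "F": ["#####",
          "#....",
          "####.",
          "#....",
          "#...."],
    "3": ["####.",
          "....#",
          ".###.",
          "....#",
          "####."],
}


def banner(name: str, scale: int = 3, gap: int = 2) -> str:
    """Draw `name` in big letters: each glyph pixel becomes a scale x scale
    block made of the letter that the pixel belongs to."""
    height = len(next(iter(GLYPHS.values())))
    lines = []
    for row in range(height):
        pieces = []
        for ch in name:
            pixels = GLYPHS[ch][row]
            # Widen each pixel to `scale` copies of the letter (or blanks).
            pieces.append("".join(ch * scale if p == "#" else " " * scale
                                  for p in pixels))
        line = (" " * gap).join(pieces).rstrip()
        # Repeat the row so each pixel is also `scale` lines tall.
        lines.extend([line] * scale)
    return "\n".join(lines)


print(banner("F3"))
```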

The files are as follows (we think MAC is short for MACRO, the PDP-10 macro assembler):


F3.MAC

65 sides. Lines run from 00100 through 90008, mostly in increments of 10 or 20 up to 68960. They then run in increments of 4 from 70000 to 70554, followed by 90004 and 90008. This is the only listing without a double header, and we have no idea why.

F4.MAC

33 sides (including headers), lines 00100 to 90450, increments of 50.

PUN7.MAC

3 sides (including both headers), lines 00100 to 03500. The code seems to be PDP-10 code, not 8080 code. Ian's guess is that this is what they used to launch everything else into the 8080 emulator, but it really is only a guess.

C

[This file could also be titled 'L'; it's hard to tell from the handwritten notes.] 3 sides, lines 00100 to 05200. It is not asm but some sort of configuration script. It contains a 4K, 8K or 12K selection that defines a symbol 'LENGTH' as 0, 1, or 2 respectively. When LENGTH=1 (ie the 8K version), the symbols 'EXTENDED FUNCTIONS', 'MULTIPLE DIMENSIONED ARRAYS ALLOWED' and 'STRINGS ALLOWED' are defined as 1. When LENGTH=2 (ie 12K), only 'EXTENDED FUNCTIONS' is defined as 1. That gives fewer features than 8K BASIC, which is more reason to suppose 12K never made it. Some symbols are never defined as 1, whatever the value of LENGTH: 'INLINE CONSTANT CONVERSION', 'DOUBLE PRECISION ARITHMETIC', 'INTEGER VARIABLES', and 'PRINT USING ALLOWED'. Ian notes that later in the listings, at the actual code for inline constant conversion, there is a comment saying that 'this doesn't work'!
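The LENGTH selection can be modelled as a small lookup, which makes the odd 12K case easy to see. This is a sketch, not the file's actual contents. The switch descriptions are as transcribed in the notes (the real MACRO symbol names are not recorded), and the code assumes that any switch not set to 1 counts as 0.

```python
# Switch descriptions as transcribed in the notes; real symbol names unknown.
ALWAYS_OFF = (
    "INLINE CONSTANT CONVERSION",
    "DOUBLE PRECISION ARITHMETIC",
    "INTEGER VARIABLES",
    "PRINT USING ALLOWED",
)

ENABLED_BY_LENGTH = {
    0: (),                                        # 4K
    1: ("EXTENDED FUNCTIONS",                     # 8K
        "MULTIPLE DIMENSIONED ARRAYS ALLOWED",
        "STRINGS ALLOWED"),
    2: ("EXTENDED FUNCTIONS",),                   # 12K: fewer than 8K
}


def feature_switches(length: int) -> dict:
    """Return every switch as 0 or 1 for LENGTH (0=4K, 1=8K, 2=12K).
    Assumes a switch that is never set to 1 is treated as 0."""
    if length not in ENABLED_BY_LENGTH:
        raise ValueError("LENGTH must be 0, 1 or 2")
    names = set(ALWAYS_OFF)
    for enabled in ENABLED_BY_LENGTH.values():
        names.update(enabled)
    return {name: int(name in ENABLED_BY_LENGTH[length])
            for name in sorted(names)}


for n, label in ((0, "4K"), (1, "8K"), (2, "12K")):
    print(label, sum(feature_switches(n).values()), "switches on")
```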

FEAT

6 pages of text, not code. This is just a list of features that were considered unique to their BASIC. So we have:
Multiple statements per line, separated by colons.
Direct commands (ie you can type BASIC at the prompt, as opposed to needing it in a program).
The loop variable in NEXT is optional (and a single NEXT statement can close multiple loops).
Variable names of any length, though only the first two characters are looked at.
The size of an array can be calculated dynamically by evaluating an expression.
An array can be cleared with CLEAR.
In PRINT, SPC(x) inserts spaces and TAB(x) tabs to the x'th column (ignored if you've already gone past that column).
POS (a built-in function) returns the current position of what they rather quaintly call the 'print head'.
Nested IF expression THEN statement is supported.
AND, OR, and NOT are available on all expressions.
The results of comparison operators can likewise be interpreted as numbers. This is an interesting bit of BASIC history, and much later of COM Automation history: it is where -1 for TRUE comes from. IF accepts numeric expressions as well as Boolean ones.
The FRE function returns the amount of free space in user storage.
Question mark (?) is shorthand for PRINT. (It still is today!)
The CONT command continues after a STOP or a Ctrl+C.
You can temporarily halt program execution by just hitting the return key at the prompt provided by the INPUT keyword.
INP and OUT give you hardware bit twiddling.
Multi-dimensional arrays are supported.
User-defined formulae can be provided with FN.
It also lists all of the statements supported by the language: PRINT, IF <formula> <relation> <formula> THEN <statement>, GOSUB, RETURN, NEXT, FOR, READ, INPUT, END, STOP, DATA, RESTORE, LET, DIM, and REM. It then lists those available in 8K BASIC only:
ON <formula> GOTO <line>;<line>... (effectively a computed GOTO, a primitive switch statement)
ON <formula> GOSUB <line>;<line>...
OUT <channel#>;<8-bit value>
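Using -1 for TRUE lets the same operators serve for both logic and bit twiddling. The notes don't say how AND, OR and NOT were implemented. This sketch assumes they are bitwise operations on 16-bit two's-complement integers, as in later Microsoft BASICs; the function names are hypothetical.

```python
def to_int16(x: int) -> int:
    """Wrap x to a signed 16-bit two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


TRUE, FALSE = -1, 0  # -1 is 0xFFFF in 16 bits: every bit set


def relation(result: bool) -> int:
    """A comparison yields a number: TRUE (-1) or FALSE (0)."""
    return TRUE if result else FALSE


def basic_and(a: int, b: int) -> int:
    return to_int16(a & b)


def basic_or(a: int, b: int) -> int:
    return to_int16(a | b)


def basic_not(a: int) -> int:
    return to_int16(~a)


# Because TRUE has all bits set, the bitwise operators also act as logical ones.
print(basic_not(relation(3 < 5)))                    # → 0
print(basic_and(relation(3 < 5), relation(2 > 1)))   # → -1
print(basic_and(12, 10))                             # → 8 (plain bit twiddling)
```

Suppose TRUE had been +1. Then NOT 1 would give -2, which is non-zero and so still 'true' as far as IF is concerned. With -1, bitwise NOT is also a correct logical NOT.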

Q.CMD

3 pages including the double header. The full content of the file is: DDT,ITAB2,S3;%SF3.REL,%SF4.REL%3%B
We have no real idea what this means, but Ian's guess is that it launches the debugger. DDT is the standard PDP-10 debugger, presumably the 'Digital Debugging Tool' mentioned in Dean Lewis's letter. F3.REL and F4.REL are presumably the relocatable object files assembled from F3.MAC and F4.MAC.

There is also a final page which seems to be nothing more than spooler output, but is interesting nonetheless. At the top it says "SPOOLER RUNTIME 31 SECONDS, 179 KCS, 0 DISK READS; 0 DISK WRITES; 104 PAGES."
After this there are 15 copies of the same line : "**END** USER HOLLOWAY [6001,155] JOB F3 SEQ 513 DATE 30-APR-75 18:08:53 MONITOR HARVARD 5.06B-131 **END**".

I wonder who Holloway was??

OK, the two main files are F3.MAC and F4.MAC. Judging by their title directives, these are the main interpreter and the math package respectively:

F3.MAC

That this is not the final version is evident from the line:

00340 SUBTTL VERSION 1.1 -- MORE FEATURES TO COME

The copyright notice reads:

00400  -------------------------------------------
00410  COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
00420  -------------------------------------------

It also says 'written originally on the PDP-10 at Harvard from February 9 to April 27.' Recall that the spooler output (above) shows this printout was made on 30th April 1975.

Interestingly, another comment tells us that:

00560  PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580  BILL GATES WROTE THE RUNTIME STUFF.
00600  MONTE DAVIDOFF WROTE THE MATH PACKAGE.

This explains the 'Gates, Allen, Davidoff' reference in the record, although 'Davidoff' doesn't appear anywhere on the box or in the volume's title.

There is a 'THINGS TO DO' section:

SYNTAX PROBLEMS (OR)
NICE ERRORS
ALLOW ^W AND ^C IN LIST COMMAND
TAPE I/O
BUFFER I/O
USR ??
ELSE
USER-DEFINED FUNCTIONS (MULTI-ARG,MULTI-LINE,STRINGS)
MAKE STACK BOUNDARY STUFF EXACT
(FOUT 24 FIN 14)
PUNCH, DELETE;,.
INLINE CONSTANT CONVERSION -- MAKE IT WORK
SIMPLE STRINGS

Here's a rough outline of the F3 code, by listing line number:
780-1180 Constant definitions for things like status bits, positions, and I/O channels.
1190-1560 Conditional assembly, just turning features on and off for 4K and 8K BASIC.
1640-1760 Reset routine, consisting entirely of JMP INIT. There's a comment saying it 'gets replaced with JMP READY'.
1780-3120 Character I/O and a 16-bit compare.
3140-3880 (8K BASIC only) Floating-point-related stack hackery.
3900-4240 Table indicating operator precedence.
4260-6680 Keyword table, plus a comment saying that if one keyword is a substring of another, it can cause problems.
6700-7840 Table of function pointers, one for each entry in the keyword table.
7860-8664 Table of errors, all of which are 2 letters long.
8700-10240 BASIC's workspace: DBs, DWs and so on.
10260-10820 Error handling.
10840-13200 Main program loop. It just loops, getting a line of input from the TTY and processing it.
13220-14000 Reset state and clear memory.
14020-18320 Get a line of input.
18340-19900 LIST.
19920-21640 FOR.
21660-22740 "New statement fetcher". Could not work out what this does.
22760-22940 RESTOR.
22960-23740 STOP and END.
23760-25340 Some input handling.
25360-25500 RUN.
25520-27066 GOSUB, RETURN, and GOTO.
27080-27580 REM and DATA.
27600-30180 LET.
30200-32080 PRINT.
32100-35480 INPUT and READ.
35500-37700 NEXT.
37720-46180 Expression evaluation. Ian notes that this is "v. hairy". :-)
46200-46736 User-defined formulae.
50000-57024 Multi-dimensional arrays.
57040-67240 String handling.
67260-67760 INP and OUTP.
67780-68960 More string handling.
68964-70428 Line range execution handling? It looked to be supporting a mechanism for saying 'run from line x to line y', but Ian is not certain about this.
70432-70496 Constant crunching. I don't think this ever got compiled in: it looks like the inline constant conversion code (above) that was noted as not actually working.
70498-70554 Shuffle memory down after deleting a line of code.
90004-90008 Tail end of file.
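The comment in the keyword table (4260-6680) warns that a keyword which is a substring of another can cause problems. One plausible reading is sketched below as an assumption, not as a description of the real crunching code. If the interpreter scans the table in order and accepts the first keyword that matches the start of the input, then a shorter keyword listed before a longer one beginning with it will always win. INP and INPUT are both keywords in this BASIC, but the table orders and the `match_keyword` function are hypothetical.

```python
def match_keyword(text: str, table: list[str]):
    """Scan the keyword table in order and return the first keyword that
    is a prefix of `text`, plus the unconsumed remainder (hypothetical model)."""
    for keyword in table:
        if text.startswith(keyword):
            return keyword, text[len(keyword):]
    return None, text


BAD_ORDER = ["INP", "INPUT"]    # shorter keyword listed first
GOOD_ORDER = ["INPUT", "INP"]   # longer keyword listed first

print(match_keyword("INPUT X", BAD_ORDER))   # → ('INP', 'UT X')
print(match_keyword("INPUT X", GOOD_ORDER))  # → ('INPUT', ' X')
```

Under this model, the remedy is simply to order the table so that longer keywords come before any keyword that is a prefix of them.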

F4.MAC

Ian did not go through this file in detail. It was mostly hard-to-follow floating-point code, and he was "getting pretty knackered by now". However, there was some interesting non-FP code in it:

79500-82800 System initialisation
82850-84075 Function deletion code. Looks like a mechanism to throw away bits of the interpreter to free up some memory.
84100-88150 The code which asks the user which functions to throw away.
88200-88500 The 'WRITTEN BY...' message crediting all three authors.
88550-88850 The text 'STRING SPACE'.
88900-89050 The text 'BYTES FREE'.
89100-89850 The startup prompt text.
89900-90000 The text 'MEMORY SIZE'.
90050-90250 Mapping out stack and memory space.
90300-90450 Tail end of file.

Finally, Ian notes that he was unable to find BillG's favourite bit of code in this: the decimal-to-string conversion that Bill explained in full in his Software Notes column (link to my site). The PRINT keyword uses a routine called FOUT, which is a floating-point output routine. Line-number printing is the only place where an integer gets printed. It uses a routine called LINPRT (file F4.MAC, 40800-41250), but all this does is promote the integer to a floating-point number and delegate to FOUT.
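The LINPRT arrangement just described can be sketched as follows. It presumably spares the 4K interpreter a second number-to-decimal routine, at some cost in speed. The `fout` function here is a hypothetical stand-in, since the notes don't describe FOUT's output format. The sketch also assumes that the floating-point format represents every line number exactly.

```python
def fout(x: float) -> str:
    """Stand-in for FOUT (real output format not described in the notes):
    whole numbers are shown without a decimal point."""
    if x == int(x):
        return str(int(x))
    return repr(x)


def linprt(line_number: int) -> str:
    """LINPRT as described: promote the integer to floating point,
    then delegate all the formatting to FOUT."""
    return fout(float(line_number))


print(linprt(100))  # → 100
```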