9. SOURCEROR

9.1 Introduction

SOURCEROR is a sophisticated and, after practice, easy to use disassembler designed as a subsidiary program to create MERLIN source files out of binary programs, usually in a matter of minutes. SOURCEROR disassembles Sweet 16 code as well as 6502 code.

The above is true. However, there are a number of ways you can be tripped up if you are not careful and do not think about the job you are undertaking. In general, you are trying to fully understand and/or change an undocumented program. Before undertaking the job of disassembling a large program ($1000 bytes or larger), you are advised to practice on smaller programs for which documentation and/or understanding is available from computer stores or friends. The 256 bytes of the disk controller ROM that begin at address $C600 are a challenge worthy of your attention. A thorough reading of "Beneath Apple DOS" is almost a must before undertaking any but the most trivial disassemblies. In addition, the publication "Call A.P.P.L.E. In Depth: All About DOS" makes very interesting reading.

The main part of SOURCEROR is called SRCRR.OBJ, but this cannot be run (conveniently) directly, since it may overwrite DOS buffers and crash the system. For this reason, a small program named SOURCEROR is provided. It runs in the input buffer (where keyboard input is normally kept) and does not conflict with any program in memory. This small program simply checks memory size, gets rid of any programs such as PLE which would conflict with the main SOURCEROR program, sets MAXFILES = 1, then runs SRCRR.OBJ (at $8800-$9AA5). To minimize the possibility of accident, SRCRR.OBJ has a default loading location of $4000, and if you BRUN it, it will just return without doing anything. If you try to BRUN it at its designed location of $8800, however, you could be in for big trouble.

SOURCEROR assumes the standard Apple screen is being used and will not function with an 80 column card.
The upshot of the above is "do it our way" and you "probably" won't get into trouble.

9.2 Using SOURCEROR

1. Load in the software to be disassembled. The most common way to do this is to use the command "BLOAD." It is also possible to boot some disks and then exit to Applesoft by menu and/or the RESET key. The location where the program normally resides is preferred, provided that it doesn't interfere with where MERLIN or one of its tables is going to be placed. See the memory map in this section for a guide to SOURCEROR memory usage.

There are a number of ways that the loading address of a program can be found.

- Get the Beagle Brothers disk "Apple Mechanic" and study the documentation for the program "byte zap." This will allow you to follow the linkage from the catalog on track $11 to the actual track and sector where the binary data is stored. In your copy of "Beneath Apple DOS" (chapter 4) there are figures that show how track/sector lists are configured and how and where the starting address and program length are stored.

- Refer to the various "tip sheets" available with Beagle software. They are chock full of tidbits that you are going to need if you are going to cope with the various problems encountered in disassembling code. In particular, they will tell you the addresses to inspect after a BLOAD that will give you the starting address and program length. While this might seem to be the easier approach of the two presented so far, the knowledge obtained is not nearly so great.

- Central Point Software has an excellent disk called "Copy ][ Plus." It is sold as a disk backup system. However, the disk utilities are excellent. For this discussion, the Catalog command is of interest. It has an option that will print (to screen or printer) the name, starting address, and program length of each binary file on the disk being cataloged. The Copy ][ Plus solution is the most attractive from an individual effort standpoint.
  It is not the best from a learning point of view (do buy the disk; it is well worth its modest price).

2. Put the Merlin disk in drive 1 and type "BRUN SOURCEROR". When the drive stops, take the Merlin disk out of drive 1.

3. You will be told that the default address for the source file is $2500. This was selected because it does not conflict with the addresses of most binary programs you may wish to disassemble. Just hit RETURN to accept this default address. Otherwise, specify (in hex) the address you want.

   You may also access a "secret" provision at this point (not secret anymore, I'm about to blab it all out). This is done by typing CTRL-S (for "SWEET") after, or in lieu of, the source address. You will then be asked for a (nonstandard) address for the SWEET 16 interpreter. This is intended to facilitate the disassembly of programs which use a RAM version of SWEET 16.

4. Next you will be asked to hit RETURN if the program to be disassembled is in its original (running) location. If it is NOT in its running location, then you must specify (in hex) the code's present location. Finally, you will be asked to provide the ORIGINAL location of that program.

   When disassembling you must specify the ORIGINAL address of the program, not the address where it currently resides. It will appear that you are disassembling the program at its original location, but actually SOURCEROR is disassembling the code at its present location and translating the addresses. This is an important enough concept that we will give a small example at this time:

   a. You specify that the program is NOW loaded at location $803.
   b. You further state that it normally resides at $2500.
   c. When you type your first L to begin disassembly, type -----> 2500L <----- !! If you type 803L, the disassembly will NOT be for the code you expected to disassemble.

5. Lastly, the title page, which contains a synopsis of the commands available for disassembly, will be displayed.
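
The address translation described in step 4 amounts to a fixed offset between the original and present locations. The following is only a sketch of that arithmetic in Python, not SOURCEROR's actual code; the function name `to_present_address` is made up for illustration, and the two addresses are the ones from the example in step 4.

```python
def to_present_address(original_addr, original_start, present_start):
    """Map an address in the program's ORIGINAL (running) range to the
    place where that byte actually sits in memory right now."""
    return original_addr - original_start + present_start


# Values from the example in step 4: loaded at $803, normally runs at $2500.
ORIGINAL_START = 0x2500
PRESENT_START = 0x0803

# Typing 2500L names the original address; the bytes are fetched from $0803.
first = to_present_address(0x2500, ORIGINAL_START, PRESENT_START)
print(f"2500L reads memory at ${first:04X}")

# An address $10 bytes into the program is also $10 bytes past $0803.
later = to_present_address(0x2510, ORIGINAL_START, PRESENT_START)
print(f"2510L reads memory at ${later:04X}")
```

Seen this way, typing 803L would ask for an "original" address far below $2500, which is why it does not show the code you expected.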
You may now start disassembling or using any of the other commands provided. Your first command must be prefixed with a hexadecimal address. Thereafter this is optional, as is explained in the "Command Descriptions" section of this manual.

At this point, and until the final processing, you may hit RESET (CTRL-RESET on the Apple //e) to return to the start of the SOURCEROR program. If you hit RESET once more, prior to at least one disassembly command, you will exit SOURCEROR to BASIC. Using RESET assumes you are using the Autostart ROM. Please note that RESET will cause the loss of all disassembled code up to that point. This is a real asset if you have become confused and wish to start over. If this is not your intent, do not hit RESET!

9.3 Commands Used in Disassembly

The disassembly commands are very similar to those used by the disassembler in the Apple monitor. All commands accept a four digit hex address before the command letter. If this number is omitted, the disassembly continues from its present address. The only time a number must be provided is upon initial entry.

If you specify a number greater than the present address, a new ORG will be created.

More commonly, you will specify an address less than the present default one. In this case the disassembler checks to see if the address equals the address of one of the lines previously disassembled. If so, it simply backs up to that point. If the address is less than the present ORG, SOURCEROR backs up to that point and creates a new ORG. All source lines are erased. This has the effect of hitting RESET and then specifying the new starting address.

It is almost always best to avoid new ORG statements. When one does occur, back up a little more until you no longer get a new ORG on disassembly.

9.4 Command Descriptions

9.4.1 L (List)

This is the main disassembly command. It disassembles 20 lines of code (not bytes, lines). It may be repeated (e.g. 2000LLL will disassemble 60 lines of code starting at $2000).
By the same token, 2000L, followed by LL, will disassemble exactly the same amount of code.

If L detects a "JSR" to the SWEET 16 interpreter, disassembly is automatically switched to the SWEET 16 mode. Command L always continues the present mode of disassembly (SWEET 16 or normal).

If an illegal opcode is encountered, the bell will sound and the opcode will be printed as three question marks in flashing format. This is only to call your attention to the situation. In the source code itself, unrecognized opcodes are converted to HEX data, but this is not displayed on the screen.

This brings up an interesting point. You and I know that the Apple cannot execute "unknown" opcodes. What has happened is that the "opcode(s)" you see are really data that the program utilizes in some fashion or other. Your job is to figure out what is going on here. The solution tends to be one of the following:

1. The "opcodes" are preceded by a JSR to some address, and the bytes following said JSR are data for that routine. If that is the case, the end of the data is some constant that the called routine can recognize. HEX $00 is the usual choice. The bytes preceding the end code could be data or ASCII characters. Context will tell you which. In the case of all data, the "H" command is your choice. If ASCII is involved, then "T" followed by "H" for the end code would be the choice.

2. The "funny" opcode(s) are preceded by a JMP. In this case, the opcodes (BRKs or data) could be memory storage locations (with or without pre-initialized data).

Sometimes (always) you will tend to start hitting the L(LLL) key and RETURN in rapid fashion, especially when things are going "well." At some point you are going to realize that you are "out of step with the world." The solution is to do one of the following:

1. Hit the RESET key and start all over. This happens often.

2. Type xxxxL, where xxxx is a "remembered" address where things were "good." This is the preferred mode.
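
Returning to the first case above (a JSR followed by inline data that ends in a marker byte), the choice between T and H can be roughed out as follows. This Python sketch is an illustration only: the function name `suggest_commands` and the "printable once the high bit is ignored" test are assumptions made here, not SOURCEROR's rules, and context still has the final say. It leans on the optional hex count that both T and H accept, and it does not split runs that are too long for a single command.

```python
def suggest_commands(data, terminator=0x00):
    """Suggest commands for inline data that follows a JSR.

    `data` holds the bytes immediately after the 3-byte JSR instruction.
    Returns a list of command strings, or [] if no end marker is found.
    """
    if terminator not in data:
        return []
    body = data[:data.index(terminator)]
    # Assumption: a byte counts as text if it is printable ASCII once the
    # high bit is ignored (Apple II text is often stored with it set).
    is_text = len(body) > 0 and all(0x20 <= (b & 0x7F) <= 0x7E for b in body)
    commands = []
    if body:
        # T plus a hex count for text, H plus a hex count for raw data.
        commands.append(f"{'T' if is_text else 'H'}{len(body):X}")
    commands.append("H")  # one byte: the end-of-data marker itself
    return commands


# "HELLO" with the high bit set, followed by the $00 end marker.
print(suggest_commands(bytes([0xC8, 0xC5, 0xCC, 0xCC, 0xCF, 0x00])))
# Three bytes that are clearly not text, followed by the end marker.
print(suggest_commands(bytes([0x01, 0x02, 0x03, 0x00])))
```

The first call suggests T5 and then H; the second suggests H3 and then H. After either pair, L resumes at the instruction the called routine would return to.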
When you begin a disassembly, have a sheet of paper and a writing utensil handy so you can jot down pertinent thoughts as the disassembly progresses. One of the more "pertinent" things is the address where things are "good." Keep in mind that SOURCEROR takes the disassembly address YOU give it as gospel. If that address is wrong and yields "funny" code, it is your fault, not SOURCEROR's.

A final thought. Given that you have loaded the correct program into memory and that you disassemble the correct number of bytes, the resultant source will (almost always) generate the "correct" (identical) object code (see the exception in the "Dealing with the Finished Source" section that follows). That is not the point of SOURCEROR! With SOURCEROR you are trying to generate source code that will lead to eventual understanding and/or modification. Generation of source that is "garbage" and yet yields object code identical to the original is of very little value. There are a number of very fine copy programs on the market that could have achieved the same purpose at much less personal cost.

The above is provided to give you a taste of what disassembly is all about. Often, it is not nearly as difficult as presented. However, there are times when the above is an extreme oversimplification of the problems you will encounter.

9.4.2 S (Sweet)

This is similar to L, but forces disassembly to start in SWEET 16 mode. SWEET 16 mode returns to normal 6502 mode whenever the SWEET 16 RTN opcode is encountered.

9.4.3 N (Normal)

This is the same as L, but forces disassembly to start in (or return to) normal 6502 mode.

9.4.4 H (Hex)

This creates the HEX data opcode. It defaults to one byte of data. If you type a one or two digit hex number after H, that number of data bytes will be generated.

9.4.5 T or TT (Text)

T attempts to disassemble the data at the current address as an ASCII string. Depending on the form of the data, this will (automatically) disassemble under the pseudo opcode ASC, DCI, INV, or FLS.
The appropriate single or double quote (' or ") is automatically chosen. The disassembly will end when:

1. the data encountered is inappropriate;
2. 62 characters have been treated (remember, MERLIN has a 64 character limit: 62 + 2 delimiters = 64); or
3. the high bit of the data changes. In this case the ASC opcode is changed to DCI.

Sometimes the change to DCI is inappropriate. This change can be defeated by using "TT" rather than "T" in the command.

Occasionally, the disassembled string may not stop at the appropriate place because the following code looks like ASCII to SOURCEROR. In this event, you may limit the number of characters put into the string by placing a one or two digit hex number after the T (TT) command.

The hex number or TT may also have to be used to establish the correct boundary between a regular ASCII string and a flashing one. It is usually obvious (the second time) where this should be done.

Any lower case letters appearing in the text string are shown as flashing upper case letters.

9.4.6 W (Word)

W disassembles the next two bytes at the current location as a DA opcode. Optionally, if the command WW is used, these bytes are disassembled as a DDB opcode. Finally, if W- is used as the command, the two bytes are disassembled in the form DA LABEL-1. The latter is often the appropriate form where the program uses the address by pushing it on the stack. You may detect this while disassembling, or after the program has been disassembled. In the latter case, it may be to your advantage to do the disassembly again with some notes in hand. See the recommendations in the explanation of the L command.

9.5 Housekeeping Commands

9.5.1 / (Cancel)

This essentially cancels the last non-/ command. More exactly, it re-establishes the last default address (the address used for a command that is not given an explicit address). This is a useful convenience which saves you from typing an address when you want to back up the address pointer.
As an example, suppose you type a "T" to disassemble some text. You may not know what to expect following the text, so you can just type L to look at it. If the text turns out to be followed by some hex data (such as $8D for a carriage return), simply type / to cancel the L. You then type the appropriate H command followed by the L.

9.5.2 R (Read)

R allows you to look at memory in a format that makes embedded text stand out. R reads a block of $100 bytes; thus, to look at data from $1234 to $1333, type 1234R. After the initial R, typing R alone will bring up the next "page" of memory. ("Page" is quoted because pages generally begin at addresses whose two least significant digits are 00 and end at addresses whose two least significant digits are FF. That is not necessarily true for pages read by the R command.)

It is important to keep in mind that the addresses you use for the R command are totally independent of the disassembly addresses. This has two impacts on you:

1. The R command is NOT in step with the L or other disassembly commands. This means that you have to type the initial hex address or R will be reading far afield from where you want.

2. The above was the bad news. Now for the good news. R does not change the next default address for any other command. This means that you can R(ead) all over the place, before or after L commands, and not change the next address that L or any other disassembly command uses.

R is of particular use when unknown opcodes (???) are encountered by the L command. Using the address noted down prior to the last L command, you type the appropriate R command. This allows you to see where the ASCII characters actually begin, so you can then type xxxxT[yy] to disassemble the ASCII characters.

You may disassemble, then use (address)R, then L alone, and the disassembly will proceed just as if you never used R at all. If you don't intend to use the default address when you return to disassembly, it may be wise to make a note of where you wanted to resume, or use the / command before the R.
The latter is probably the wisest choice.

9.5.3 Q (Quit)

This ends disassembly and goes to the final processing, which is automatic. If you type an address before the Q, the address pointer is backed up to (but not including) that point before processing. If, at the end of the disassembly, the disassembled lines include:

    2341-   4C 03 E0   JMP   $E003
    2344-   B9 BE 94   LDA   $94BE,Y

and the last line is just garbage, type 2344Q. This will cancel the last line, but retain the first (and all preceding lines).

9.5.4 Final Processing

After the Q command, the program does some last minute processing of the disassembled code. If you hit RESET at this time you will return to BASIC and will lose the disassembled code. The processing may take anywhere from a second or two for a short program to two or three minutes for a long one. Be patient.

When the processing is done, you are asked if you want to save the source. If so, you will be asked for a file name. SOURCEROR will append the suffix ".S" to this name and save it to disk.

The drive used will be the one used to BRUN SOURCEROR. Replace the Merlin disk first if you want the source to go onto another disk. To look at the disassembled source, BRUN MERLIN, or type ASSEM, and load it in.

9.5.5 Dealing with the Finished Source

In most cases, after you have some experience, and assuming you used reasonable care, the source will have few, if any, defects.

You may notice that some DA's would have been more appropriate in the DA LABEL-1 or the DDB LABEL formats. In this and similar cases, it may be best to do the disassembly again with some notes in hand. The disassembly is so quick and painless that it is often much easier than trying to alter the source appropriately.

The source will have all the exterior or otherwise unrecognized symbols at the end in a table of equates. You should look at this table closely. It should not contain any zero page equates except ones resulting from DA's, JMP's or JSR's.
Any other zero page equate is almost a sure sign of an error in the disassembly (yours, not SOURCEROR's). It may have resulted from an attempt to disassemble a data area as regular code.

NOTE: If you try to assemble the source with zero page equates, you will get an error as soon as the equates appear. If, as you eventually should, you move the equates to the start of the program, you will cease getting errors, but the assembly MAY NOT BE CORRECT. It is important to deal with this situation first, as trouble could occur if, for example, the disassembler finds the data AD 8D 00. It will disassemble correctly as LDA $008D. The assembler, however, always assembles such code as a zero page instruction, giving the two bytes A5 8D. This has shortened the program by one byte! Many relative branches can now be in error.

Occasionally, you will find a program that uses the AD 8D 00 form for a zero page instruction. In that case, you will have to insert a character after the LDA opcode (LDAL) to have it assemble identically to its original form. Please note that this is rarely the case; most often the "instruction" is really data and must be dealt with either by replacing the incorrect opcode with the appropriate HEX, ASC, etc. opcode or by redoing the disassembly with notes in hand.

The above has brought out several important points:

- Go slowly at all stages and take time to understand the implications of each of your decisions.

- Be precise; double check everything. One good method is to assemble the code and look at the length when you type "Q" and exit to the Merlin command mode. If the length differs from the original, then there is most certainly an error.

- If you wish to see just where the original and your version of the code differ, there is an option in COPY ][ PLUS to "VERIFY IDENTICAL FILES." This routine will tell you the first byte where the two files differ.
- Even if the file lengths are the same, it is still a good idea to compare the two object files to assure yourself that there are not compensating errors that cause the length to come out correctly.

One final comment: the binary load address and length fields are part of the comparison done by COPY ][ PLUS. If the length is wrong, then the two files will differ in the second or third byte. In order to see where the real difference lies, you might consider using a sector editor to change the length on the SOURCEROR generated object to that of the original. Again, "Beneath Apple DOS," et al., are your best friends in this type of situation.

9.5.6 The Memory Full Message

When the source file reaches within $600 of the start of SOURCEROR (that is, when it goes beyond $8200), you will see "MEMORY FULL" and "HIT A KEY" in flashing format. When you hit a key, SOURCEROR will go directly to the final processing step. The reason for the $600 gap is that SOURCEROR needs a certain amount of space for this final processing. It is possible (but not likely) that part of SOURCEROR will be overwritten during final processing, but this should not cause problems since the front end of SOURCEROR will not be used again by that point.

There is a "secret" override at the memory full point. If the key you hit is CTRL-O (for override), then SOURCEROR will return for another command. You can use this to specify the desired ending point. You can also use it to go a little further than SOURCEROR wants you to, and disassemble a few more lines. Obviously, you should not carry this to extremes.

Caution: After exiting SOURCEROR, do not try to run it again with a CALL. Instead, run it again from disk. This is because the DOS buffers have been re-established upon exit, and will have partially destroyed SOURCEROR.

9.5.7 The LABELER Program

One of the nicest features of the SOURCEROR program is the automatic assignment of labels to all recognizable addresses in the binary file being disassembled.
Addresses are recognized by being found in a table which SOURCEROR references during the disassembly process. For example, all JSR $FC58 instructions within a binary file will be listed by SOURCEROR as JSR HOME. This table of address labels may be edited by using the program LABELER.

To use LABELER, BRUN LABELER. The program will mention that SRCRR.OBJ is being loaded into memory, and then present the main program menu.

9.6 Labeler Commands

9.6.1 Q (Quit)

When finished with any modifications you wish to make to the label table, press "Q" to exit the LABELER program. You will then be presented with a screen that asks you to hit "S" if you wish to save the modifications you have made to the file. If not, hit "ESC" to exit without saving the table. Be sure to read about the U command before attempting to save a labeler file.

9.6.2 L (List)

This command allows you to list the present contents of the labeler file. After pressing L, hit any key to start the listing. At that point a screen's worth of labels will be presented for your scrutiny. When you have finished with that page, hit any key to see the next. Typing CTRL-C will stop the listing.

9.6.3 D (Delete Label(s))

Use this option to delete any address labels you do not want in the list. After entering the D command, simply enter the NUMBER of the label you no longer want. If you want to delete a range of labels, enter the beginning and ending label numbers, separated by a comma.

Note that the numbers of the labels below a deleted label will change when a label (or range of labels) is deleted. Thus, you should delete from the back towards the front, or type L to list the file after each deletion so you can ascertain the new label numbers.

9.6.4 A (Add Label)

Use this command to add a new label to the list. Simply tell the program the hex address and the name you wish to associate with that address. When finished, press the RETURN key alone to exit the Add mode.
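
The label table and the renumbering caveat under the D command can be pictured with a small model. The Python below is a toy sketch, not LABELER's actual data format: the class `LabelTable` and its methods are invented for illustration, new labels are assumed to go at the end of the list, and the COUT and RDKEY entries are illustrative monitor addresses (only the HOME entry at $FC58 comes from the text above).

```python
class LabelTable:
    """Toy model of the address-label table edited by LABELER."""

    def __init__(self):
        self.entries = []  # (address, name) pairs, numbered from 1 when listed

    def add(self, address, name):
        # Assumption: a new label simply goes at the end of the list.
        self.entries.append((address, name))

    def delete(self, first, last=None):
        """Delete label number `first`, or the range first..last (1-based)."""
        last = first if last is None else last
        del self.entries[first - 1:last]

    def operand(self, address):
        """Text shown for an address operand: its label if one is known."""
        for addr, name in self.entries:
            if addr == address:
                return name
        return f"${address:04X}"


table = LabelTable()
table.add(0xFC58, "HOME")
table.add(0xFDED, "COUT")
table.add(0xFD0C, "RDKEY")

print("JSR", table.operand(0xFC58))  # a known address gets its label
print("JSR", table.operand(0x1234))  # an unknown address stays in hex

# To drop labels 1 and 3, delete from the back towards the front.
table.delete(3)
table.delete(1)
print(table.entries)
```

Deleting label 1 first would have moved RDKEY up to number 2, so a following "delete 3" would no longer point at it. That is the renumbering effect the D command description warns about.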
9.6.5 F (Free Space)

Typing F causes the program to tell you how much free space remains in the table for new label entries. The number returned is the number of bytes remaining, not the number of new labels that can be added.

9.6.6 U (Unlock SRCRR.OBJ)

Before saving a new label table, you will need to unlock the SRCRR.OBJ file. Use this command before using the Q(uit) command if you intend to save a new file.

                 Merlin Sourceror

        |--------------------------------|
  $FFFF |                                |
        |                                |
  $D000 |                                |
        |--------------------------------|
  $CFFF |                                |
        |           Apple I/O            |
  $C000 |         Soft Switches          |
        |--------------------------------|
  $BFFF |                                |
        |              DOS               |
  $9AA6 |                                |
        |--------------------------------|
  $9AA5 |                                |
        |           SRCRR.OBJ            |
  $8800 |                                |
        |--------------------------------|
  $87FF |                                |
        |                                |
  $8200 | SOURCEROR work area $600 bytes |
        |--------------------------------|
  $81FF |                                |
        |                                |
        |                                |
        |      Disassembled Source       |
  $2500 |                                |
        |--------------------------------|
  $24FF |                                |
        |     Area available to load     |
        |    programs for disassembly    |
  $0800 |                                |
        |--------------------------------|
  $07FF |                                |
        |  See memory maps in technical  |
  $0000 |information section (section 8).|
        |--------------------------------|
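
Before loading a program, it can help to check which regions of the map it would touch. The Python sketch below is only an illustration of that check: the names `REGIONS` and `regions_touched` are made up here, and the work area is taken to be the $600 bytes between $8200 and the start of SRCRR.OBJ at $8800, as described in the Memory Full section.

```python
SRCRR_START = 0x8800      # start of SRCRR.OBJ
WORK_AREA_SIZE = 0x600    # space reserved for final processing
MEMORY_FULL_AT = SRCRR_START - WORK_AREA_SIZE  # source may not grow past this

# (low, high, description), taken from the memory map, lowest region first.
REGIONS = [
    (0x0000, 0x07FF, "see technical information section"),
    (0x0800, 0x24FF, "area available to load programs"),
    (0x2500, 0x81FF, "disassembled source"),
    (MEMORY_FULL_AT, SRCRR_START - 1, "SOURCEROR work area"),
    (SRCRR_START, 0x9AA5, "SRCRR.OBJ"),
    (0x9AA6, 0xBFFF, "DOS"),
    (0xC000, 0xCFFF, "Apple I/O soft switches"),
    (0xD000, 0xFFFF, "(unlabeled in the map)"),
]


def regions_touched(start, length):
    """Names of every map region that a block of `length` bytes loaded
    at `start` would overlap."""
    end = start + length - 1
    return [name for low, high, name in REGIONS if start <= high and end >= low]


print(f"MEMORY FULL appears at ${MEMORY_FULL_AT:04X}")
print(regions_touched(0x0803, 0x1000))  # fits in the load area
print(regions_touched(0x2000, 0x1000))  # spills into the default source area
```

A $1000 byte program at $2000 reaches $2FFF and so runs into the default source address of $2500. In that situation you would either give SOURCEROR a different source address or load the program elsewhere and supply its original location when asked.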