S-C DisAssembler

The S-C DisAssembler (SCDA) is a ProDOS-based tool for generating assembly language source code from a binary or system file. SCDA operates in conjunction with the S-C Macro Assembler to convert 6502 and 65C02 machine code into source code files in the "S-C" format. Here are some of the features:

* Input is from one or more binary object files, including file types BIN and SYS.
* Output is to one or more "S-C" type (compressed source code) files.
* Generates comment lines before each label, listing all references to that label.
* Disassembly is "script" driven, allowing incremental enhancement as knowledge is gained about the program being disassembled.
* Input files may be positioned to specific starting addresses.
* Decodes ProDOS "MLI" calls as such.
* Allows pre-named symbols up to 32 characters long.
* Comes with complete commented source code, allowing you to understand how it works and make your own personal extensions.

Of all the features, the most important may be the "script". This is essentially a "program", written in "disassembly language". The script allows you to define which input files to include and which output files to generate, to name symbols such as monitor entry points and major subroutines in the program being disassembled, to define table areas, and even to insert comments. The script itself is written using the standard S-C Macro Assembler, and may be saved on a source file just as an assembly language program would. As you gain knowledge about the program you are disassembling, you can add lines to the script.

Disk Contents:

The disk is in ProDOS format, with a volume name of /S.C.DISASM/. The disk is not protected in any manner, and is fully copyable. It is a good idea to make a backup copy right away, and put the original in a safe place.

The main file of interest on the /S.C.DISASM/ disk is S.C.DISASM. This file is the S-C DisAssembler, ready to run.
I suggest you use FILER, System Utilities, or some other file copying program to make a copy of this file on your working disks. When I am working on a disassembly project, I usually make a special diskette which contains the object code I am tearing apart, the tools I am using to do the tearing, and the resulting source code files.

The SCDA disk also contains all of the source code for the S-C DisAssembler in the format of the S-C Macro Assembler. The text file ASM gives you a short way to run an assembly of all of these source files; it contains two simple lines: "LOAD D.ACF" and "ASM". The file named D.ACF is a control file, containing mostly ".INB filename" lines which pull in each of the other source code files during assembly.

I have included several sample scripts with associated binary files, which you can look at, study, play with, and so on.

SCRIPT.F800 is a fairly complete script for the Apple monitor from $F800 through $F881. It reads BIN.F800 and produces SRC.F800.

SCRIPT.FILER disassembles a particular section of the Apple FILER program. The portion is located at the end of the FILER file, starting at position $5E00 in that file. This section of code is relocated to $0800 when FILER is executed, so the disassembly uses an origin of $0800. For licensing reasons, I did not include the Apple FILER program on this disk; however, it is included on the S-C Macro Assembler disk.

SCRIPT.SIDER disassembles the file named B.SIDER. This is a copy of the firmware used with the SIDER hard disk (version C). The script produces four output files, which can be assembled by using an assembly control file which names all of those files on ".INB" lines. The file named SIDER.ACF is just such a control file. (Before trying to assemble, you will have to delete the ".OR $C800" line from the generated file SIDER.MAIN, because the correct .OR and .TA lines are part of SIDER.ACF.)
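The relationship between file position and run-time address in the SCRIPT.FILER case described above is simple arithmetic, and it can help to see it spelled out. The following Python sketch is only an illustration, not part of SCDA; the function names are hypothetical, and the default values are the $5E00 position and $0800 origin quoted above.

```python
# Illustrative sketch only: maps between a byte's position in the input
# file and the address it occupies at run time, for code that is stored
# at one file position but executes at a different origin.
# Function names are hypothetical; the defaults come from SCRIPT.FILER.

def file_offset_to_address(offset: int, position: int = 0x5E00,
                           origin: int = 0x0800) -> int:
    """Return the run-time address of the byte at the given file offset."""
    if offset < position:
        raise ValueError("offset lies before the positioned section")
    return origin + (offset - position)


def address_to_file_offset(address: int, position: int = 0x5E00,
                           origin: int = 0x0800) -> int:
    """Return the file offset of the byte that runs at the given address."""
    if address < origin:
        raise ValueError("address lies before the origin")
    return position + (address - origin)


if __name__ == "__main__":
    # The first byte of the section runs at the origin.
    print(f"${file_offset_to_address(0x5E00):04X}")
    # A byte that runs at $0843 sits this far into the file.
    print(f"${address_to_file_offset(0x0843):04X}")
```

In script terms, the position is what you would give to the P-command and the origin is the starting address of the first C- or H-command.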
I have also included a script file which includes the most commonly used equates from the Apple monitor, SCRIPT.MONDEFS. You can use it as the seed for working with large programs which make heavy use of the monitor entry points.

Operation:

Once you have developed a "script" for a program you wish to disassemble, a process which is explained below, the rest is easy. With the S-C Macro Assembler in operation, the script in memory, and the files to be disassembled online, you simply BRUN the S-C DisAssembler.

If the disk containing SCDA is online, you simply type "BRUN S.C.DISASM" or "-S.C.DISASM". If you do not have enough drives for the SCDA file and your own source and object files to be online at the same time, then first mount the SCDA disk and type "BLOAD S.C.DISASM"; then mount your own disk(s) and type "$800G".

When I am working on a disassembly project, I find it useful to set up the S-C Macro Assembler "." command to start SCDA. The "." command is vectored through a JMP instruction at $800F, so you can make it start the DisAssembler at $800 by storing the address $0800 (low byte first) at $8010:

     $8010:00 08

Then any time you wish to start executing a disassembly script, simply type a "." at the colon prompt, and hit RETURN. Just be careful not to use the command if you have not got the DisAssembler in memory!

The S-C DisAssembler operates in two passes: your script is executed twice. During the first pass a symbol table is built in RAM, together with a cross reference list. During the second pass the generated source code is written onto one or more output files.

The source code is written in S-C Macro Assembler format: each line has a line number, followed by label, opcode, and operand fields. SCDA starts with line number 0001, and continues with an interval of 1. The output file will be type "S-C" in the catalog when you are in the S-C Macro Assembler. [When you are in BASIC.SYSTEM, the file type will display as "INT".
In the Beagle Brothers Applesoft Compiler, the file type will display as "COM".]

During pass one SCDA displays the message "PASS 1" and three running addresses. The first address is the current location counter; the second, the current address of the top of the cross reference symbol table; and the third, the current address of the bottom of the pre-defined symbol table. These are displayed for your curiosity, and to give you an idea of how much memory is still available for expanding your script. The memory between the two symbol tables is the free memory.

During pass two SCDA displays the message "PASS 2" and one running address, the current location counter.

You will notice slight pauses while SCDA is running when it reads from or writes to the disk.

Scripts:

A disassembly script is made up of command lines, with one disassembly command or comment on each line. You create and modify a script using the S-C Macro Assembler, with either the standard line editor or the Laumer Research Full Screen Editor, in the same way you create and modify assembly language programs. Scripts can be saved on files using the SAVE command, and loaded for use with the LOAD command. The special "auto-SAVE" comment line which we recommend at the beginning of assembly language source files will also work in disassembly scripts.

There are currently eleven different commands which can be included in scripts. Future versions will include more commands, in order to allow more precise control over the disassembly of data regions, as well as other desirable features.

Most of the commands consist of a single letter, followed by a colon, followed by whatever parameters the commands require. Other characters may be included between the command character and the colon; for example, you may spell out the command or even make a comment. Here are the commands:

I:pathname

The I-command tells the DisAssembler to begin reading object code from the specified pathname.
If you do not specify a complete pathname, the current ProDOS pathname-prefix will be used. If there is no current pathname-prefix, the last-accessed slot and drive will be used.

You may use more than one object-code input file in a script. Each I-command causes the closing of any previous input file, and the opening of the new one. The new file is opened positioned at the first byte; if you wish a different starting position, use the P-command after the I-command.

Examples:

     1000 I:/HARD1/FILER
     1000 INPUT:SCASM.SYSTEM

O:pathname

The O-command tells the DisAssembler to close any previous output source-code file, and open a new one. By using more than one output file, you can partition the source code into a series of files which you will assemble using the ".IN" or ".INB" directives.

Example:

     1010 O:SOURCE.FILER

P:hex

The P-command positions the input object-code file to the specified byte in the file. The hexadecimal value you specify is used directly in an MLI Set Mark call.

Examples:

     1020 P:5E00
     1020 POSITION to $800 part:5E00

L:ZIX hex-hex

The L-command defines which prefix letters to use for zero-page symbols (Z), for symbols within the hex-hex range (I), and for non-zeropage symbols outside the hex-hex range (X). I generally use Z, I, and X; however, you may use any letters you wish. For example, assume the code I am disassembling is located during execution between $800 and $3FFF, and I want those symbols to start with P; further assume I want zeropage labels to begin with Z, and external references to begin with R; then the command would be:

     1030 L:ZPR 800-3FFF

The L-command is required if you wish to have any internal labels of the form "I.xxxx". Without the L-command to specify the range of internal labels, they will all be classified as external labels.

XREF:OFF
XREF:ON

The XREF-command lets you turn the cross reference line generation off, or back on. If you do not include an XREF command in your script, cross reference lines will be generated.
If the XREF:OFF command is placed before any C- or H-commands, no cross reference lines will be generated. This saves a lot of memory, and will allow you to disassemble larger programs with a single script. Extremely large programs can be disassembled without the cross reference lines, and then you can use the S-C XREF program (a separate product available from S-C Software) to generate a complete cross reference listing.

If you include several XREF commands, switching the option off and on and off and on, the cross reference lines will not be complete; they will be omitted when the option is off, and may only list part of the references when the option is on.

W:hex

The W-command lets you specify the width of the label field. The default width is six, which allows labels up to six characters long to be written on the same line with the corresponding opcode or data. Longer labels will be written on a line by themselves, with the corresponding opcode or data following on the next line. You may specify a value (in hex) from 0 to 3E. A value of 0 will force all labels to be written on separate lines, while any value over 1F will allow all labels to be written on the same line with their opcode or data.

C:hex1-hex2
C:-hex

The C-command tells the S-C DisAssembler to create source lines for 65C02 or 6502 code from current location to -hex, or from hex1 to hex2. A future enhancement to the DisAssembler will be the ability to disassemble 65816 opcodes. The bytes to be disassembled come from the input object-code file. If the disassembly address (hex1) changes from the current address, an origin directive (.OR $hex1) will be generated.

Examples:

     1050 C:800-843
     1070 C:-89A

H:hex1-hex2
H:-hex

The H-command tells the S-C DisAssembler to create source lines for a data region from current location to -hex, or from hex1 to hex2. It generates up to eight bytes per ".HS" line. Future versions of the DisAssembler will provide more extensive data-disassembly capability.
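To make the eight-bytes-per-line rule concrete, here is a minimal Python sketch of how a data region might be split into ".HS" lines. It is an illustration only, not SCDA's own code: the function name hs_lines is hypothetical, and the operand layout (hex digit pairs run together with no separators, no label or line number) is an assumption rather than the exact text SCDA writes.

```python
# Illustrative sketch only: split a data region into ".HS" lines of at
# most eight bytes each. The name hs_lines and the exact operand layout
# are assumptions, not taken from the SCDA source code.

def hs_lines(data: bytes, per_line: int = 8) -> list[str]:
    """Return one '.HS' line for each group of up to per_line bytes."""
    lines = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # bytes.hex() gives lowercase digit pairs; assembler source
        # conventionally uses uppercase.
        lines.append(".HS " + chunk.hex().upper())
    return lines


if __name__ == "__main__":
    # A 12-byte region, the same size as the range $844-$84F.
    region = bytes(range(0x10, 0x1C))
    for line in hs_lines(region):
        print(line)
```

A 12-byte region therefore produces two lines, one with eight bytes and one with the remaining four.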
The bytes to be disassembled come from the input object-code file. If the disassembly address (hex1) changes from the current address, an origin directive (.OR $hex1) will be generated.

Examples:

     1040 H:844-84F
     1060 Hex:-9FF

Normally C- and H-commands will alternate back and forth in a disassembly script.

"comment

Lines beginning with a quotation mark generate a comment line in the output source-code file. The generated line will use an asterisk in place of the quotation mark, and the comment will be copied. For example, the command:

     "This is a comment.

would generate, with the appropriate line number:

     0347 *This is a comment.

=hex,name

Lines beginning with an equal sign generate symbolic labels to be used for particular hexadecimal values. All of the =-commands should come before any C- or H-commands in the script for best results.

Names for values which are outside the range specified on the L-command line will cause zeropage and external label equates to be generated, of the form "symbolname .EQ $value". The definition lines will be generated in the appropriate position in the output source-code file. Zeropage labels will be generated first, in numeric order. Next come any external labels which precede the program being disassembled, in numeric order. External labels for values higher than the L-command range will cause ".EQ" lines to be generated after the C- and H-commands have all been processed.

Names for values which are inside the L-command range will cause internal labels to be generated. Each internal label will be generated when the location counter reaches the value of that label, during the processing of the C- and H-commands. If a label defines the beginning of an opcode, as is usually the case, that label will be generated in the label field (if it will fit), or simply be generated on a line by itself without any ".EQ value" following.
If the label defines a value inside a multi-byte instruction line, it will be defined after the instruction line with a ".EQ *-1" or ".EQ *-2" definition.

Values which are referenced but not given special names by the =-command will receive labels using your selected prefix letters with the hexadecimal value.

Only one name can be given to a particular value. If you try to define more than one name for the same value, SCDA will quit with the "EXTRA DEFINITION FOR SAME VALUE" error message. On the other hand, it is all right to use the same label over again for different values. When you re-assemble you will get multiple-definition errors, unless the labels you re-used were legitimate local labels (a period followed by one or two digits).

Examples:

     1050 =36,CSWL
     1060 =37,CSWH
     1070 =FDED,MON.COUT
     1080 =28,BASE.ADDRESS
     1090 =29,BASE.ADDRESS+1

*comment

Lines beginning with an asterisk are comments within the script itself, which help to document the script. These do not affect the disassembly process in any way.

One special kind of comment line allows the use of the S-C Macro Auto-SAVE feature. Type the "*", six pairs of ctrl-O,backspace characters, and then "SAVE" and your filename. If this comment line is the first line of your script, typing Escape-S will cause the SAVE command to be displayed on the screen; then typing a RETURN will cause the SAVE command to be executed. This technique is very useful in being sure you always SAVE a script where it belongs.

The simplest possible script would include only four lines: one each to specify an input object-code file and an output source-code file, one to specify the prefix characters and the range for internal labels, and one to specify the address range and type of disassembly. For example,

     1000 I:OBJECT
     1010 O:SOURCE
     1020 L:ZIX 800-1FFF
     1030 C:800-1FFF

Examples of more complicated scripts are included on the SCDA disk.

Automatic *---- lines

The S-C DisAssembler automatically inserts comment lines to separate subroutines.
The lines are inserted after an RTS opcode, and after JMP opcodes. The separation lines are the same as generated within the S-C Macro Assembler by typing Escape-L after the line number: an asterisk followed by a line of dashes.

Cross Reference Lines

The S-C DisAssembler is unusual in its ability to generate embedded cross reference information as comment lines in the output source-code file. These lines consist of an asterisk (to signify a comment line), the address of the label which follows in parentheses, and a list of addresses from which this label is referenced. If the address list is too long for an 80-column source line, additional lines will be generated.

The cross reference lines prove to be extremely helpful in analyzing disassembled programs. Nevertheless, you may not wish to see the cross reference lines. You can turn off this feature using the "XREF:OFF" command in a script. One reason for turning it off is to save memory during disassembly of an extremely large binary file.

Delta lines

The S-C DisAssembler makes every effort to generate every label necessary to assure error-free re-assembly. Sometimes this includes generating lines of the form:

     1234 I.087B  .EQ *-1

These are called "delta equates", and you will normally only see these when you are disassembling self-modifying code, or code that uses values within operand fields of instructions as data. Sometimes it occurs because of offset references to data tables, such as the following:

     LDX POINTER       TABLE INDEX + $80
     LDA TABLE-$80,X

When disassembling the second line, SCDA would generate a label reference to TABLE-$80, which might be inside a code area, and might cause a delta reference. Whenever you find a delta reference, or any label that is inside a code area but referred to as data by the program, you might suspect offset table references.

Memory Usage:

The S-C DisAssembler loads at $800, and uses all the memory between there and the bottom of your script.
Your script is maintained by the S-C Macro Assembler starting at HIMEM-1, which is usually $73FF, and going down toward $800. (If you are using the Laumer Research Full Screen Editor, it eats up more memory and lowers HIMEM.)

     800-17xx           S-C DisAssembler
     from there up...   the cross reference symbol table
     between            free memory
     from xxxx down     the pre-defined symbol table
     xxxx to 73FF       your script
     7400-BFFF          SCASM.SYSTEM and file I/O buffers

     00-1D              various pointers
     73,74              point to end of script+1 (HIMEM)
     CA,CB              point to beginning of script

Note that running SCDA will overwrite any symbol table left over from an assembly, and assembling any program will overwrite SCDA.

Error messages:

The S-C DisAssembler aborts disassembly when an error is detected. You may see any of the following error messages. SCDA also displays the line in the script which caused the error.

NOT A VALID COMMAND...............The first character of a script line is not one of the valid command or comment characters.

SCRIPT LINE TOO LONG..............The maximum length of a script line is 80 characters, not including the line number.

MISSING COLON.....................The colon after the command character is missing. You are allowed to include any other characters you wish between the command character and the colon; for example, you might wish to spell out the command names.

MISSING COMMA.....................The comma is missing in the "=" command.

MISSING HEX VALUE.................A required hexadecimal value is missing.

HEX RANGE BACKWARDS...............The address range in a C- or H-command is backwards. The lower address must go first. If you have specified a continuation, as in "C:-1234", and the hex value is behind the current location, you will get this error message.

EXTRA DEFINITION FOR SAME VALUE...You have tried to give more than one name to the same value with an "=" command.

MEMORY FULL.......................The predefined symbol table and the cross reference symbol table have met in memory.
This means you are going to have to do something to reduce the memory requirements. One option is to break the disassembly into separate parts, so that the symbols will all fit in memory. Another option is to eliminate the cross reference lines by using the XREF:OFF command.

BAD PATHNAME......................The pathname on an I- or O-command is either missing, or does not specify a complete path. If the volume name is not specified and there is no prefix, SCDA attempts to complete the pathname by using the volume name it finds in the most recently accessed drive. If there is no ProDOS volume in that drive, you will get this error message.

OUTPUT FILE WRONG FILE TYPE.......The output file must be of type "S-C" ($FA). If there is an old file by the same pathname but a different type, you will get this error.

POSITION BEYOND END OF FILE.......You are attempting to position (P-command) or read (C- or H-commands) past the end of the input object-code file.

READ PROBLEM ($XX)................ProDOS MLI error $XX when READing the input object-code file.

WRITE PROBLEM ($XX)...............ProDOS MLI error $XX when WRITing the output source-code file.

CREATE PROBLEM ($XX)..............ProDOS MLI error $XX when CREATing the output source-code file.

OPEN PROBLEM ($XX)................ProDOS MLI error $XX when attempting to OPEN either the input or output file.

-------------------------
THE END