Recomment Documentation

recomment is a small 6502 disassembler utility originally written by Jouko Valta in 1994. In 1998 A. Fachat added some patches to tidy up the output, improve table detection, and more. The utility is written in Perl, so you need a Perl interpreter to run it. The reassembler is distributed under the GNU General Public License.

Options:

  recomment [-sym sym_outfile] [-hdr headerfile] [-html] [-noaddr]
         [-long] [-p1|-p2|-p3|-p4] [-addr start_address]
         [-mail mailaddress] [-verbose] [-quiet] [-labels] [-hints]
         programfile [outfile]

recomment takes the original 6502 binary file and produces a human-readable assembler source file. The output can be customized and can even be written as HTML.

Code detection and interpretation

Code detection and interpretation are done in several passes. recomment reads the given header file and writes a new header file (the symbol file), to which it adds the information gathered during the run. This symbol file can then be passed to recomment as the header file in another run.

If an opcode with a data address is encountered, this address is saved as a data label. If a JMP, JSR or branch opcode is encountered, the jump address is saved as an execution label. These labels are used to switch between interpreting bytes as data and as code. This usually works poorly in the first run, but quite well in the second. Even then, stray calls recorded in the earlier run may still produce warnings. Therefore, in the second run only labels that are actually used are saved in the symbol file, which reduces those warnings in the third run.
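
The label collection described above can be sketched roughly as follows. This is a minimal Python illustration, not recomment's own Perl code: the names OPCODES and collect_labels are hypothetical, the opcode table covers only a handful of 6502 instructions, and unknown bytes are simply skipped.

```python
# Tiny subset of the 6502 opcode table: opcode -> (mnemonic, length, kind).
# kind: "exec" = jump/call target, "branch" = relative branch,
#       "data" = absolute data operand, None = no address operand.
OPCODES = {
    0x20: ("JSR", 3, "exec"),
    0x4C: ("JMP", 3, "exec"),
    0xD0: ("BNE", 2, "branch"),
    0xF0: ("BEQ", 2, "branch"),
    0xAD: ("LDA", 3, "data"),
    0x8D: ("STA", 3, "data"),
    0xA9: ("LDA", 2, None),   # immediate
    0x60: ("RTS", 1, None),
}

def collect_labels(code, start):
    """Walk the bytes linearly and return (data_labels, exec_labels)."""
    data_labels, exec_labels = set(), set()
    pc = 0
    while pc < len(code):
        entry = OPCODES.get(code[pc])
        if entry is None:
            pc += 1              # unknown byte: skipped in this sketch
            continue
        _, length, kind = entry
        if pc + length > len(code):
            break                # truncated instruction at the end
        if kind in ("exec", "data"):
            target = code[pc + 1] | (code[pc + 2] << 8)   # little-endian
            (exec_labels if kind == "exec" else data_labels).add(target)
        elif kind == "branch":
            offset = code[pc + 1]
            if offset >= 0x80:
                offset -= 0x100                           # signed 8-bit
            # Branch offsets are relative to the following instruction.
            exec_labels.add((start + pc + 2 + offset) & 0xFFFF)
        pc += length
    return data_labels, exec_labels
```

Feeding the two sets back in on a later pass is what allows the mode to be switched at the right addresses; the sketch only covers the collection step.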

Output customization

Output can be customized in several ways. The main goal of recomment is to add comments back to a binary file and save the result in a more readable form. Thus it can produce text or HTML output. The text output can be customized for use with a 6502 assembler.

You would normally use -html -long to produce HTML output, and -noaddr -labels to produce an assembler input file.

Header File Format

The header file consists of label definitions and comments (which include reassembler mode switches).

Normally the reassembler determines the mode (code or data) itself. But sometimes it needs some help, which is taken from the comment. The reassembler can be in several different modes, each selected by a keyword.

In addition to the modes, some hints can be given. A hint does not fix the mode until the next mode switch; it just switches, in automatic mode, to code or data.

Example

No example is included here; please see the recomment homepage.


Theory

[ Here follows a text by the original author that describes some of the theory behind the reassembler. I have removed the outdated parts, though. Comments in [] brackets are by A. Fachat. ]

 ReComment V 4.03                                               3 Dec 1995

        Recomment -- an iterative database driven reassembler


  1. Theory of A Learning Database Driven Reassembler

When studying available machine language programs or in case a program
originally written directly in machine code needs to be rewritten with
assembler, a reassembler can be of great help. Of course, reassemblers have
been available for ages, but they get easily perturbed upon encountering a
piece of code by any advanced programmer, let alone other defects.

In '93 Marko Mäkelä developed an ultimate, fully recursive reassembler, namely
the "d65". I, in turn, dug up my own reassembler, which I had written to study
and print programs contaminated with undocumented opcodes, and used it as the
basis when I needed a disassembler for this program.
Actually, ReComment still is a comment generator rather than a reassembler.

One special feature included in both ReComment and its predecessor is
recognising references to routines like PRIMM or "Print Text Immediate", which
is peculiar to the C128 and some other of the latest Commodore models.
Another one, which I had never seen before, is separating routines by
underlining all JMPs and RTSs (on both the C128 80-column display and the
printer), whereas ReComment prefers printing blank lines after them.

It was also intended from the very beginning to search for conditional
branches that always branch, and then handle them like JMPs, but this had to
be dropped because the amount of work involved was too much for the C128 to
handle ...

To make it even worse, 6502 machine language allows a wide variety of ways to
misuse the opcodes. Complete instructions can be hidden in the operand of
another -- ineffective -- instruction, or modified while the program is running.
[ Parts of those tricks have been addressed in the current version. ]
There are also so-called undocumented opcodes (see file '64doc' for complete
details), most of which are completely valid instructions, however.

Data blocks can be easily detected by Absolute and Absolute Indexed references
to them, and -- whenever the undocumented opcodes are forbidden -- by the
first unknown opcode encountered. In addition, BRK and JAM are quite unusable,
and thus they are always suspicious.
Any call to an address holding one of those (unusable or forbidden)
instructions can safely be judged invalid, and data mode activated there.

As a matter of fact, handling the "Print Text Immediate" mentioned earlier is
the easiest task, as it always starts with a certain JSR call, while the data
is terminated with a 00 byte.
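
The inline-text case can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name read_inline_text is
hypothetical, the address of the print routine is passed in by the caller
rather than taken from a memory map, and the input is expected to be a bytes
or bytearray object.

```python
def read_inline_text(code, pc, primm_addr):
    """If a JSR to primm_addr starts at offset pc, return (text, next_pc).

    next_pc is the offset of the first byte after the terminating 00,
    i.e. where code resumes. Returns None if this is not such a call
    or if no terminator is found.
    """
    if pc + 3 > len(code) or code[pc] != 0x20:    # 0x20 = JSR absolute
        return None
    target = code[pc + 1] | (code[pc + 2] << 8)   # little-endian operand
    if target != primm_addr:
        return None
    end = code.find(0, pc + 3)                    # terminating 00 byte
    if end < 0:
        return None
    return bytes(code[pc + 3:end]), end + 1
```

A disassembler loop would print the returned bytes as text and then continue
decoding instructions at next_pc.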

If ingenious use of conditional branches sometimes confuses anyone studying the
code, it can be said that a reassembler gets perturbed by such code for good.
The most pessimistic reassembler might give up and declare everything as data
if e.g. BNE is immediately followed by random data.
The easiest cases can be detected by keeping track of any Immediate LDA, LDX,
LDY, AND and ORA instructions. If any of these is not followed by any label or
other command affecting the flags in any way, the branch may actually be
unconditional.
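
A minimal Python sketch of the simplest such case follows. It is an
illustration only, with hypothetical names: it looks at an immediate load that
is directly followed by a branch on the Z or N flag, does not check for a
label between the two, and leaves out AND and ORA because their result also
depends on the accumulator.

```python
LOAD_IMMEDIATE = {0xA9, 0xA2, 0xA0}     # LDA #, LDX #, LDY #
# branch opcode -> (flag tested, flag value that takes the branch)
FLAG_BRANCHES = {
    0xF0: ("Z", True),     # BEQ
    0xD0: ("Z", False),    # BNE
    0x30: ("N", True),     # BMI
    0x10: ("N", False),    # BPL
}

def branch_always_taken(code, pc):
    """Return True/False if the outcome of the branch at pc+2 is known.

    Returns None when offset pc does not hold an immediate load that is
    directly followed by a branch on Z or N.
    """
    if pc + 4 > len(code) or code[pc] not in LOAD_IMMEDIATE:
        return None
    test = FLAG_BRANCHES.get(code[pc + 2])
    if test is None:
        return None
    value = code[pc + 1]
    # An immediate load sets Z and N from the loaded value alone.
    flags = {"Z": value == 0, "N": value >= 0x80}
    flag, wanted = test
    return flags[flag] == wanted
```

A result of True means the branch could be treated like a JMP; False means it
is never taken.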

The way an ordinary two-pass disassembler (TPR) works is that it just collects
any jump, branch and read/write references. The main disadvantage in this method
is, however, that data segments can be mistaken as executable program code,
whereas any entry point only called indirectly will not be found. Thus, errors
in the interpretation are inevitable, causing in the worst case more incorrect
references to be produced.

Mäkelä's idea to solve this problem was the following:
Each branch is tested by checking the code it refers to. If any error is
encountered there, the whole segment being tested is marked invalid as well,
and any references found on it are rejected.
Naturally, this method provides independence from any external database.
However, excessive testing is required in order to obtain 100% confidence.


 Implementation

The main goal of this program is adding comments to the system disassembly
listings, mainly by using the variety of memory maps available.

Thus, it was an intentional choice to take the opposite approach to d65, and
only make an ordinary reassembler. Instead of running the code on a CPU
emulator, ReComment just wanders through the code in order, collecting any
references to subroutines and data segments.
[ However, saving the references found and using them in a second pass
gives quite some impressive results, esp. when more than two passes with
the appropriate options are used. ]

Fortunately, the power requirement problems of earlier versions were solved by
the power of the average Unix machine and the invincible flexibility of the
PERL programming language.

The main difference to any other reassembler is the way of using the database;
it's the pivot of ReComment. This makes it possible to produce commented
disassembly very quickly (assuming you have the data available), but on the
other hand, ReComment won't work very well without the exact memory maps.
[ This has been improved, though. ]
This also allows disassembling only one version of the program per one header
file.  ):

Alas, the "misassembler", like any other reassembler, has one typical problem
which is not present in the recursive reassembler: the Indirect Addressing
modes. When the origin of an array has an offset greater than zero, the
reference may be created within a valid subroutine, whereas the real data
block won't be marked at all.

There is still one more factor that has not been utilised yet. References of a
certain type can be forbidden on some areas of memory. E.g. jumping to the
screen memory or I/O area doesn't belong to the characteristics of an average
program. However, this has very little significance in practice.

[ Section 2., "Usage" has been superseded by the above text ]
[ Section 3., "File Formats" has been superseded by the above text ]

  4. Reasoning System


        Separating Different Routines

        If an unconditional jump instruction (JMP, BRA, RTS, RTI) is
        encountered, and there isn't any conditional branch over it,
        it is most probably the end of the current routine.
        In this case, a blank line is printed.
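
The rule above can be sketched in Python as follows. This is an illustration
under assumptions, not the actual implementation: the instruction list is
taken to be already decoded into (address, mnemonic, target) tuples, and the
name routine_ends is hypothetical.

```python
UNCONDITIONAL = {"JMP", "BRA", "RTS", "RTI"}
CONDITIONAL = {"BCC", "BCS", "BEQ", "BNE", "BMI", "BPL", "BVC", "BVS"}

def routine_ends(instructions):
    """instructions: (address, mnemonic, target_or_None) in address order.

    Returns the addresses of unconditional jumps that no earlier
    conditional branch of the same routine jumps over.
    """
    ends = []
    furthest = -1        # furthest forward target of a conditional branch
    for address, mnemonic, target in instructions:
        if mnemonic in CONDITIONAL and target is not None and target > address:
            furthest = max(furthest, target)
        elif mnemonic in UNCONDITIONAL and furthest <= address:
            ends.append(address)      # a blank line would be printed here
            furthest = -1             # start tracking the next routine
    return ends
```

Keeping only the furthest forward branch target is enough here, since a jump
is skipped over exactly when some earlier branch lands beyond it.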


        Forbidden Instructions

                BRK
                JAM
                Branch *-1
                Branch *-2
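
A check for the instructions listed above might look like the Python sketch
below. Two assumptions are made and should be verified against the program
itself: "Branch *-1" and "Branch *-2" are read as relative branches whose
operand byte encodes an offset of -1 or -2 ($FF or $FE), and JAM is taken to
mean the twelve halting opcodes of the NMOS 6502. The name is_forbidden is
hypothetical.

```python
BRK = 0x00
# JAM (also known as KIL or HLT) opcodes of the NMOS 6502.
JAM = {0x02, 0x12, 0x22, 0x32, 0x42, 0x52,
       0x62, 0x72, 0x92, 0xB2, 0xD2, 0xF2}
# Relative branches: BPL BMI BVC BVS BCC BCS BNE BEQ.
BRANCHES = {0x10, 0x30, 0x50, 0x70, 0x90, 0xB0, 0xD0, 0xF0}

def is_forbidden(code, pc):
    """True if the instruction at offset pc is on the forbidden list."""
    opcode = code[pc]
    if opcode == BRK or opcode in JAM:
        return True
    if opcode in BRANCHES and pc + 1 < len(code):
        # Assumption: offsets -1 and -2 are the operand bytes $FF and $FE.
        return code[pc + 1] in (0xFF, 0xFE)
    return False
```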



        DATA segments

        There are 5 types of data recognised by recomment. Each can have
        its own formatting rule to increase readability.


        EMPTY segments

        EMPTY marks unused or non-existent memory area. Usually these areas
        are filled with FF, AA or sometimes 00. Non-existent memory locations
        return the high byte of their address on most 65xx processors.

        It is also possible to have patch code in some of these areas in
        later revisions. Recomment wants the header file changed
        according to whether the patch code exists or not.
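
The fill patterns mentioned above suggest a simple heuristic for spotting
candidate EMPTY areas before editing the header file. The Python sketch below
is only such a heuristic with hypothetical names; it is not a description of
how recomment itself handles EMPTY segments.

```python
FILL_BYTES = {0xFF, 0xAA, 0x00}

def looks_empty(block, start):
    """Heuristic: True if block looks like unused or non-existent memory.

    block is the raw data, start the address of its first byte.
    """
    if not block:
        return False
    # Case 1: the whole block holds one of the usual fill values.
    if len(set(block)) == 1 and block[0] in FILL_BYTES:
        return True
    # Case 2: every byte equals the high byte of its own address.
    return all(b == ((start + i) >> 8) & 0xFF for i, b in enumerate(block))
```

A block that passes this test may still contain patch code in another
revision of the same program, so the result is a hint rather than a verdict.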


  5. Error Messages

        [ Revised for the new version ]

        Message                         Type            Reason
        -----------------------------------------------------------------------

        You don't exist. Go away!       Fatal           $USER undefined
        No host. Where are you?         Fatal           $HOST undefined

        Cannot locate comments/headers.                 Unused
        Can't open program file '...'   Syntax          Binary file missing or
                                                         unreadable

        Can't open header file '...'    Syntax          Map file missing or
                                                         unreadable
        Warning: Duplicate header/Hint: '...'
                                        Auto            recomment cannot handle
                                                         more than 1 title or
                                                         mode switch per
                                                         address.

    Notes in the output


        Invalid reference XXXX ignored. Informational   Execution reference on
                                                         an illegal opcode.

        Reference mismatch for XXXX.    Informational   DATA reference into
                                                         current instruction or
                                                        CODE reference to DATA.

        XXXX: Endless loop.             Informational   Branch with offset -1
                                                         encountered in code.

        XXXX: Ignored CALL reference.   Informational   Branch to forbidden
                                                        instruction encountered

        XXXX: CODE TO DATA attempted.   Informational

        XXXX: Illegal instruction.      Auto            Illegal instruction
                                                         encountered in code.

        XXXX: TEXT immediate            Auto            Encountered text string
                                                         within program.

        XXXX: ADDRESS DIFFER. This may indicate misassembly
                                        Error           While in WORD mode,
                                                         the next reference is
                                                         not WORD-aligned.


        define label: sss = XXXX        Debug/Verbose
        XXXX: autodefine label: XXXX    Debug/Verbose


        ; *** ERROR: Descending address: XXXX ***
                The input Memory map must be in strictly ascending order.

        ; *** XXXX: CALL ADDRESS ALIGNMENT. This may indicate misassembly ***
                General error. Either it is bad programming style, i.e.
                using a masked-out command, or recomment is confused.

        ; *** Resyncing ***
                'ADDRESS DIFFER' is replaced with this message whenever there
                is a Memory Map entry provided for the offending address.


The above part is by Jouko Valta.


17 Feb 1998
A. Fachat (a.fachat@physik.tu-chemnitz.de)