Decompiling InstallShield Scripts
(work in progress)
progcor
Programmers
04 March 1998
by zeezee
Courtesy of Reverser's page of reverse engineering
slightly edited by reverser+
Here it is, with a little delay :-( Zeezee's work on InstallShield decompiling!
I don't believe that this work is obsolete. Quite the contrary: it offers (IMO) a welcome ADDITION to NaTzGUL's beautiful tool and to NaTzGUL's master essays.
In fact, as I have stated more than once, I believe that if we have more capable reversers working on the SAME subject we get (almost always) as many DIFFERENT ways to reach the same solution. (At times we even get different solutions!)
Since what interests us is mostly the TECHNIQUES that should be used, not the solutions themselves, this 'doppeltgemoppelt' (German for "doubled up") approach offers quite a few advantages on our Web.
Enjoy!
There is a crack, a crack in everything
That's how the light gets in
Rating
( )Beginner (x)Intermediate (x)Advanced ( )Expert


This essay became obsolete one day after it was written. I sent it to Reverser+ at night, and in the morning the wonderful
decompiler by NaTzGUL was available. The idea behind this essay was to help develop an InstallShield decompiler, and
since one now exists, this essay is purely informational.

This essay is of no use to anyone except those willing to decompile InstallShield scripts.

 I will explain here where you can find the code generator tables of the InstallShield script compiler. From the disassembly you will also see how a modern compiler encodes its code generation procedures, etc.


Decompiling InstallShield scripts and guidelines for decompiler writers
(work in progress)
Written by zeezee
 

Introduction

  

Decompiling InstallShield scripts is not cracking. It's pure reverse-engineering. 
It may be even more: a lesson in how a good compiler is built, how expressions are
translated into script code, how they are executed, how a modern compiler implements
token search algorithms and more.
It's impossible to cover all these topics in one short essay, but I think
it may be a foundation for other crackers and +crackers to delve deeper into
this subject (assuming anyone is interested in pure reversing).
Goal of this essay: to give the reader some basic information about the InstallShield compiler.
Future goal: someone writes an InstallShield script decompiler :).
>>>Author's note five days later: there IS now an InstallShield decompiler!
Before reading this: read NaTzGUL's essay first. I'm assuming here that the reader knows what an InstallShield script is and where it can be found (hint: setup.ins ;). 
Tools required
IDA 3.7 (full, 'quined' or freeware)
Wdasm will also work, but we will rely heavily on cross-references, which in IDA are only a mouse-click away.
 

Target's URL/FTP
http://www.installshield.com (you must register first to download the Lite version).
This is not strictly necessary for our work, but the help file explains the syntax and parameters of the built-in functions, which helps a lot.
The compiler engine available for download is sufficient to follow this essay.
InstallShield compiler main engine: download it here (109k).
InstallShield setup.exe (659k): get it from any CD you have handy...

Essay

I'm assuming here that the reader has all the tools ready and will follow all the steps
described below. I don't attach any listings. If you want to follow along, just
load setup.exe and compiler.dll into IDA (one after the other), wait until autoanalysis has
finished and save the resulting IDB files.


While reading NaTzGUL's essay mentioned above (excellent, btw.) I tried to delve
deeper into the compiled script. First I disassembled setup.exe (the script
interpreter, in case you forgot) and located the command scanner:

At 00420E84 you will find the opcode fetch, the parameter determination and, finally,
a jump to the command service routine.
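The fetch / decode / jump cycle at 00420E84 can be pictured as a toy interpreter in C. This is a minimal sketch, not SETUP.EXE's actual code: it assumes each opcode is stored as a little-endian 16-bit word and that each service routine returns the script offset of the next command. The names cmd_proc, cmd_entry, init_table, run and the tracing variables are hypothetical; only the opcode range 000..1C6, the per-opcode (parameters type, procedure) record and the opcodes 02A (MessageBox) and 033 (Exec) come from this essay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_OPCODES 0x1C7                /* opcodes 000..1C6 */

/* Hypothetical service routine type: receives the script and the offset just
   past the opcode, returns the offset of the next command. */
typedef size_t (*cmd_proc)(const uint8_t *script, size_t pos);

/* One record per opcode, like the (parameters type, procedure) pairs in SETUP.EXE. */
struct cmd_entry {
    uint8_t  param_type;
    cmd_proc proc;                       /* NULL = opcode not implemented here */
};

static struct cmd_entry cmd_table[NUM_OPCODES];
static unsigned executed[16];            /* trace of opcodes dispatched so far */
static size_t   n_executed;

static void trace_op(unsigned op)
{
    if (n_executed < sizeof executed / sizeof executed[0])
        executed[n_executed++] = op;
}

/* Toy service routines: they only record that they ran and consume no operands. */
static size_t cmd_02A(const uint8_t *script, size_t pos) { (void)script; trace_op(0x02A); return pos; } /* MessageBox */
static size_t cmd_033(const uint8_t *script, size_t pos) { (void)script; trace_op(0x033); return pos; } /* Exec */

void init_table(void)
{
    cmd_table[0x02A] = (struct cmd_entry){ 2, cmd_02A };   /* parameters type 2, as in SETUP.EXE */
    cmd_table[0x033] = (struct cmd_entry){ 0, cmd_033 };   /* parameters type unknown: placeholder */
}

/* Fetch an opcode, look up its record, jump to the service routine; repeat.
   Returns 0 at the end of the script, -1 on an unknown opcode. */
int run(const uint8_t *script, size_t len)
{
    size_t pos = 0;
    assert(script != NULL || len == 0);
    while (pos + 2 <= len) {
        unsigned op = script[pos] | (unsigned)script[pos + 1] << 8;  /* little-endian word */
        pos += 2;
        if (op >= NUM_OPCODES || cmd_table[op].proc == NULL)
            return -1;
        pos = cmd_table[op].proc(script, pos);
    }
    return 0;
}
```

After init_table(), running the four bytes 2A 00 33 00 dispatches cmd_02A and then cmd_033. Indexing the table directly by opcode keeps each dispatch down to one bounds check and one indirect call.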

The table containing the addresses of the service procedures (opcodes 000..1C6) starts at 00495FC8.
For each command there is a record:
   byte  parameters type
   dword procedure_address
When you have time, name these procedures cmd_xxx, where xxx is the opcode.

E.g. at 0049609B you will find a record containing (2, 0042AC02), which
corresponds to command 02A, MessageBox.
02C is goto, 02F is strlen, 033 is Exec, etc.

This was quite easy to find, but then comes a trickier part:

Where can we find real names of the functions?

After a little thought I decided to reverse the InstallShield compiler.
Of course, the compiler must first be found somewhere.

First solution:   Get it from www.installshield.com (Lite version).
Second solution:  Get it from the Web (you know how to search).
Third solution:   Get Visual C++ 5 or other CD containing InstallShield Lite 
                  as an added bonus.
Fourth solution:  Get it here (DLL only for reversing).

Our target is now compiler.dll, the main compiler module, called from the IDE and
probably also from the command-line script compiler, which I don't have.

The essential part of the compiler is the lexical analyzer, which isn't so
interesting for us. More important are the code generator and the token analyzer.
Remember that this compiler does not generate pure machine code. It generates scripts
which are interpreted by setup.exe during the actual setup process.
The code generator consists of
   [deep breath here]
      several hundred pointers to linked lists of records containing pointers to
      code generation tables
   [deep breath end]
(sounds nice, ehm).

I'll explain it briefly.

Short definition: Token: something that can be a keyword, function name,
  number, variable name etc.,
  e.g. tokens are:
  goto, MessageBox, IS_OS2 etc.
If you are a purist, replace this definition with yours.

Assume the compiler has completed the token MessageBox. It searches for its service
using a nice pointer table located at 10033408..10033807 (256 dwords). Looks like
a tree...

These 256 dwords almost all point to linked lists of structures.
Let's see, e.g., where the dword at 10033650 points. Click on the reference and you land at
10031938.
Make 4 dwords from the following data and you quickly discover the token table
structure:

address 10031938:                      actual value

   dword *next_list_element_pointer   10031948
   dword *token_string                "ConfigAdd"
   dword command_flags                00000201
   dword *token_service_data          1002E898

 command_flags are 0201 for all functions and built-in constants; other values
 in this field mean that the token is a keyword (e.g. 'case' has 0A01).

 token_string is simply what we are searching for: the ASCIIZ name of a function or constant.
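The bucket table and its list elements can be modelled in C as follows. This is a sketch with hypothetical names (token_entry, bucket_of, find_in_chain, is_keyword). The essay does not say how the compiler picks a bucket for a given name, so the code starts from a known bucket head instead of inventing a hash function, and it assumes the name comparison is case-sensitive. In COMPILER.DLL every field is a 32-bit pointer or value; here they are native C pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUCKETS_VA  0x10033408u      /* 256 dword list heads */
#define FLAGS_PLAIN 0x0201u          /* functions and predefined constants */

/* In-memory model of one list element. */
struct token_entry {
    const struct token_entry *next;      /* next_list_element_pointer */
    const char               *name;      /* token_string */
    uint32_t                  flags;     /* command_flags */
    const void               *service;   /* token_service_data */
};

/* Bucket number of a list-head slot, e.g. 10033650 -> 146. */
unsigned bucket_of(uint32_t slot_va)
{
    return (slot_va - BUCKETS_VA) / 4;
}

/* Any flags value other than 0201 marks a keyword. */
int is_keyword(const struct token_entry *t)
{
    return t->flags != FLAGS_PLAIN;
}

/* Walk one chain, comparing names exactly; NULL if the token is absent. */
const struct token_entry *find_in_chain(const struct token_entry *head,
                                        const char *name)
{
    assert(name != NULL);
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}
```

For the example above, slot 10033650 is bucket 146, and its chain runs ConfigAdd -> MessageBox.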
 token_service_data is the data for the code generator: several words
 (mostly 3 for constants or 4 for functions), e.g.

-  for functions:
   word 2                    (this means: token is a function)
   word function_opcode
   word ?                    (maybe parameter count?)
   word ?

-  for compiler predefined constants:
   word 0                    (this means: token is a predefined constant)
   dword constant_value.

Let's follow the linked list in the above example. At 10031948 we have:
    dword pointing to next list element
    dword pointing to "MessageBox"
    dword 00000201 (function/constant)
    dword pointing to 1002E8A0

And at 1002E8A0 we have:
    word 2 (function)
    word 2A (opcode for this function in script)
    word 2 ???
    word 2 ???

Of course IDA 'Xref clicking' is much faster than reading this 
step-by-step description.

So we have found the opcode for MessageBox, which is (surprise?) equal to the
one found by reversing setup.exe (see above).

Once again, here is a quick way to find the opcode of a built-in function or the
 value of a predefined constant:

1. Locate the ASCIIZ name of the token you are looking for (Alt-B in IDA). Remember to
   type it _exactly_ as it should appear.
2. Click on the Xref - you will land in the linked list.
3. Check whether the next dword is 00000201; if not, you are looking at a keyword...
4. Follow the dword below it (a pointer) - you will land in token_service_data.
5. The second word is the opcode.
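The five steps can also be automated over the raw bytes of COMPILER.DLL instead of clicking through IDA. This is a sketch under several assumptions: the name and its list element live in the same section, so a single delta (load address minus file offset, like LOAD_OFF in the dump program of this essay) converts between addresses and file offsets; the first dword-aligned occurrence of the string's address is the name field of the list element; and the string is matched exactly, including its terminating NUL. find_name, find_xref and lookup_opcode are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Step 1: file offset of the exact ASCIIZ name, or -1. */
long find_name(const uint8_t *img, size_t len, const char *name)
{
    size_t n = strlen(name) + 1;                 /* include the terminating NUL */
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(img + i, name, n) == 0)
            return (long)i;
    return -1;
}

/* Step 2: first dword-aligned slot holding `va` (a data xref), or -1. */
long find_xref(const uint8_t *img, size_t len, uint32_t va)
{
    for (size_t i = 0; i + 4 <= len; i += 4)
        if (rd32(img + i) == va)
            return (long)i;
    return -1;
}

/* Steps 1..5: opcode of a built-in function, or -1 if the token is missing,
   is a keyword, or is not a function. */
long lookup_opcode(const uint8_t *img, size_t len, uint32_t delta, const char *name)
{
    long s, x;
    uint32_t svc;

    assert(img != NULL && name != NULL);
    s = find_name(img, len, name);                       /* step 1 */
    if (s < 0)
        return -1;
    x = find_xref(img, len, (uint32_t)s + delta);        /* step 2: name field of the element */
    if (x < 0 || (size_t)x + 12 > len)
        return -1;
    if (rd32(img + x + 4) != 0x00000201u)                /* step 3: otherwise a keyword */
        return -1;
    svc = rd32(img + x + 8) - delta;                     /* step 4: token_service_data */
    if (svc >= len || len - svc < 4)
        return -1;
    if ((img[svc] | img[svc + 1] << 8) != 2)             /* type 2 = function */
        return -1;
    return img[svc + 2] | img[svc + 3] << 8;             /* step 5: the opcode */
}
```

If those assumptions hold for the 22.01.97 DLL, calling it on the whole file with 0x10000E00 and "LaunchApp" ought to reproduce the 33 found by hand. A dword-aligned code reference to the same string would fool the scan, which is one reason IDA's real cross-references are safer.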

Let's check it again for LaunchApp (it should be a simple exec, shouldn't it?)
Alt-B, "LaunchApp" (check case-sensitive, otherwise the search is slow) -> 1002C2FC.
Click on the Xref: 1003198C
Below it we have dword 00000201 (or press 'D' 3 times to make it a dword)
And then a kind of offset - click on it -> 1002E8BE.
The first word is 2 - it's a function.
The second word is 33 - it's the opcode.

BTW, when you return to the disassembly of SETUP.EXE and find the service procedure
for command code 033, you will quickly see that we're right...

Author's note five days later:
{
  You may experiment with the IDA script language to display all these records correctly.
  There is no need to experiment further. The decompiler is already written.
}

So... A little bonus for everyone who got this far with this essay!

If you want to see the token table, compile the file below.
---cut here, don't destroy your monitor with scissors!---

// InstallShield COMPILER.DLL token table dump
// by zeezee
//
// Originally built with 16-bit BC3.1 (sizeof(int) == 2, sizeof(long) == 4).
// The fixed-width types from <stdint.h> let it build with a modern compiler
// too; the file is read as little-endian, so run it on an x86 host.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// known offsets in the COMPILER.DLL file
#define TABLE_START 0x1002F328UL
#define TABLE_END   0x10033408UL
#define LOAD_OFF    0x10000E00UL   // diff load_address - file_offset

int main( void )
{
    FILE *cfil;
    char sbuf[80];
    uint32_t ptr_next, ptr_name, ptr_serv, flags, fptr;
    uint16_t token_type;
    uint32_t opcode;
    size_t n;

    cfil = fopen( "compiler.dll", "rb" );
    if( cfil == NULL )
    {
        printf( "Cannot open COMPILER.DLL\n" );
        return 1;
    }

    fptr = TABLE_START;
    // We are now pointing to the start of the list.
    // Each element has the form:
    //   dd *next_element
    //   dd *asciiz_name
    //   dd flags
    //   dd *service_proc
    // Since we know that all list elements are consecutive, we treat the
    // list simply as a table, ignoring the pointers to next elements.
    do
    {
        fseek( cfil, (long)(fptr - LOAD_OFF), SEEK_SET );
        // read these 4 dwords
        if( fread( &ptr_next, 4, 1, cfil ) != 1 ||
            fread( &ptr_name, 4, 1, cfil ) != 1 ||
            fread( &flags,    4, 1, cfil ) != 1 ||
            fread( &ptr_serv, 4, 1, cfil ) != 1 )
            break;

        // read function name; fread does not terminate the string for us
        fseek( cfil, (long)(ptr_name - LOAD_OFF), SEEK_SET );
        n = fread( sbuf, 1, sizeof sbuf - 1, cfil );
        sbuf[n] = '\0';

        // read type and opcode
        fseek( cfil, (long)(ptr_serv - LOAD_OFF), SEEK_SET );
        token_type = 0;
        opcode = 0;
        fread( &token_type, 2, 1, cfil );
        fread( &opcode, 4, 1, cfil );     // may be opcode or constant value

        if( token_type != 2 )             // constants and variables
        {
            printf( "F=%04X, O=%08lX, T=%01X, N=%s\n",
                    (unsigned) flags, (unsigned long) opcode,
                    (unsigned) token_type, sbuf );
        }
        else                              // functions
        {
            printf( "F=%04X, O=%04X, P=%01X, T=%01X, N=%s\n",
                    (unsigned) flags, (unsigned) (opcode & 0xffff),
                    (unsigned) (opcode >> 16), (unsigned) token_type, sbuf );
        }

        fptr += 16;                       // point to next element in the table
    } while( fptr < TABLE_END );

    fclose( cfil );
    return 0;
}

---cut here---

And here is a short doc for the above:

   indis.exe - InstallShield COMPILER.DLL token table dump - by zeezee

Usage:
Place it in the directory where COMPILER.DLL is located.
It works only with the version that is exactly 260096 bytes long and dated 22.01.97.
When using another version you must find and change the table start address (and
probably TABLE_END and LOAD_OFF too)!
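When another version of COMPILER.DLL moves the tables, LOAD_OFF may change too. Rather than guessing it, an address can be translated with the section table of the PE header: file offset = VA - ImageBase - VirtualAddress + PointerToRawData. The sketch below does only that translation; reading the headers from the file is left out, and va_to_offset is a hypothetical helper. The four field names are those of the PE/COFF section header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four IMAGE_SECTION_HEADER fields this needs. */
struct section {
    uint32_t VirtualAddress;     /* RVA where the section is loaded */
    uint32_t VirtualSize;
    uint32_t PointerToRawData;   /* file offset of the section data */
    uint32_t SizeOfRawData;
};

/* Translate a virtual address to a file offset; returns -1 if the address
   lies in no section or in memory that has no bytes in the file. */
long va_to_offset(uint32_t image_base, const struct section *sec, size_t n, uint32_t va)
{
    uint32_t rva = va - image_base;
    assert(sec != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t start = sec[i].VirtualAddress;
        uint32_t size  = sec[i].VirtualSize ? sec[i].VirtualSize : sec[i].SizeOfRawData;
        if (rva >= start && rva - start < size) {
            uint32_t delta = rva - start;
            if (delta >= sec[i].SizeOfRawData)
                return -1;       /* uninitialised tail of the section */
            return (long)(sec[i].PointerToRawData + delta);
        }
    }
    return -1;
}
```

For example, with an ImageBase of 0x10000000 and a section whose VirtualAddress is 0x2F000 and PointerToRawData is 0x2E200 (made-up values chosen so that the difference reproduces LOAD_OFF = 0x10000E00, not read from the real DLL), TABLE_START 0x1002F328 maps to file offset 0x2E528.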

Command line: indis >indis.dat

For each function/constant/variable a record is generated, e.g.:

For predefined values:

    F=0201, O=000000CA, T=0, N=FREEENVSPACE

    where:
    F - flags
    O - constant value (long)
    T = 0 for constants, 1 for variables (?)
    N - name

For built-in functions:

    F=0201, O=0000, P=5, T=2, N=StructGetAddressEx

    where:
    F - flags (other than 201 only for keywords)
    O - opcode
    P - param count (?)
    T = 2 for built-in functions
    N - name
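If you want to post-process indis.dat (sort by opcode, look up a name, compare two DLL versions), each line can be read back with sscanf. This is a minimal sketch; struct indis_rec and parse_indis_line are hypothetical names. The function format is tried first because it carries the extra P= field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of indis output. */
struct indis_rec {
    unsigned      flags;
    unsigned long value;     /* constant value, or opcode for functions */
    unsigned      params;    /* P= field, functions only */
    unsigned      type;
    char          name[80];
};

/* Returns 1 on success, 0 if the line matches neither format. */
int parse_indis_line(const char *line, struct indis_rec *r)
{
    assert(line != NULL && r != NULL);
    memset(r, 0, sizeof *r);
    if (sscanf(line, "F=%x, O=%lx, P=%x, T=%x, N=%79s",
               &r->flags, &r->value, &r->params, &r->type, r->name) == 5)
        return 1;                               /* built-in function */
    memset(r, 0, sizeof *r);
    return sscanf(line, "F=%x, O=%lx, T=%x, N=%79s",
                  &r->flags, &r->value, &r->type, r->name) == 4;
}
```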

Enjoy!
zeezee
Final Notes
  Greets to NaTzGUL for his brilliant essay.
>>>five days later: ...and for the wisdec decompiler!
zeezee (not +zeezee yet, but I hope I will earn this someday ;)
Ob Duh
The standard Ob Duh does not apply here. We're only reversing, not cracking.
