Inside the VB3 .EXE
8 March 1998
by _Duke_
Courtesy of Reverser's page of reverse engineering 
 
A fundamental essay by Duke, that has the huge merit of setting the point for visual basic reverse engineering where it belongs: inside its elements (tokens) its forms and controls. Reading this you will be tempted to start a marvellous journey inside all visual basic programs... by all means do it, and bring back to us your discoveries... enjoy!
There is a crack, a crack in everything That's how the light gets in 
Rating 
( )Beginner (x)Intermediate ( )Advanced ( )Expert 

An exploration of the inner workings and structure of the Visual Basic 3 Executable file

 
Introduction

The following essay is not intended to be a "How to crack VB programs" essay, but I will show you exactly HOW a VB program is protected from de-compilers. It is important that you have a working knowledge of programming in Visual Basic in order to understand the essay and, more importantly, to follow the source code of the programs you de-compile. Although this essay covers an older version of VB, there are many programs out there which have yet to be cracked. It also serves as a starting point for understanding the later versions.

Tools required

1) Visual Basic 3- For Compiling our own test programs.

2) A Good Hex Editor- Use one that will let you Binary Compare two files and display the differences.

3) SoftIce- Of Course!

4) MAKE_MAK.EXE or DoDi's VBOPT to "Protect" our programs.

5) A VB DeCompiler. (Is there anything but DoDi's??!!)
 
Essay

As many people have discovered, a VB program isn't really a 'Program' in the traditional sense of the word. Visual Basic is an 'Interpreted' language. What this means is that the program is stored in a 'higher level' language than the machine's native code. It is the job of the interpreter to read back and execute this higher language AT RUNTIME. Most other languages (such as C) are stored in native code and need nothing to translate them.

In case you didn't know, the VB interpreter is VBRUN300.DLL (no wonder all the VB programs need it to run!!). This is the REAL program that is running. Any Softice breakpoints you set for the 'standard' Windows routines will ALWAYS return you to VBRUN, not to the EXE! The interpreter is reading the contents of the EXE, translating the TOKENS, and executing various subroutines to perform the desired task. A VB program, therefore, cannot be disassembled by the standard tools. Softice is pretty much useless here unless you like to follow the spaghetti inside VBRUN.

The program can, however, be de-compiled back into VB source code thanks to DoDi's VBDIS. It is available on the net as shareware, but I STRONGLY recommend you get (and pay for, it's worth it) the full version if you are serious about R-E'ing VB programs. This decompilation is possible thanks to Micro$oft's including information in the executable that is not needed for the program to run. Now why would they do that?

 The VB executable is made up of the same basic parts as other windows programs:

     DOS HEADER: This is provided for backward compatibility of the EXE file format.

     STUB PROGRAM: Checks if Windows is running. Provides an error message if the
               program is being run from DOS.

     WINDOWS HEADER: This section provides important information about the EXE to
               the operating system. Some of the more important locations are:

               OFFSET (hex)        FUNCTION
          ---------------------------------------------------------
               14        Initial value of CS:IP
               1C        Number of Segments
               22        Relative offset to Segment Table (typ. 40)
               24        Relative offset to Resource Table
               3E        The expected Windows version
***Note About Hex Editors: There seems to be a difference in opinion as to the 'START'
of a program. Some editors call the start byte 00, while others consider it byte 01.
If the addresses you are looking at just don't seem right, try shifting 1 byte to the
right or left.

For a good reference on the Windows Header, look in the WIN SDK help file WIN31WH.HLP and look under "Executable-File Header Format"
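
The header offsets in the table above can be read programmatically. The following is a minimal sketch, not a full parser: the function name `read_ne_header` and the dictionary keys are made up for illustration, and the demo buffer is a synthetic stand-in that only carries the CALC.EXE values quoted later in this essay. With a real file you would pass in the bytes read from disk instead.

```python
import struct

def read_ne_header(data: bytes) -> dict:
    """Read the Windows (NE) header fields listed in the offset table."""
    if data[0:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The dword at 3Ch of the DOS header is the file offset of the Windows header.
    (ne_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[ne_off:ne_off + 2] != b"NE":
        raise ValueError("no NE signature at the Windows header offset")
    ip, cs = struct.unpack_from("<HH", data, ne_off + 0x14)        # initial CS:IP, IP stored first
    (seg_count,) = struct.unpack_from("<H", data, ne_off + 0x1C)   # number of segments
    seg_tab, res_tab = struct.unpack_from("<HH", data, ne_off + 0x22)
    (win_ver,) = struct.unpack_from("<H", data, ne_off + 0x3E)     # expected Windows version (raw word)
    return {
        "header_offset": ne_off,
        "ip": ip,
        "cs": cs,
        "segment_count": seg_count,
        # Both table offsets are relative to the start of the Windows header.
        "segment_table": ne_off + seg_tab,
        "resource_table": ne_off + res_tab,
        "expected_windows_version": win_ver,
    }

# Synthetic buffer (an assumption for the demo, not the real CALC.EXE).
demo = bytearray(0x700)
demo[0:2] = b"MZ"
struct.pack_into("<I", demo, 0x3C, 0x600)
demo[0x600:0x602] = b"NE"
struct.pack_into("<HH", demo, 0x614, 0x0010, 0x0001)
struct.pack_into("<H", demo, 0x61C, 4)
struct.pack_into("<HH", demo, 0x622, 0x40, 0x68)

info = read_ne_header(bytes(demo))
print({k: hex(v) for k, v in info.items()})
```

Because the offsets are zero-based here, this also settles the "byte 00 or byte 01" question for your own hex editor: compare its display against what the script reports.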

A short VB program (1 form/module) will typically contain 4 entries in its segment table, referencing 3 segments (one can be ignored). One of the segments, usually located just after the Windows Header, is a single CALL instruction which transfers control to the interpreter. THIS IS THE ONLY CODE IN THE VB PROGRAM THAT RUNS!!!!!!  The other segments point to the Tokens themselves and a section which specifies how the tokens are structured into the various Subs and Functions.

Resources are 'packages' of data in a pre-defined format which a program will access. Examples of resources are Icons, Fonts, and Menus. In a VB program, they are also used to reference Forms and other 'Data' sections of the program.
 
 
 

Some Hands On

** For this section of the lesson, you will need CALC.EXE compiled from the samples that come with VB3, it should compile to 9020 bytes. Or download it here within +Reverser's page.

Start your favorite Hex Editor and load CALC.EXE. Examine the following sections as I describe them. I have found it easiest to print the whole file in hex starting from the windows header and use colored markers to see what the sections 'look' like.

    0000-003F DOS HEADER- Note the 06 @ 003D; it is the high byte of the word at 003C, which gives the start of the Windows Header (0600).

    0200-049F Stub Program- This code only runs from DOS.

    0600-07FF Windows Header- Let's look more closely:

          0614- Initial CS:IP = 10 00 01 00 
               This translates to 10h bytes past the start of segment 1

          061C- # Segments = 04 00

          0622- Offset to Start of Segment Table = 40 00 
               Segment table starts @ 0640, segments are 4 words long

               Segment 1 @ 0640 - 08 00 19 00 50 1D 19 00 
                    This means: 
                         The segment is located @ 0800
                         The segment is 0019 bytes long
                         1D50 - Flags (more later)
                         The segment needs 0019 bytes of memory

               Segment 2 @ 0648 - 00 00 00 00 11 0C 02 00 
                    Ignore this segment definition

               Segment 3 @ 0650 - 0F 00 50 02 10 1D 50 02
                    This segment @ 0F00 is the 'Sub Structure Table'

               Segment 4 @ 0658 - 09 00 D0 50 10 1C D0 50
                    This segment @ 0900 is the Tokens (the 'Code')

          0624- Offset to Resource Table = 68 00
               Table starts @ 0668:

               Word @ 0668 = 08 00 - This is rscAlignShift, ignore it for now
               First a resource's Type is defined, then all of the resources of that type follow:

               First Type definition @ 066A - 0E 80 01 00 00 00 00 00
                    This means:
                         The TypeID is 800E (A Group Icon)
                         There is 1 resource defined
                    *The last 2 words are reserved
 
               Then the resc. is defined @ 0672 - 12 00 01 00 30 1C 01 80 00 00 00 00
                    This means:
                         The resource starts on Page 0012
                         It is 0001 Pages long
                         1C30 is more Flags
                         The resource's ID is 8001
                    *Again, the last two words are reserved

               The next type definition is @ 067E - 03 80 01 00 00 00 00 00 
                    'There is 0001 resource of type 8003 (Icon)'

               Then the resource definition @ 0686 - 13 00 03 00 30 1C 01 80 00 00 00 00
                    The resource starts at page 0013 and is 3 pages long

               The next type definition is @ 0692 - 0A 80 05 00 00 00 00 00
                    'There are 0005 resources of type 800A (Data)'
                    * Only 4 resources are actually defined; ID 8003 is skipped

               Then the 4 definitions starting @ 069A:
                    069A - 16 00 02 00 30 1C 01 80 00 00 00 00   
                    06A6 - 18 00 02 00 30 1C 02 80 00 00 00 00
                    06B2 - 1A 00 09 00 30 1C 04 80 00 00 00 00
                    06BE - 23 00 01 00 30 1C 05 80 00 00 00 00

               These resources are respectively:
                    Forms Definitions
                    Internal Definitions
                    A Form
                    Form and Control Names

               It should be noted that the resource ID is not related to what
               the resource is used for. The function of the resource is 
               identified by its header bytes.
               The FLAGS sections of the segments and resources are used for
               information like if they are MOVABLE, SHAREABLE, PRELOADED, 
               EXECUTEONLY, etc.

     06D8-07FF Various name tables used by windows

     0800-0819 This is the first segment. If you remember, the initial value
               of CS:IP was 10h bytes past the start of this segment. This byte is 
               a long CALL (9A) into the interpreter. The address is computed at
               runtime since there is no way to tell where VBRUN will load into
               memory. The bytes which follow the segment are loading information
               for other segments.

     0900-0EFF These are the actual tokens. The source code is translated to this
               at compile time. Strings are stored literally; this helps us to find
               our place while comparing tokens to source. More on this section and
               the ones that follow in the next lesson.

     0F00-114F This section defines how the tokens are arranged into their various
               subs.

     1200-12FF This is the GROUP_ICON definition (Don't bother!)

     1300-15FF This is the ICON definition. For information on this and the previous
               section look in the WIN SDK help file under 'Graphics File Formats'

     1600-17FF This is the Forms Definitions section. Here, information on forms,
               imported VBX's, and controls is stored.

     1800-19FF This section's format is quite mysterious but it is used to hold
               object definitions like forms, controls, variables, and constants.

     1A00-22FF This is the actual form used in the program. Its format is very similar
               to a VB .FRM file. Notice the 'in line' icon @ 1A61. Pictures are also
               stored this way. The form's controls are defined in the second half of
               the form.

     2300-END  These are the control names. ***This section is unnecessary for program 
               operation and is removed when the program is PROTECTED.***
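
The segment and resource table entries walked through above can be decoded mechanically. The sketch below feeds in the exact bytes quoted for CALC.EXE. The function names are made up, and it assumes that both tables use 256-byte pages: rscAlignShift is 08 at 0668, and the segment offsets quoted above are consistent with the same shift.

```python
import struct

ALIGN_SHIFT = 8  # assumption: 256-byte pages for both segments and resources

def decode_segment(entry: bytes) -> dict:
    """Decode one 4-word segment table entry."""
    page, length, flags, min_alloc = struct.unpack("<4H", entry)
    return {
        "file_offset": page << ALIGN_SHIFT,
        "length": length,          # bytes in the file
        "flags": flags,
        "min_alloc": min_alloc,    # bytes of memory needed
    }

def decode_resource(entry: bytes) -> dict:
    """Decode one 6-word resource entry (the last two words are reserved)."""
    page, pages, flags, res_id, _reserved1, _reserved2 = struct.unpack("<6H", entry)
    return {
        "file_offset": page << ALIGN_SHIFT,
        "length": pages << ALIGN_SHIFT,
        "flags": flags,
        "id": res_id,
    }

seg1 = decode_segment(bytes.fromhex("08 00 19 00 50 1D 19 00"))
seg3 = decode_segment(bytes.fromhex("0F 00 50 02 10 1D 50 02"))
group_icon = decode_resource(bytes.fromhex("12 00 01 00 30 1C 01 80 00 00 00 00"))
form = decode_resource(bytes.fromhex("1A 00 09 00 30 1C 04 80 00 00 00 00"))

for name, item in [("seg1", seg1), ("seg3", seg3), ("group_icon", group_icon), ("form", form)]:
    print(name, {k: hex(v) for k, v in item.items()})
```

The decoded ranges line up with the file map above: segment 3 covers 0F00-114F and the form resource covers 1A00-22FF.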
What does this mean for CRACKERS???

Crackers can modify the information in these various sections to:

If you are lucky enough to own the Professional Version of DoDi's VB tools, you can De-compile most programs into source code which will recompile in Visual Basic after you have made your changes. Unfortunately, the shareware and standard versions don't handle custom controls properly and will probably not give you source code that will re-compile. Your only option is to make the modifications manually.
 

Tokens

While a complete explanation of all of the tokens is beyond this lesson, I will describe some of the more common things you will come across as you examine tokens. Let's take the following small snippet of code:

Let's break this down. The first two bytes '35 49' are the token encoding the number of leading spaces in the original source code. Some of the token words for spacing are as follows:

Hence '35 49' means there are 4 spaces at the start of this line of code (four spaces is also the default TAB in VB). Although this information is only for formatting and is not necessary for program operation, the interpreter expects to see valid tokens here and funny things happen if it doesn't. HINT: This makes it easy to find the start of each line as you look at raw tokens.

'21 2D 1A 00'  References the variable password.

The next bytes '9A 38 0A 00 0C 00 04 00 64 75 6B 65 00 00' are the string definition for 'duke'.

'C3 11'  Follows most Literal String Definitions (performs a PUSH to prepare for the next token)
 
 '7A 44' This is the important one. It basically means 'Compare the two variables and if <> then continue'. Hmmm. What would happen if we changed '7A 44' to '6A 44', which means 'Compare the two variables and if = then continue'? You guessed it! Our program would be cracked.
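
The one-byte change described above can be tried on a test program you compiled yourself. This is a minimal sketch: `patch_bytes` is a made-up helper, and the token line is simply the byte groups quoted above joined in the order they were discussed, which is an assumption about the layout of the original line. The helper refuses to write unless the bytes at the offset are what you expect, which guards against an off-by-one offset from your hex editor.

```python
def patch_bytes(data: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a copy of data with `expected` at `offset` swapped for `replacement`."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length of the token stream")
    found = data[offset:offset + len(expected)]
    if found != expected:
        raise ValueError(f"expected {expected.hex()} at {offset:#06x}, found {found.hex()}")
    return data[:offset] + replacement + data[offset + len(expected):]

# Byte groups from the essay, concatenated (assumed order).
line = bytes.fromhex(
    "35 49"                                      # spacing token: 4 leading spaces
    "21 2D 1A 00"                                # reference to the variable
    "9A 38 0A 00 0C 00 04 00 64 75 6B 65 00 00"  # string definition for 'duke'
    "C3 11"                                      # follows the literal string
    "7A 44"                                      # compare, continue if <>
)

offset = line.rfind(bytes.fromhex("7A 44"))
patched = patch_bytes(line, offset, bytes.fromhex("7A 44"), bytes.fromhex("6A 44"))
print(hex(offset), patched[offset:offset + 2].hex())
```

On a real .EXE you would read the file, apply the patch at the file offset of the token, and write the result to a new file so the original stays intact.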

The remaining tokens are the END instruction and the next line spacing token. The easiest way to learn about the different tokens is to write short VB programs, make .EXE's, and compare the tokens with the source code which generated them. When you compare the differences between simple code changes, you will begin to see the patterns. You could also look at the routines for the various tokens but these are very difficult to follow. If you would like to look at the routines, try the following:

This JMP AX is about to go to the first routine. If you look at AX, you will notice that its value is the first token. It was loaded with the previous instruction LODSW ES. The tokens are actually the addresses of the routines to be performed! If you DB ES:0  you will see the tokens the way VBRUN is referencing them using SI. As you step through, you can watch the tokens being loaded and their routines run.
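
The fetch-and-jump loop described above is easy to model. The toy below is only an illustration of the idea that each token directly selects its routine: the token values 1000 and 2000 and the two handlers are invented for the demo and have nothing to do with the real VBRUN300 tokens.

```python
import struct

def op_push(tokens: bytes, si: int, stack: list) -> int:
    """Hypothetical token: push the 16-bit operand that follows it."""
    (value,) = struct.unpack_from("<H", tokens, si)
    stack.append(value)
    return si + 2            # skip the operand word

def op_add(tokens: bytes, si: int, stack: list) -> int:
    """Hypothetical token: pop two values and push their sum."""
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return si                # no operand bytes

# In VBRUN the token IS the routine address; a dictionary plays that role here.
HANDLERS = {0x1000: op_push, 0x2000: op_add}

def run(tokens: bytes) -> list:
    stack = []
    si = 0
    while si < len(tokens):
        (ax,) = struct.unpack_from("<H", tokens, si)  # LODSW: load word at SI into AX
        si += 2                                       # ...and advance SI
        si = HANDLERS[ax](tokens, si, stack)          # JMP AX: run the routine for this token
    return stack

program = struct.pack("<5H", 0x1000, 2, 0x1000, 3, 0x2000)
result = run(program)
print(result)
```

Each handler is responsible for consuming its own operand bytes, which is why stepping through VBRUN shows SI moving by different amounts from one token to the next.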
Now that you have a basic understanding of how the tokens work, let's move on....

Forms and Controls

Before I go into an actual Form, there is another resource which describes the various forms and controls in the program. Let's take another section of CALC.EXE:
 
    1600:03 20 81 80 FF FF 43 41 4C 43 00 00 00 00 00 05
    1610:00 01 00 43 41 4C 43 00 00 46 09 04 80 46 00 FF
    1620:01 A4 48 00 43 41 4C 43 2E 46 52 4D 00 00 00 58
    .....
This is the start of the 'Forms Definitions' section. It contains the names of the form (.FRM) files used in the compile, names of any .VBX files needed by the program, and references for both common and custom controls.
 The start (header) of this section is '03 20 81 80' and there is only one of these sections in an .EXE (that I have seen, anyway). 'FF FF' always follows.
 The next nine bytes contain the program name: the eight-byte DOS limit plus a terminating 0. If the name is less than 8 bytes long, the extra space is padded with 0's.
 The next bytes, '05 00', are the length of the Application Title including its terminating 0. This title may be up to 29h bytes long and is entered at compile time.
 '01 00' ?????? Possibly the number of titles?
 '43 41 4C 43 00' is the title 'CALC'
The next '00' is padding.
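
The fixed part of this header can be pulled apart as follows. This is a sketch built only from the field layout described above and the bytes shown at 1600; the function name and dictionary keys are made up, and the '01 00' word is returned as-is since its meaning is unknown.

```python
import struct

FORMS_DEF_HEADER = bytes.fromhex("03 20 81 80")

def parse_forms_definitions(data: bytes) -> dict:
    """Parse the start of the 'Forms Definitions' resource."""
    if data[0:4] != FORMS_DEF_HEADER:
        raise ValueError("not a Forms Definitions section")
    if data[4:6] != b"\xFF\xFF":
        raise ValueError("expected FF FF after the header")
    # Nine bytes: up to 8 name characters, zero padded, plus the terminator.
    name = data[6:15].split(b"\x00", 1)[0].decode("ascii")
    title_len, unknown = struct.unpack_from("<HH", data, 15)
    # title_len counts the terminating 0, so strip it from the text.
    title = data[19:19 + title_len].rstrip(b"\x00").decode("ascii")
    return {"name": name, "title_len": title_len, "unknown": unknown, "title": title}

# First two rows of the dump at 1600.
section = bytes.fromhex(
    "03 20 81 80 FF FF 43 41 4C 43 00 00 00 00 00 05"
    "00 01 00 43 41 4C 43 00 00 46 09 04 80 46 00 FF"
)
header = parse_forms_definitions(section)
print(header)
```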
The bytes which follow are the definitions of the names and controls. They have the following format:
Now for the Form: CALC.EXE only has one form, which starts at 1A00. The form can be thought of as two sections: the form description and the controls description. The form(s) will have the header 'FF CC 2C 00'. (The header is actually just CCFF, but all of the forms I have seen follow this with 002C; it doesn't seem to ever be checked. VBRUN300 will also accept CC23 as a valid form section header, although I have not yet seen it in an .EXE.)
This form has 7 controls,
Offset to end = 08A3 (End of form : 1A05 + 08A3 =  22A8),
Offset to first control = 038D (First control : 1A09 + 038D = 1D96)
I don't know what the 0D is....

The important thing to remember from this point is that VB starts with a default form and makes changes to it from there.
If our form were a default VB form, the 'FF 00' @ 1A16 would be located at 1A10; this FF designates the end of the basic window properties. Instead we have three entries:

As you can see, the changes are made by adding a 'Property ID' and the new value. The three properties changed above all had values that were only one byte long, but this is not always the case. The Form's caption is next, then property 05 starts @ 1A23. The next 4 DWORDS are the values for property 05, which set the size and location of the window. Changing these will move and/or resize the window. The next property, 0C, is a font change. This is followed by the font name and 5 words which describe its attributes.

At 1A46 is property 23, an Icon. Icons, as well as images, are stored inline in standard format and can be edited. The next word, 02FE, is the offset to the end of the icon. Right after the icon is property 24 @ 1D49: the Link Topic, stored in the usual VB string format. After that comes property 25, which is the Link Mode.

Last, and definitely least, are properties 35, 36, 37, and 38. Do you recognize their values? They are the same DWORDS as property 05 above. There is a difference though: changing these does nothing. I don't know if these are used for anything at all. The next property is FF, meaning of course 'No More Properties'. A few unknown bytes and we are on to our next part: the controls section (remember though, this is all one resource).
 The controls on this form start at 1D96. Since there are a lot of buttons on our 'Calculator', the section is too long to go over the whole thing but here is a chunk of it:
   Let's look at the first control in detail:
   If our program contained a menu, the items would also be listed in this fashion. The hierarchy can get quite messy but the key is in the first byte(s) of the control. The bytes following the first may or may not be the offset to the next control. If they are 01 - 05, they are hierarchy codes (04 meaning no more controls). If they are > 5, they are the offset to the next control. After you examine this section of a program with a complex menu, you will see what is going on.

 Unfortunately, due to the large number of control properties, I cannot give you a list of them. It is, however, fairly easy to find the code of a property you are looking for .... Just compile a test program with whatever control you are trying to find the property for, make an .EXE out of it, then change the property and make another .EXE. When you binary compare the Form sections of the two programs, you will see what bytes have been added to change the property. This is the best way to find out most of what is in the .EXE.

* A note on VB3: VB3 has a strange habit of compiling the exact same source code into slightly different .EXE's between the first and second compiles. When making your reference file, compile your source code TWICE without changing anything. It will ask you if you want to overwrite the existing .EXE; answer YES. NOW rename the .EXE and compile it a THIRD time. A binary compare should show these two files to be identical; if not, repeat this until you get two files which are identical. THEN make your changes and compile again. This is necessary on the VB3 that I have; you may want to test it on yours.
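
If your hex editor has no binary compare, a few lines of script will do the job. The sketch below is a naive byte-for-byte compare with a made-up function name; it reports the offset ranges where two buffers differ. Keep in mind that when a property adds bytes, everything after it shifts, so the first reported offset is usually the interesting one.

```python
def diff_ranges(a: bytes, b: bytes) -> list:
    """Return (start, end) offset pairs, end exclusive, where a and b differ."""
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i                  # a run of differences begins
        elif start is not None:
            ranges.append((start, i))      # the run ended at the previous byte
            start = None
    common = min(len(a), len(b))
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Whatever the longer file has past the shorter one counts as a difference.
        ranges.append((common, max(len(a), len(b))))
    return ranges

first = bytes.fromhex("35 49 7A 44 00")
second = bytes.fromhex("35 49 6A 44 00")
print([(hex(s), hex(e)) for s, e in diff_ranges(first, second)])
```

For the compile-twice check above, read both .EXE files in binary mode and confirm that `diff_ranges` returns an empty list before you start changing properties.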

Here is a list of some of the Form Properties you may want to change:

and standard control types:

The last resource in our CALC program is the control names resource @ 2300. Not too much to talk about here, the first entry is the name of the form, subsequent entries are the names of the controls on the form. With a control array, only the first item is listed. This section is not needed at all for the program to run and it can be removed (and is!) without effect. Each control defined in the control section has a reference to the position in this list of the control's name. Unfortunately, the program's variable and sub/function names are not stored anywhere in the program, and hence can never be recovered. If our program had more than one form, the additional form(s) would follow alternating with their control names section(s).
 

"Protection" from De-Compilers.

 First let me start by saying that NO PROGRAM CAN BE PROTECTED FROM A GOOD DE-COMPILER!!!! This is not to say that DoDi's De-Compiler is not good, but he has written it with the intent to be able to prevent it from working. As long as the Program Tokens are in the .EXE (and they must be for the program to function) those tokens can be de-compiled back to the original source code. So when I talk about Un-Protecting a file, what we are really talking about is making it acceptable to DoDi's de-compiler.

 Programs are protected by removing the sections which are not needed for the program to run, but ARE needed for the de-compiler. These sections are the .FRM names in the Forms definitions section and the Control Names resource(s). Get MAKE_MAK.EXE from the net if you don't already have it. It is a VB Protector. Make a copy of our CALC.EXE with the name CALC.OLD and using MAKE_MAK, 'Protect' CALC.EXE.

When you start to HEX examine the file, look at the following things:

DoDi's de-compiler will now refuse to work on our file. It is detecting the changes and refusing to run (CHEAT!). But all we have to do is fix these sections and it will work, right?? ABSOLUTELY!!! Since we know what was originally contained in these sections, we can rebuild them exactly. If we were dealing with someone else's protected file, we could only guess at what their forms and controls were named but IT WILL DE-COMPILE once it is fixed.

Un-Protecting a file:

* If the program has been protected with DoDi's VBOPT, there will be one additional step needed in order to un-protect it. I won't tell you this step out of respect for the writer of the only VB de-compiler I know of, but it isn't hard to figure out. I have faith in all of you!!

 
Final Notes

 If there is something that I have not made clear and you still cannot figure it out after spending real time on it yourself, or if you know of parts of my essay that are just wrong, please e-mail me at vbman@nassau.cv.net and I will do my best to help.
Please Don't email duke@nassau.cv.net, it's not me. Someone got the address before I did :(
Ob Duh

Ob duh does not apply here... on the contrary: visual basic buffs should pay Duke for this kind of information...  