Learning to write programs in C/C++

 


1. How does the Stringizing operator work in C/C++?

The C/C++ preprocessor provides the stringizing operator '#', which converts a macro argument to a string literal. If '#' precedes a parameter name in the replacement text (the right hand side) of a function-like #define directive, the corresponding argument is enclosed in quotes. For example:


  #define VERSION 100
  #define PRODUCT "Acme"
  #define _STR(i) #i
  #define STR(i)  _STR(i)

  i = VERSION;                // i=100
  strcpy(s1, _STR(VERSION));  // strcpy(s1, "VERSION");
  strcpy(s1, STR(VERSION));   // strcpy(s1, "100");
  strcpy(s1, _STR(PRODUCT));  // strcpy(s1, "PRODUCT");
  strcpy(s1, STR(PRODUCT));   // strcpy(s1, "\"Acme\"");

The _STR macro encloses its argument in quotes exactly as written; for example, _STR(VERSION) expands to "VERSION". This happens because an argument that is the operand of '#' is not macro-expanded before it is stringized. The STR macro (no underscore) is replaced by _STR with an underscore, but because its parameter is not an operand of '#', an argument that is itself a macro is expanded first. For example, STR(VERSION) expands to _STR(100), which is further expanded to "100".

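Therefore in the code above, _STR converts the text of its argument to a string, and STR converts the value of a macro argument to a string. Both are useful in different cases. Note that identifiers beginning with an underscore followed by an upper case letter, such as _STR, are reserved for the implementation in C and C++, so production code should pick a different name for the helper macro.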

2. How do I write DLLs in C/C++ and call them from different languages?

Using Visual C/C++ to write a DLL

32 bit DLLs produced with Visual C/C++ should declare their functions with the modifiers "_declspec(dllexport)" and "_stdcall". Both "_declspec" and "_stdcall" seem to work with one or two leading underscores; the two-underscore forms "__declspec" and "__stdcall" are the ones Microsoft documents. The "_declspec(dllexport)" modifier causes the function to be placed in the DLL's exported names table without the use of a DEF file, although we return to this topic later. The "_stdcall" modifier selects the standard calling convention, in which parameters are passed from right to left and the called function cleans up the stack. This is the convention used by the Windows API and Visual Basic. DLLs are easier to write in 32 bit Windows, as you do not have to worry about the memory model or about special compiler options to set up the registers correctly. To export a function func from a DLL called "mydll.dll":


  #define EXPORT _declspec(dllexport)
  // CALLBACK is defined in the Windows header files
  // it is defined as __stdcall

  return_type EXPORT CALLBACK func(parameters)
  { ... }

To check the exported functions, type:

  dumpbin /exports mydll.dll

To call this function from a Pascal program it must be prototyped as follows:

  function func(parameters):return_type; stdcall;
  implementation
  function func; stdcall; external 'Mydll';

To explain these: the C code declares and exports func from source code that is compiled into a DLL. By using the EXPORT and CALLBACK macros we can have the same code compiling for 16 and 32 bits. MS KB article Q123870, "Portable DLL Example Using _declspec() and _export", claims that "_declspec(dllexport)" must go before the function return type and will not work if positioned as in the code above; however, under Visual C/C++ this does not appear to be a problem. The Microsoft article was written for Visual C/C++ v1 and v2, so possibly Microsoft improved the compiler afterwards. The CALLBACK macro chooses the proper calling convention for 16 and 32 bits; it basically means the parameters are passed in the same way they would be passed to an API function. WINAPI is equivalent to CALLBACK in both 16 and 32 bit Windows.

After compiling the C DLL, use "dumpbin.exe" (supplied with Visual C/C++) to produce a list of all exported functions and check that the proper functions are exported. Client code will not be able to call routines from the DLL that are not in the exported list. Export names are case sensitive, so the exactly matching name must be used when calling into the DLL from Pascal or Visual Basic. Note that without a DEF file a function Fred will be exported as "_Fred@N", where N is the number of bytes passed on the stack. This is known as "stdcall name mangling" and is discussed later.

To call the function from Pascal, the function is prototyped, then in the unit implementation it is declared as being external in the named DLL. The function name used in the external declaration must be a legal Pascal identifier, and it must be an exact case sensitive match for the function as exported from the DLL. This is what makes dumpbin so useful, as it is a way of checking the exported name. Pascal must use the stdcall parameter passing convention; otherwise it defaults to optimized parameter passing, which puts parameters in registers and is incompatible with the DLL. The types of the parameters used in the Pascal prototype must also exactly match the actual parameters used by the function in the DLL, otherwise the call and the function will interpret the stack differently.

Stdcall name mangling

As discussed, Visual C/C++ uses stdcall name mangling, which causes a function "Fred" to be exported as "_Fred@N" where N is the number of bytes of stack required. C++ uses more complex name mangling which is not discussed here. The stdcall name mangling is done to bring an element of type safety into C programs consisting of separate object modules linked together, as the linker will detect most cases in which a prototype or call uses the wrong number of parameters.

Unfortunately stdcall name mangling is unacceptable for a generic DLL that is to be called from Pascal or Visual Basic. It is not used by the Windows API DLLs, as you can easily tell by running dumpbin on GDI32.DLL, and it makes the DLL difficult to call from Pascal because a Pascal identifier cannot include the "@" sign. Thus we must undo the mangling by naming the exports in the DEF file. The DEF file uses an alias scheme in which we give the name by which the function is to be exported from the DLL, followed by "=" and the mangled internal name. However, the linker helps us by not requiring the internal name if it can match the exported name to the function without ambiguity. This means all we have to do is list the un-mangled function names in the EXPORTS section, using the same case as the functions were declared with. Thus "mydll.dll" above would have the following DEF file:

  LIBRARY "mydll"
  EXPORTS
  func

When the DEF file is processed by the linker, it realizes we want to export the function using the name given (func), and it matches the export to the occurrence of func in the code. After undoing the mangling use dumpbin to read the names exactly as they are exported. Assuming the function is exported properly, it can be called from Pascal code as long as the exported Pascal identifier in the DLL interface unit exactly matches the name. Although Pascal identifiers are not case sensitive, this is a situation where the Pascal compiler uses the external function identifier exactly as it is written for load time linking to the DLL.

In theory both Pascal and Visual Basic can get round the mangling problem by using aliasing, which is a system in which the function name is followed by the real name as exported by the DLL. In Pascal use the "name" keyword or in Visual Basic use the "alias" keyword, followed by the mangled name exported from the DLL. But this is awkward since you have to know or work out the mangled names to alias them properly. Also it is not the standard way of writing DLLs, which should export functions without mangling the names. The Windows API DLLs export functions without mangling.

DLL callbacks and why WINAPI is the same as CALLBACK

It is possible to allow the DLL to make calls back into the client code, which is useful if you are writing a DLL to implement a callback function, or designing an event driven interface. The concept is the client calls a function in a DLL, passes the DLL a pointer to a function that resides in the client, and the DLL function will make calls back into the client using the pointer.

In essence this is the same situation as when a client calls into a DLL. Whether a call is made into a DLL, or a call is made from a DLL back into an EXE file, the called function should use the EXPORT CALLBACK (or EXPORT WINAPI) modifiers to ensure the function is in the exported names table and the parameters are passed correctly. For a 32 bit callback that is reached only through a function pointer, the calling convention is the part that matters, since the pointer is not looked up by name. This is why WINAPI and CALLBACK are equivalent: there is no real difference between calling an API function which is in a DLL, and having the Windows DLL call back into the application's EXE file or an application DLL. Thus we call CreateWindow, which has the modifier WINAPI, and Windows may call one of our window functions or an enumeration callback, which have the modifier CALLBACK.

Thus if we write a function func in a DLL that accepts a pointer to a function as one of its parameters, the DLL function will be able to make calls back into the calling EXE file. We are using callbacks in the same way that Windows does. When func has made its callback, the call tree will look something like this:

  callback_func (in the EXE file)
  func (in the DLL)
  main (in EXE file)

Function main starts by calling func which is in the DLL and passing a pointer to callback_func. In the DLL func uses the function pointer to call back into the EXE file. Then we are back in callback_func.

Other DLL issues

DLLs must support multitasking because they may be used by several clients simultaneously. This means all data should be passed as parameters, held in local (stack based) variables, or attached to window (HWND) structures. Because DLL code runs on the caller's stack, the local variables will be correct for each of the DLL's clients. There is a Build - Settings - Libraries option which should be used to choose the multi threaded RTL.

DLL code runs on its caller's stack, so it should conserve stack space. A stack overflow in the client will produce an error message from the client RTL, but a stack overflow in the DLL will produce an error message from the C RTL.

DLLs should not use any static or global data as they should be fully re-entrant allowing code to be simultaneously executed by different applications. However there is no objection to the DLL storing constant data in its data segment. In effect this happens anyway as string literals are stored in a read-only data segment common to all instances of the DLL.

Any windows and window classes created in the DLL belong to the hInstance that created the window or class. It is best if the client passes its hInstance and the DLL uses this to create the window / class, so that these resources are cleared up properly when the client terminates.

The DLL should not malloc/free memory because it will be managed by the DLL's own version of the C RTL. There would be a danger of memory being allocated in the client and freed in the DLL (or vice versa) which would cause problems because the two would use different memory block lists. If the DLL must allocate memory use GlobalAllocPtr() and GlobalFreePtr(), and balance the calls within the one DLL function. Memory allocated in a DLL with GlobalAllocPtr() belongs to the client app, and will thus be automatically freed when the client terminates.

Memory leaks in the DLL are potentially more serious because the DLL will only be unloaded, and its blocks returned, when all clients terminate. If the DLL appears to hold on to memory, look for an obscure MS tech note that discusses the _heapmin function. It may be necessary to pack the heap to return the blocks.

Again the client and DLL have different run time libraries so there is no way file handles or any C run time low-level structures can be shared between the DLL and the caller. If the DLL has to operate on files, it should do so within a single function call and balance the calls to fopen/fclose.

Auto loading DLLs under Windows

Under Windows NT / 2000 / XP it is possible to specify DLLs that are loaded into the context of all processes when they start. This is useful for custom controls as they can initialize themselves in the DLL startup code. These DLLs are stored in the registry entry:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\Windows\APPINIT_DLLS

Refer to "Window Classes in Win32", Kyle Marsh, MSDN for more details.

Using Visual C/C++ v1.5 to write a 16 bit DLL

The information in this section is kept for historical interest only. 32 bit Windows has been available since 1995, and 16 bit DLLs are obsolete.

In 16 bit Windows using Visual C/C++ v1.5, give functions the modifiers "_export" and "_far _pascal". The "_export" modifier causes the function to be placed in the DLL's exported names table and ensures the registers are set up correctly when the function is called, by loading the DS register from a constant value set up in the DLL at load time (DS cannot be loaded from SS as DS != SS for a DLL). The "_far _pascal" modifiers select the normal DLL parameter passing convention: "_far" specifies that the DLL is in a separate segment, and "_pascal" specifies that parameters are passed left to right and that the called function cleans up the stack. These conventions are the ones used when calling DLLs from Delphi Pascal or Visual Basic. They are also used by the Windows API, whose modules are really just large DLLs with a defined interface.

Strictly speaking you can write DLLs without either "_export" or "_far _pascal", by naming the exported functions in a DEF file and relying on the default "_cdecl" parameter passing convention. Such a DLL could be called up from other C programs, or even from Pascal by using inline assembler and cleaning up the stack properly. But this is more complex, and for the purposes of this discussion we want to write generic DLLs that can be called from Pascal or Basic with as little work as possible. Thus in the rest of this discussion we assume the "_export" and "_far _pascal" modifiers are used. To export a function "func" from a DLL called "mydll.dll":


  #define EXPORT _export
  // CALLBACK is defined in the Windows header files
  // It is defined as _far _pascal

  return_type EXPORT CALLBACK func(parameters)
  { ... }

To check the exported functions, type:

  exehdr /verbose mydll.dll

To call this function from a 16 bit Delphi Pascal program it must be prototyped as follows:

  function func(parameters):return_type;
  implementation
  function func; external 'Mydll';

To explain these, the C code declares and exports func from source code that is compiled into a DLL. When compiling the DLL the large memory model (/ALw) should be used to avoid problems with near pointers that are a nasty hangover from Windows v3 real mode. Specify SS != DS as SS is the caller's DGROUP and DS is the DLL's DGROUP, and specify DS not loaded on entry, so the DLL can access its own DGROUP. Fixups are automatically set when the DLL is loaded so it can access its own data segment.

After compiling the C DLL use "exehdr.exe" supplied with Visual C/C++ v1.5 to produce a list of all exported functions to make sure the proper functions are exported. Client code will not be able to call routines from the DLL that are not in the exported list. The names in the list are not case sensitive and exehdr produces them all in upper case, thus an exported function MYFUNC can be successfully called from a Pascal function MyFunc. The same should also apply to 16 bit Visual Basic.

The 16 bit Delphi Pascal code will be placed in a unit that interfaces to the DLL. Normally a separate unit is produced for each DLL, although this is not essential. First the unit declares a prototype for the function including all the parameters; the types of these must match the types expected by the function in the DLL. For example, if a Delphi program specifies an integer followed by a string pointer, when Delphi calls the DLL function it will push a 2-byte integer then a 4-byte pointer onto the stack. In the DLL the function must be written to expect the same parameters. It will then interpret the stack frame as containing the same types, and it will pop the same values off the stack when it returns. If the headers are mismatched, the DLL function will receive and return bad data and the stack will probably be misaligned on return. The result is that the program will crash at a point sufficiently distant from the bad call to be confusing. To avoid this it is important to check that the Pascal prototypes match the C prototypes.

16 bit code requires the LARGE memory model which supports multiple code segments and a single private DLL data segment. This model ensures we use versions of the C run time functions that cope with proper segmented (seg:ofs) 4 byte pointers, everything works properly and we never have to use the "near" or "far" keywords. It should be possible to grep the entire source to check "near" and "far" are not used anywhere. Any attempt to use other memory models or a mixed model is unnecessary and is a short cut to insanity. If enumeration callbacks such as EnumTaskWindows() and similar routines are used in a DLL they must also be declared EXPORT CALLBACK. The EXPORT keyword ensures the DS register is set correctly for 16 bits. Without EXPORT DS will be loaded from SS, which will cause the callback to use the wrong data segment when it is used from a DLL.

There is inconsistency between the same source compiled into a 16 bit and a 32 bit DLL, in that a 16 bit DLL shares its data between callers and a 32 bit DLL defaults to private data per caller. This would cause problems if a DLL used global variables as a 16 bit client would change the value for other 16 bit clients, but a 32 bit client could not set a global value that would be accessible to other 32 bit clients. In most cases this should not matter as DLLs should not use static or global variables - all data they use should be allocated and attached to windows, or passed as parameters.

If all functions are exported and prototyped correctly the Pascal code can call the C DLL as if it were an extension of its own code. This is the same way in which the Pascal RTL calls the Windows API functions - Borland provide a large import unit "windows.pas" which has external declarations for the entire API.

When using DLL callbacks (i.e. if the DLL calls back into the client EXE file), the usual compiler options have to be set. The DLL function func must have the EXPORT CALLBACK modifiers so the registers are set up correctly. In the DLL, SS is the client's SS and DS is the DLL's own DGROUP. Thus SS != DS, and "DS not loaded from SS on entry" applies for the DLL. Back in the EXE file, callback_func also needs to load DS on entry: when the callback is made, DS will be set to the DLL's DGROUP and will not equal SS. To fix this the callback function needs to load its DS from SS on entry. Otherwise unpleasantness will arise, such as Windows calling a winproc in the client and using the DLL's DGROUP, or calling the DLL's winprocs with the client's DGROUP. This causes strange effects, as constants are usually stored in DGROUP, so if DS is wrongly set all the program's constants (usually string literals) will suddenly change their values.

When starting experiments with callbacks, it was useful to create a small function (CheckDS) that ensures the DS register has been correctly restored. CheckDS can be called at the start of DLL functions and the callback functions in the EXE file. A short string literal is declared which will automatically be put in DGROUP, then CheckDS checks the constant has the same value by getting and testing each character from the string literal. Then if the DS register has been changed the reference to the constant will pick out random data from someone else's DGROUP, the comparison test will fail and CheckDS can display a message box and halt.