Sunday, 4 January 2015

Win32 Hacks: Loading API functions from a process' Process Environment Block (PEB) Part 2 of 2

Win32 Hacks: Loading API functions from a process' Process Environment Block (PEB)

Preliminary

Last time in part 1 we figured out how to find the base address of persistent DLLs via the PEB. By reading the DLL image at the base address, we can read its export directory table and get the address to its functions. That's what we're going to do now.

PE memory layout

At the DLL base address, the DLL image is almost laid out the same way as in the DLL file on disk. The only difference is that sections are aligned and padded to larger sizes than in the file, and not all sections get mapped to memory at runtime. The latest PE specification can be found here. At runtime, the layout looks like this:

  • MSDOS header (starts with the magic value "MZ")
  • MSDOS stub
  • PE header
  • Optional header
  • Data directory table
  • Section headers
  • Section 1
  • Section 2
  • ...
  • Section N
Winnt.h has structures for all of these, and they are called IMAGE_DOS_HEADER and IMAGE_NT_HEADERS32. (IMAGE_NT_HEADERS64 for PE32+ executables)
IMAGE_NT_HEADERS32 contains both the PE file header and the optional header.

Our goal is to get the address to a function inside the library given the function's name, and that information is found in the export directory table. The address to the export directory table is found in the data directory table.

Validating the headers

We need to validate the headers in case some of the required fields don't exist, or the image is corrupt for some reason.

void* slGetModuleProcAddress(void* moduleBase, LPCSTR procName){
 //The MSDOS header starts directly at the first byte of the base address
 PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)moduleBase;
 //MSDOS images start with Magic value 'M' 'Z'
 if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
  return NULL;
 //The field e_lfanew in the MSDOS header contains the
 //PE header offset
 PIMAGE_NT_HEADERS32 headers32 =
  (PIMAGE_NT_HEADERS32)((char*)moduleBase + dosHeader->e_lfanew);
 if (headers32->Signature != IMAGE_NT_SIGNATURE)
  return NULL;
 //A normal PE32 file has an optional header which is 96
 //bytes long
 //NumberOfRvaAndSizes holds the number of entries in the
 //data directory table. We need at least one.
 if (headers32->FileHeader.SizeOfOptionalHeader < 96 ||
  headers32->OptionalHeader.NumberOfRvaAndSizes == 0)
  return NULL;
 
 DWORD EdtOffset =
 headers32->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;

 if (!EdtOffset)
  return NULL;

 /* More code follows .. */

Pretty straight forward. The addresses and sizes of different tables are stored in the data directory table at specific positions. The size and address of the export directory table is located at index 0 of the data directory table. This is why NumberOfRvaAndSizes must be at least one. We checked all the magic values in the different headers, and we found the relative virtual address to the export directory table, which is EdtOffset. So what does the EDT look like?

Looking up a symbol from the Export Directory Table


typedef struct _EXPORT_DIRECTORY_TABLE {
 DWORD ExportFlags;
 DWORD TimeStamp;
 WORD MajorVersion;
 WORD MinorVersion;
 DWORD NameRVA;
 DWORD OrdinalBase;
 DWORD ExportAddressTableSize;
 DWORD NamePointerTableSize;
 DWORD ExportAddressTableRVA;
 DWORD NamePointerTableRVA;
 DWORD OrdinalTableRVA;
} EXPORT_DIRECTORY_TABLE, *PEXPORT_DIRECTORY_TABLE;

The interesting fields here are NameRVA, ExportAddressTableRVA, OrdinalTableRVA and OrdinalBase. The RVAs are relative addresses to the address-, ordinal- and string tables respectively, while OrdinalBase is an offset to add to ordinals.
NameRVA is an array of null-terminated ISO-8859-1 strings, and it is lexicographically sorted. Finding a symbol in the list is therefore O(log n) and pretty fast. If the symbol you searched for is at index 'i' in the name table, the symbol's ordinal is found at index 'i' in the ordinal table via OrdinalTableRVA. You use the ordinal as an index into the export address table, and that gives you the function address.

But all of these fields are relative virtual addresses. How do we turn them into absolute virtual addresses? Well, we did it before when we set up the MSDOS and PE headers. They are relative to the DLL image base, so just add the image base address to them.

 //slGetModuleProcAddress() from previous code snippet
 //continues here.

 //moduleBase + offset gives us the real VA to the EDT
 PEXPORT_DIRECTORY_TABLE EdtPtr =
 (PEXPORT_DIRECTORY_TABLE)((char*)moduleBase + EdtOffset);
 //Again, add base address to all the RVAs
 PVOID OrdinalTable = (PBYTE)moduleBase + EdtPtr->OrdinalTableRVA;
 PVOID NamePointerTable = (PBYTE)moduleBase + EdtPtr->NamePointerTableRVA;
 PVOID ExportAddressTable = (PBYTE)moduleBase + EdtPtr->ExportAddressTableRVA;
 //We're lazy and do a linear search for the function name
 //Real code should probably use binary search
 for (DWORD i = 0; i < EdtPtr->NamePointerTableSize; i++){  
  DWORD NameRVA = ((PDWORD)NamePointerTable)[i];
  const char* NameAddr = (char*)moduleBase + NameRVA;
  //If string comparison fails, skip to next iteration
  //Note that these strings are ISO-8859-1, not UTF-8
  //or UTF-16/USC2!
  if (slStrCompare(NameAddr, procName))
   continue;

  //The weird stuff with OrdinalBase is explained later
  WORD Ordinal = ((PWORD)OrdinalTable)[i] + (WORD)EdtPtr->OrdinalBase;
  WORD RealOrdinal = Ordinal - (WORD)EdtPtr->OrdinalBase;  
  DWORD ExportAddress = ((PDWORD)ExportAddressTable)[RealOrdinal];  
  void* FinalAddr = (char*)moduleBase + ExportAddress;
  return FinalAddr;
 }
 return NULL;
}

In the code above, we start with computing the actual virtual addresses to all the tables. Then for simplicity's sake we do a linear search as opposed to binary search. When we find a string match in the string table, we use the current index to look up the ordinal, and the ordinal is used as lookup in the export address table. Finally, we add the image base to the export address to make a real virtual address.

But one thing remains to explain. What's up with the OrdinalBase variable? As it's used in the code, it looks completely superfluous.
According to Microsoft's COFF/PE specification, one is supposed to subtract OrdinalBase from the ordinal value prior lookup into the export address table. But in reality, the entries in the ordinal array is already offset with OrdinalBase. So if you want to compare ordinal values with other tools, you have to add OrdinalBase to the ordinal. In the code above Ordinal is the ordinal value you would compare with other programs, while RealOrdinal is the ordinal you use for lookup.

The final code combined with a simple example looks like this (structures omitted):

void* slGetModuleBase(LPCWSTR moduleName){
 void* pebPtr = slGetPEB(); 
 PPEB32 peb = (PPEB32)pebPtr;
 PLIST_ENTRY moduleListTail = &peb->Ldr->InMemoryOrderModuleList;
 PLIST_ENTRY moduleList = moduleListTail->Flink;
 do {
  char* modulePtrWithOffset = (char*)moduleList;
  PLDR_DATA_TABLE_ENTRY module =
   (PLDR_DATA_TABLE_ENTRY)modulePtrWithOffset;
  if (!slWStrCompare(module->FullDllName.Buffer, moduleName)){
   void* DllBase = module->Reserved2[0];
   return DllBase;
  }
  moduleList = moduleList->Flink;
 } while (moduleList != moduleListTail);
 return NULL;
}

void* slGetModuleProcAddress(void* moduleBase, LPCSTR procName){
 PIMAGE_DOS_HEADER dosHeader = (PIMAGE_DOS_HEADER)moduleBase;
 if (dosHeader->e_magic != IMAGE_DOS_SIGNATURE)
  return NULL; 
 PIMAGE_NT_HEADERS32 headers32 =
  (PIMAGE_NT_HEADERS32)((char*)moduleBase + dosHeader->e_lfanew);
 if (headers32->Signature != IMAGE_NT_SIGNATURE)
  return NULL; 
 if (headers32->FileHeader.SizeOfOptionalHeader < 96 ||
  headers32->OptionalHeader.NumberOfRvaAndSizes == 0)
  return NULL;
 
 DWORD EdtOffset =
 headers32->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
 if (!EdtOffset)
  return NULL;

 PEXPORT_DIRECTORY_TABLE EdtPtr =
  (PEXPORT_DIRECTORY_TABLE)((char*)moduleBase + EdtOffset);
 PVOID OrdinalTable = (PBYTE)moduleBase + EdtPtr->OrdinalTableRVA;
 PVOID NamePointerTable = (PBYTE)moduleBase + EdtPtr->NamePointerTableRVA;
 PVOID ExportAddressTable = (PBYTE)moduleBase + EdtPtr->ExportAddressTableRVA;
 
 for (DWORD i = 0; i < EdtPtr->NamePointerTableSize; i++){  
  DWORD NameRVA = ((PDWORD)NamePointerTable)[i];
  const char* NameAddr = (char*)moduleBase + NameRVA;

  if (slStrCompare(NameAddr, procName))
   continue;

  WORD Ordinal = ((PWORD)OrdinalTable)[i] + (WORD)EdtPtr->OrdinalBase;
  WORD RealOrdinal = Ordinal - (WORD)EdtPtr->OrdinalBase;
  DWORD ExportAddress = 0;  
  ExportAddress = ((PDWORD)ExportAddressTable)[RealOrdinal];  
  void* FinalAddr = (char*)moduleBase + ExportAddress;
  return FinalAddr;
 }
 return NULL;
}

//kernel32
typedef HMODULE (WINAPI *LOADLIBRARYWPROC)(LPCWSTR);
typedef BOOL (WINAPI *FREELIBRARYPROC)(HMODULE);
typedef PVOID (WINAPI *GETPROCADDRESSSPROC)(HMODULE, LPCSTR);
typedef VOID (WINAPI *EXITPROCESSPROC)(UINT);
//user32
typedef int (WINAPI *MESSAGEBOXWPROC)(HWND, LPCWSTR, LPCWSTR, UINT);


void main(void){
 //kernel32
 EXITPROCESSPROC slExitProcess = NULL;
 LOADLIBRARYWPROC slLoadLibraryW = NULL;
 FREELIBRARYPROC slFreeLibrary = NULL;
 GETPROCADDRESSSPROC slGetProcAddress = NULL;
 //user32
 MESSAGEBOXWPROC slMessageBoxW = NULL;
 HMODULE m;

 void* kernel32 = slGetModuleBase(L"KERNEL32.DLL");

 slExitProcess = slGetModuleProcAddress(kernel32, "ExitProcess");
 slLoadLibraryW = slGetModuleProcAddress(kernel32, "LoadLibraryW");
 slFreeLibrary = slGetModuleProcAddress(kernel32, "FreeLibrary");
 slGetProcAddress = slGetModuleProcAddress(kernel32, "GetProcAddress");

 m = slLoadLibraryW(L"USER32.DLL");
 slMessageBoxW = slGetProcAddress(m, "MessageBoxW");

 slMessageBoxW(0, L"SelfLoader Example", L"Test", MB_OK);
 slFreeLibrary(m);

 slExitProcess(0);
}


Hopefully it's clear now that we use this trick to only get a few functions from kernel32.dll, and let GetProcAddress() do the rest. Our program has no external dependencies at link time.

Win32 Hacks: Loading API functions from a process' Process Environment Block (PEB) Part 1 of 2

Win32 Hacks: Loading API functions from a process' Process Environment Block (PEB)

Preliminary

Part 2 is here.

Edit: The original post became a bit too long and elaborate, so I split it into two posts. This first part shows how to get the base address of a persistently loaded library, and part 2 shows how to parse a module given its base addres and access its imports (get function pointers).
Occasionally one might have good reasons to make a program without any imports. That is, you want to make an executable or have pure code without any dependencies at link time (freestanding code) Or maybe one has code which is injected into another process (debuggers, game trainers, diagnostic tools, etc), in which case one needs a way to find the function pointers to the API functions one wants to use.

This can be done by reading the Process Environment Block, and it can be found via a pointer entry inside the Thread Information Block. Once you have the PEB, one can find module information for kernel32.dll, kernelbase.dll, and ntdll.dll in a linked list there, as these libraries are shared between all processes. They are always loaded into userspace memory when a new process is created.
There's plenty of information in the module list, but the only thing you'll need to load a library's functions is the image base address. When you have the base address, the only thing left to do is to parse the library image and its export address table.

A word of warning here. The TIB and PEB structures are highly susceptible for change between Windows versions. Memory layouts explained later might be completely different between my machine and your machine. If someone have definite information on the TIB and PEB layouts for all Windows versions, I'd be grateful for information on that. I would also appreciate more information on the layouts for 64-bit processes.
If you already are familiar with the TIB and PEB, skip the two following headings.

What is the Thread Information Block?

The TIB is an internal and undocumented/unofficial Windows data structure that contains information about a process' currently running thread, and which lives in user space memory.  For a general layout, Wikipedia has a nice overview. In 32-bit executables, the selector fs with some offset is used to access the TIB, while in 64-bit executables one uses the gs selector. (From now on, I assume 32-bit x86 executables unless specifically said otherwise.) If one wants to do some more heavy work, it might be more convenient with a direct pointer to the TIB. A direct pointer can be found by reading fs:[0x18]. Other information you can access via the TIB is the current thread ID, the current locale, the last error reported by GetLastError(), and so on. Remember that there is one TIB per thread. We're only interested in one field though, and that is the pointer to the Process Environment Block, and the PEB pointer in the TIBs is the same for all TIBs, as there is only one PEB per process. You can get that pointer by reading fs:[0x30].


;NASM x86 functions for getting the TIB
GLOBAL _GetTIB
SECTION .text;

_GetTIB: mov eax, [fs:0x18]
ret

What is the Process Environment Block?

The PEB is an internal and undocumented (but more documented than TIB) data structure that contains information about the process itself, and which lives in user space memory. Notably it contains information about modules the process shares with other processes. Again, Wikipedia to the rescue. To get the address to the PEB, access the PEB pointer field of any TIB. Its offset from the TIB start is 48 bytes. So fs:[0x30] gets you the pointer to the PEB.

;NASM x86 functions for getting the PEB
GLOBAL _GetPEB
SECTION .text;

_GetPEB: mov eax, [fs:0x30]
ret

Once we have the PEB, the Ldr field is the interesting one. It is itself a structure and contains a InMemoryOrderModuleList member. InMemoryOrderModuleList is of type LIST_ENTRY from winnt.h, line 1049. The list is an intrusive list, i.e the forward and back pointers are a part of the objects themselves. The larger structure is of type LDR_DATA_TABLE_ENTRY. MSDN has more information.

/* Direct member of the PEB */
typedef struct _PEB_LDR_DATA {
 BYTE       Reserved1[8];
 PVOID      Reserved2[3];
 LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;

/* Already defined in wint.h (1049) */
/*
typedef struct _LIST_ENTRY {
struct _LIST_ENTRY *Flink;
struct _LIST_ENTRY *Blink;
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;
*/

/* Not used by us, but you need it to have a complete PEB structure */
typedef struct _RTL_USER_PROCESS_PARAMETERS {
 BYTE           Reserved1[16];
 PVOID          Reserved2[10];
 UNICODE_STRING ImagePathName;
 UNICODE_STRING CommandLine;
} RTL_USER_PROCESS_PARAMETERS, *PRTL_USER_PROCESS_PARAMETERS;

/* The actual module structure which is a part of the intrusive list */
typedef struct _LDR_DATA_TABLE_ENTRY {
 PVOID Reserved1[2];
 LIST_ENTRY InMemoryOrderLinks;
 PVOID Reserved2[2];
 PVOID DllBase;
 PVOID EntryPoint;
 PVOID Reserved3;
 UNICODE_STRING FullDllName;
 BYTE Reserved4[8];
 PVOID Reserved5[3];
 union {
  ULONG CheckSum;
  PVOID Reserved6;
 };
 ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

typedef void(__stdcall * PPS_POST_PROCESS_INIT_ROUTINE)(void);

typedef struct _PEB32 {
 BYTE                          Reserved1[2];
 BYTE                          BeingDebugged;
 BYTE                          Reserved2[1];
 PVOID                         Reserved3[2];
 PPEB_LDR_DATA                 Ldr;
 PRTL_USER_PROCESS_PARAMETERS  ProcessParameters;
 BYTE                          Reserved4[104];
 PVOID                         Reserved5[52];
 PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
 BYTE                          Reserved6[128];
 PVOID                         Reserved7[1];
 ULONG                         SessionId;
} PEB32, *PPEB32;
Because the list is intrusive, you have to iterate over the LIST_ENTRY list and cast the pointer to PLDR_DATA_TABLE_ENTRY each time. Also, the list is a circular list, so make sure you compare the current pointer against a reference point.
LDR_DATA_TABLE_ENTRY is the structure which contains information about a loaded module, for example kernel32.dll. We can now finally write a function which gives us the base address of a library.

//Look up base address for an module (for example "kernel32.dll")
void* slGetModuleBase(LPCWSTR moduleName){
 //Get Pointer to PEB structure
 void* pebPtr = slGetPEB(); 
 PPEB32 peb = (PPEB32)pebPtr;
 //Reference point / tail to compare against, since the list is circular
 PLIST_ENTRY moduleListTail = &peb->Ldr->InMemoryOrderModuleList;
 PLIST_ENTRY moduleList = moduleListTail->Flink;
 //Traverse the list until moduleList gets back to moduleListTail
 do {
  char* modulePtrWithOffset = (char*)moduleList;
  //List is intrusive, a part of a larger LDR_DATA_TABLE structure,
  //so cast the pointer
  PLDR_DATA_TABLE_ENTRY module = (PLDR_DATA_TABLE_ENTRY)modulePtrWithOffset;
  //Compare the name of the entry against our parameter name
  //Note that the name is a wide string
  if (!slWStrCompare(module->FullDllName.Buffer, moduleName)){
   //The actual position of the image base address inside
   //the LDR_DATA_TABLE_ENTRY seems to change *a lot*.
   //Apparently on Windows 8.1 it wasn't located in the
   //correct place according to my structures defined above.
   //It should have been "DllBase", but apparently it
   //was 8 bytes back, inside Reserved2[0]
   void* DllBase = module->Reserved2[0];
   return DllBase;
  }
  moduleList = moduleList->Flink;
 } while (moduleList != moduleListTail);
 return NULL;
}

//Now we can get the base address like so:
int main(void){
 void* kernel32Base = slGetModuleBase(L"KERNEL32.DLL");
 /* ... */
 return 0;
}
This post became longer than I expected, which is why I split it up. In part 2 I'll show you how to get the address of a function from the DLLs export table, by parsing the image DOS and PE headers from the beginning of the base address. Part 2 is here.

Wednesday, 13 November 2013

How to add docstrings for the C procedures when embedding Guile with libguile

I'm currently embedding scheme Guile into a game/demo engine. I've written an article on this blog before on how to embed Guile and add new subroutines from C/C++.
While doing that, what baffled me was that the docstrings I added when using Guile's snarfing didn't show up in Geiser's C-c C-d C-d "document symbol under cusor" mode. I asked around on #guile on Freenode, and they told me it was a complex beast involving texinfo and other black magic. So I set out to investigate (I need documentation when coding in Scheme damnit!). But first, what the hell is 'snarfing' ?

Snarfing


Snarfing is an extra preprocessor step guile has, to make it easier to write Scheme subroutines from the C interface. Without snarfing, you have to first define your function, and then call scm_g_define_subr somewhere else. Naturally, it follows that these two steps can get out of synch. scm_g_define_subr defines the number of arguments the subroutine takes and whatnot. Snarfing uses the SCM_DEFINE macro defined in snarf.h. It's included by libguile.h, so that's still the only header you need:



SCM_DEFINE(Scheme_CreateTexture, "texture-create", 0, 0, 1,
		   (SCM rest),
		   "Create a texture without any contents.\n"
		   "Optional keywords/parameters:\n"
		   "#:target      ('texture-1d | 'texture-2d | 'texture-3d)\n" 
		   "#:num-channels 1|2|3|4\n"
		   "#:format      ('fixedpoint | 'float | 'depth)\n"
		   "#:width       width\n"
		   "#:height      height\n"
		   "#:depth       depth\n")
#define FUNC_NAME s_Scheme_CreateTexture
{

	SCM_VALIDATE_SYMBOL(SCM_ARGn, scm_target);
	SCM_VALIDATE_SYMBOL(SCM_ARGn, scm_format);
	SCM_VALIDATE_NUMBER(SCM_ARGn, scm_numchannels);
	SCM_VALIDATE_NUMBER(SCM_ARGn, scm_width);
	SCM_VALIDATE_NUMBER(SCM_ARGn, scm_height);
	SCM_VALIDATE_NUMBER(SCM_ARGn, scm_depth);

	/* more code */

}
#undef FUNC_NAME


The arguments are: the symbol of the C function itself, then the scheme symbol, the number of mandatory, optional and vararg arguments, then the parameter list, and finally the docstring. Basically the same as scm_g_define_subr. Then we define the macro FUNC_NAME. And where did s_Scheme_CreateTexture come from? SCM_DEFINE defines < functionname > for you as the scheme symbol string. We must define FUNC_NAME for docstring snarfing to work, as we will see later. In addition, defining FUNC_NAME makes the macros SCM_VALIDATE_xxx, SCM_ASSERT and SCM_ASSSERT_TYPE work like they should, so you can bail out on type mismatches and stuff. Very handy.
So this showed how to define the actual function. Where did the call to scm_g_define_subr go? It's not gone. It's rather generated for you:



void initSchemeTextureBindings()
{
	createSchemeTextureType();
	#ifndef SCM_MAGIC_SNARFER
	#include "scheme_bind_texture.x"
	#endif
}


The ".x" file is generated by a tool named guile-snarf. It should be installed when you install guile. So guile-snarf runs som preprocessor magic on your .c or .cpp file and spits out an .x file which contains all your define_g_subr calls and other info, based on the information you added in the SCM_DEFINE macro. Now let's see how we invoke guile-snarf: guile-snarf -o scheme_bind_texture.x scheme_bind_texture.cpp -I/extra/include/paths Guile-snarf runs gcc -E, so it needs to know about your include paths in order to not bail with an error. It at least needs to know about where it can find libguile.h, which has all the SCM_DEFINE macro magic. If you want it to ignore all the other headers, wrap them inside a SCM_MAGIC_SNARFER #ifndef. SCM_MAGIC_SNARFER is defined while guile-snarf runs. In the example above, scheme_bind_texture.x is not included while snarfing is done, because it doesn't exist yet. That's all what normal snarfing does. Now when main calls initSchemeTextureBindings(), it takes care of the setup for all the guile functions inside scheme_bind_texture.cpp.


Docstring-snarfing


Nothing new has to be done with the source. If you use SCM_DEFINE as above and #define and #undef FUNC_NAME above, the next steps should work to get you the docstrings. You will need the guile source. I base this article on stock guile 2.0.9. Let's assume the source code is in a directory called guile-2.0.9.

First, build guile and libguile as usual. Install it if you like, but you don't have to. Next, get the scripts and programs guile_filter_doc_snarfage, guile-func-name-check guile-snarf-docs from the directory 
guile-2.0.9/libguile. You also need the script guild from guile-2.0.9/meta. Then you follow these steps to first output a .doc file, which in turn is converted to a .texi texinfo file, and finally a texinfo file in the text format:

./guile-snarf-docs -o foo.doc foo.cpp -- -I/extra/include/paths
cat foo.doc | guild snarf-check-and-output-texi > foo.texi
makeinfo --force --plaintext -o foo-procedures.txt foo.texi


Guile-snarf-docs generates dot .doc files similar to how guile-snarf works; by running the preprocessor with gcc -E. The "--" separates snarf-doc parameters from the parameters sent to the preprocessor. Just like guile-snarf you need to pass on the extra include paths, if any. 

The second step guild snarf-check-and-output-texi is an undocumented argument guild takes. It converts dot .doc files into the texinfo format .texi and spits it out to stdout. Note that if you got several .doc files from several .c or .cpp files, you can cat them together here.

The third and final step just runs makeinfo on the .texi file and turns it into a final .txt file we can give to guile.


How to use the doc .txt file with guile

Guile has a module named ice-9 documentation. It works like this:

(use-modules (ice-9 documentation))
(set! documentation-files (cons "/path/to/docs.txt" documentation-files))

I haven't tested to see whether the string can take a relative path, but I think it does.

That's it. Now the documentation should pop up with Geiser when you press C-c C-d C-d :-)

Monday, 5 November 2012

Using GNU guile for fun and profit

Many developers want to add scripting to their games, especially for adding events to levels, controlling AI and controlling graphical effects. Me myself have always enjoyed making simple AI games where a "server" controls the gaming world, and AI programs connects to it and battle it out.
  Now, some people enjoy Lua for scripting, and others have used Python. I don't particularly like Lua's semantics, and I think Python is a bit clumsy to embed. But GNU Guile turned out to be perfect for my use. For those not in the know, GNU Guile is a Scheme implementation, and has been around for ages. It supports:
  • Almost everything from the Scheme R5RS standard
  • Almost all the SRFI recommendations
  • Multithreading and multiple environments/modules
  • Unicode
  • Extremely simple embedding

  The garbage collector in Guile is based on libgc which can be a bit confusing at first because of the stack magic, but code-wise it's extremely easy to use. Unlike Python's reference juggling, you hardly have to worry about such things when extending or embedding Guile. Let's look at a simple example which creates a function in C which can be called from Scheme, in addition to a REPL we can use to test code:

#include <libguile.h>
SCM our_scheme_func(void)
{
    printf("inside our scheme function!\n");
    return SCM_UNSPECIFIED;
}

void boostrap_main(int argc, char** argv)
{
    /* register our function with scheme */
    scm_c_define_gsubr("our-scheme-func", 0, 0, 0, our_scheme_func);
    scm_shell (argc, argv);
}

int main(int argc, char** argv)
{
    scm_boot_guile (argc, argv, boostrap_main, 0);
    return 0;
}

  Lets go through this in detail. scm_boot_guile() is one of many ways to initialize a new scheme environment, as well as initializing the garbage collector on the current thread. scm_boot_guile() never returns here because of the call to scm_shell which starts a new Guile interactive shell, also known as a read-eval-print-loop, or REPL for short. scm_c_define_gsubr() is the way to create new Scheme functions, which is probably the most useful aspect of this code. The second to fourth parameters are the number of required and optional arguments, as well as "rest" arguments. Rest arguments for scheme are kind of like varargs in C. The last parameter is simply a pointer to the function. So our function returns an unspecified value, and takes no parameters at all (the parameter counts are all 0). A function taking no parameters is known as a 'thunk' in Scheme. You call the function from Scheme like this:
(our-scheme-func)

If our function had more parameters, it would look like this:
(our-scheme-func 'foo '(1 2 3) 4.5)

  ..which would be a function taking three parameters, a symbol, a list and a decimal number. Scheme supports several datatypes, but the actual code (known as S-expressions) are always lists. So what is this SCM type that is the return type of our function? It's the generic C representation of all Scheme types. SCM::type is a value you can test against enums to get the underlying type, but libguile fortunately has some handy macros for testing each type. You can make your own Scheme types, but I will not explain that in this post.

  But running a Scheme interpreter on a single thread isn't very useful, especially if it blocks. Perhaps we even want to run multiple Scheme environments simultaneously. Wouldn't it be great to have a game where each AI or NPC is controlled by a Scheme script alone? That was my main intention behind using Guile, so I made it work with multiple threads. I did run into some obstacles which I'll explain first. Big thanks to Mark Weaver on #guile @ FreeNode for his patience! The first problem I had was that the documentation clearly states that libguile is compatible with posix threads. At least for guile version 2.0.5 which is available in Ubuntu 12.04 LTS, this is not the case. Creating new Scheme environments inside a thread made with pthread_create() crashes as soon as the garbage collector is initialized. The workaround is to first create a scheme environment in the main thread, and then create new threads with scm_spawn_thread(). The second bug I encountered while testing was that scm_c_eval_string() did lazy evaluation, which caused its Scheme symbol to not show up in the thread Scheme environments. Calling it once in the main thread did the trick. But if you want to load Scheme files, it is better to just call scm_c_primitive_load() instead. Anyways, enough beating around the bush; let's make some code:
#include <libguile.h>

typedef struct
{
  int val;
  const char* filename;
} threaddata_t;

threaddata_t threadData = {99, "file.ss"};

SCM test_func(void)
{
  printf("(test-func) : File %s, Value %d\n", threadData.filename, threadData.val);
  return SCM_UNSPECIFIED;
}

static SCM scheme_environment(void* userdata)
{
  scm_c_primitive_load(threadData.filename);
  return SCM_UNSPECIFIED;
}

void* bootstrap_main(void* v)
{
  SCM th;
  (void)v;
  scm_c_define_gsubr("test-func", 0, 0, 0, test_func);
  th = scm_spawn_thread(scheme_environment, NULL, NULL, NULL);
  scm_join_thread(th);
  return NULL;
}

int main(int argc, char* argv[])
{
  (void)scm_with_guile(bootstrap_main, NULL);
  return 0;
}


  First we initialize a Scheme environment in our main thread with scm_with_guile(). The extra indirection is kind of weird, but it is the most portable way to initialize Guile. The actual function executed on the main thread calls scm_c_define_gsubr() which we already know registers a C function with Scheme. Then scm_spawn_thread() is called, which works just like pthread_create(), only it also takes optional parameters to an exception handler. Finally the main thread waits for the thread to finish by polling on scm_join_thread().
  The actual work done by the thread is to call scm_c_primitive_load(), which loads a Scheme source file. If the toplevel contains an actual function call, it is also executed. Any C function we register with scm_c_define_gsubr() is available for Scheme to use, and it does not belong to any specific module (nothing to import or require). If we created more threads, all such extended functionality would be visible in all the threads.
  But we have a last issue to resolve. scm_c_define_gsubr() doesn't take a void* userdata parameter, so we are unable to pass on data from C into our Scheme functions. The only parameters we can specify are of type SCM, which comes from the Scheme environment itself. There are three ways around this problem:
  • We could call scm_current_thread() and use it as a lookup
  • We could encapsulate all our state in one or several Scheme types and make them visible at toplevel
  • Use environment-specific state, called 'Fluids'


  Fluids are state objects whose values are dependent on the current module/environment. It basically works as thread local storage. First you create a single fluid object with scm_make_fluid(). Then each thread can set it individually with scm_fluid_set_x() and fetch the value with scm_fluid_ref(). Putting everything together, we get:

#include <libguile.h>

#define NUM_THREADS 4
/* Our local data thread storage */
SCM fluid_thread_id;

typedef struct
{
  int val;
  const char* filename;
} threaddata_t;

/* Multiple threads this time */
threaddata_t threadData[NUM_THREADS] = {
  {99,  "file1.ss"},
  {100, "file2.ss"},
  {101, "file3.ss"},
  {102, "file4.ss"}
};

/* Get the SCM value from the fluid, then convert the SCM to an actual C int.
   The value we stored was the iteration value from creating the threads (0 to 3)*/
static int get_thread_id(void)
{
  int r = scm_to_int(scm_fluid_ref(fluid_thread_id));
  return r;
}


SCM test_func(void)
{
  int num;
  /* By getting the iteration value we used when creating the threads,
     we can look up per-thread data. Our function is reentrant. */
  num = get_thread_id();
  printf("(test-func) : File %s, Thread %d, value %d\n", threadData[num].filename, num, threadData[num].val);
  return SCM_UNSPECIFIED;
}


static SCM scheme_environment(void* userdata)
{
  SCM sint;
  /* void* to int to SCM, see main for the passed value */
  sint = scm_from_int32(*((int*)userdata));
  /* Set the local thread storage to the thread number 'sint'.
     Thread 0 with be 0, thread 1 will be 1, and so on */
  scm_fluid_set_x(fluid_thread_id, sint);
  /* Load a Scheme file for this Guile instance */
  scm_c_primitive_load(threadData[get_thread_id()].filename);
  return SCM_UNSPECIFIED;
}

void* bootstrap_main(void* v)
{
  int i;
  SCM th[NUM_THREADS];
  (void)v;
  fluid_thread_id = scm_make_fluid();
  /* The Scheme function test-func is available in all the threads. */
  scm_c_define_gsubr("test-func", 0, 0, 0, test_func);
  for(i=0; i<NUM_THREADS; ++i)
    th[i] = scm_spawn_thread(scheme_environment, &i, NULL, NULL);
  for(i=0; i<NUM_THREADS; ++i)
    scm_join_thread(th[i]);
  return NULL;
}

int main(int argc, char* argv[])
{
  (void)scm_with_guile(bootstrap_main, NULL);
  return 0;
}



  That's all there is to it! Now you can write four Scheme files with file extension .ss and they will all run concurrently. Any state changes to the application should of course be synchronized properly with a mutex or similar.

Friday, 15 July 2011

Fixedpoint math with Newton-Raphson and the Taylor-series [Part 1]

When working with embedded platforms like ARM and AVR, you might not have a floating-point unit. And you might lack support for division in hardware. Floating-point emulation is slow, and so are most C-runtime's implementations for division. So what do you do? Use integers of course! I assume my readers already know the basics of fixedpoint, like how you add, subtract, multiply and divide with fixedpoint, and how you convert between different bases. What I know a lot of people struggle with is to implement fixedpoint versions of log2, ln, pow, sqrt, cos, sin and division. While I make little mention on performance, I'll try to cover general algorithms which can be made very efficient. Especially if you can make a trade-off between memory usage, performance and accuracy.

Before we can implement our math functions, I need to explain the general idea behind Newton-Raphson. So what is Newton-Raphson anyway?

Newton-Raphson is an algorithm for iteratively solving the roots of a function. The function's input is its output, such that the accuracy of your answer increases quadratically with the number of iterations. The initial input is a guess or approximation. The general algorithm is:

X[n+1] = X[n] - f(X[n]) / f'(X[n])

Where f is the root function, f' is the derivative of the root function, X[n] is the value from the last iteration, and X[n+1] is the value for the next iteration. When you have iterated enough, X[n+1] is also the answer. In C-code:

unsigned newton_raphson_unary(
    unsigned int val,
    void (*f)(unsigned),
    void (*fd)(unsigned)
)
{
    for(unsigned int i=0; i<5; ++i)
        val = val - f(val) / fd(val); 
    return val;
}
But what does the root-function contain? It's very simple. Remember from highschool when you learned that the two answers for a quadratic equation ax² + bx + c is where the parabola intersected the x-axis? That's exactly what the root function is. The exact answer is where the function intersects the x-axis, because we solve for y = f(x), where y = 0. For every iteration the result of the root function gets closer and closer to 0. In fact, the accuracy growth is quadratic, and the algorithm is self-corrective. You rewrite/expand your root function such that you get rid of the function you want to find, and you put everything on the right side of the equation, so that it equals 0. For example, a possible root function to find a square root could be: x² - N = 0, where N is the input for sqrt, and X is the answer.

But we also need the derivative of the root function. Even if you hate calculus, fear not. All you need to understand are the differentiation rules [Wikipedia], and those can be looked up instead of remembered. If we keep our square root example, the derivative of the function f(x) = x² - N is f'(x) = 2x. This is due to the rule x^k = kx^(k-1). And N is a constant, so it's removed.

A full square root example below, but we have simplified the equation like this:
x = x - f(x) / f'(x)
x = x - (x² - val) / (2x)
x = (2x²/2x) - ((x² - val) / 2x)
x = (x² + val) / 2x
x = (x² / 2x) + (val / 2x)
x = (x / 2) + ((val/2) / x)

/* BASE is a constant for the fixedpoint scale */
static unsigned fp_sqrt(unsigned val)
{
    unsigned x;
    int bitpos;
    unsigned long long v;

    if(!val)
        return val;

    /* clz = count-leading-zeros. bitpos is the position of the most significant bit,
        relative to "1" or 1 << base */
    bitpos = base - clz(val);
    
    /* Calculate our first estimate.
        We use the identity 2^a * 2^a = 2^(2*a) or:
         sqrt(2^a) = 2^(a/2)
    */
    if(bitpos > 0u) /* val > 1 */
        x = (1u<<BASE)<<(bitpos >> 1u);
    else if(bitpos < 0) /* 0 < val < 1 */
        x = (1u<<BASE)<<((unsigned)(-bitpos) << 1u);
    else /* val == 1 */
        x = (1u<<BASE);
    
    /* We need to scale val with base due to the division.
       Also val /= 2, hence the subtraction of one*/
    v = (unsigned long long)val <<  (BASE - 1u);

    /* The actual iteration */
    x = (x >> 1u) + v/x;
    x = (x >> 1u) + v/x;
    x = (x >> 1u) + v/x;
    x = (x >> 1u) + v/x;
    return x;
}
I hope this article was easy enough to understand. Since I've covered the basics on how to use Newton-Raphson and implemented sqrt(), I'll continue with the implementation of more math functions next time.
If anyone has corrections, criticism or questions, I'm all ears. The whole point of this post is to explain this to people who struggle with math (including myself), but are otherwise decent programmers.

Saturday, 18 September 2010

Linear Programming

I read a nice introduction to linear programming the other day, which I found really interesting. Why? Because I thought all problems that are easily solvable with integer programming and linear programming were NP-complete. Take this classical example:
You have two types of crops x and y, where crop x gives you 20 dollars per acre, and crop y gives you 30 dollars per acre. In addition, crop x takes two hours of time per acre to harvest, while crop y takes four hours per acre to harvest. You have 200 acres of land, and 70 working hours to spare. Which combination of crops gives you the highest profits?
Sounds like a lot of combinations to try out, right? But luckily a trial-and-error approach isn't needed. With Linear Programming, this problem can be rewritten as a linear functions with restrictions as hyperplanes.

First of all, we need a function to maximize for. In our example above, it becomes:
Z = 20x+30y.
And the number of acres of land we have, together with the working hours become our constraints. For example, the number of acres we want to plant must be less than the number of acres we have for disposition:
x+y <= 200
The total time it takes to harvest the crops needs to be less than 70:
2x+4y <= 70
And we can't plant a negative area:
x >= 0
y >= 0


What happens is that if we plot this on a 2D graph, we get four lines/planes that define a convex polygon where they intersect. This is called the 'feasible region'. The optimal result is always one of the vertices, where the origin is a part of the vertex set. When you have the vertices, you can plug the x and y values into your profit function and find the biggest one. For this example, the maximum profits are found at the vertex x=35, y=0, and the result from our function becomes 20 * 35 + 30 * 0 = 700.
I said that these are kind of like planes, but how? Wait, doesn't the constraints look like scalar products? In fact, they do look like ax+by = d, or some prefer ax+by-d = 0. And finding the intersection between two lines, or three planes (3D) is fairly trivial. But what about higher dimensions than 3D, if you have a lot of unknowns? That's the problem I initially had. I suck at math, so unknown to me, you can generalize plane intersection tests in k dimensions with a system of linear equations. To algorithmically solve such a system you can use gauss elimination, which does a series of identity transformations on the matrix. The matrix simply consists of all the coefficients for the unknows, with the 'd' on as the last element in each row. I made a gaussian solver, which can now be found on GitHub as a part of a larger library I have in the works: http://github.com/Madsy/LGML

Friday, 22 January 2010

Nokia N900 specifications.

I bought a Nokia N900 today. While it's not shipping before February 4th, I'm already hunting down those hardware and platform specifications. As we all know, hardware specifications are difficult to find. They're always hidden behind 10 layers of links and a non-working search function. It's just how the Universe works.
So, to save you N900/Maemo-lovers the trouble, here's a nice list:

Hardware specifications:


OMAP34xx SoC (The main chip)
TMS320C64x/C64x+ DSP CPU and Instruction Set Reference
ARM Cortex A8 specification
ARM Architecture Reference Manual (For ARMv6 and earlier.)

Note that this revision of the ARM Architecture Reference Manual is outdated. The ARMv7 one which applies to the Cortex-series is not yet released for the public. I have no idea why they're still keeping it a secret, but you can apply for a copy here. Registration is free of charge.
The information definitely missing in this revision are the new ARMv7 instructions, as well as the whole NEON instruction set, but there are probably other things as well.

Maemo.org seem to have links to all the hardware not so far listed here. The video camera, accelerometer and the FM radio transmitter among other peripherals. So it's a nice supplement.

Platform specifications (ABIs):


Unix System V ELF base specification
ELF specification from the ARM EABI (extends on the former)
ARM ELF supplement from CodeSourcery

The Unix System V ELF base specification is pretty old, and barely applies to the x86 architecture. So the ELF ARM EABI document extends on that one for the ARM architecture. The last document is a supplement to the previous ARM ELF supplement, which nails down some things which applies to Linux only. The ARM EABI is a bit vague, since it is meant for multiple platforms. The CodeSourcery supplement handles that.

Libraries/APIs:


OpenGL ES 2.0.24 specification

Ah, OpenGL ES 2.0 with vertex and fragment shader support! iPhone users should be envious. Since the OMAP3 SoC supports OpenGL ES features beyond the OpenGL ES 2.0 core, I'm sure it has some extensions as well. Be sure to check the Khronos extension registry for specifications on those.

If you feel something is missing which you can't find here nor on the maemo.org wiki, tell me in the comments so I can add it here.

Happy coding.

Subscribe to RSS feed