MSVC – 0x2B|~0x2B

Two-phase Lookup in C++ Templates

Posted on December 12, 2014 by gonwan

This is a quick note on C++ Templates: The Complete Guide. The name taxonomy comes first, in chapter 9:

Qualified name: This term is not defined in the standard, but we use it to refer to names that undergo so-called qualified lookup. Specifically, this is a qualified-id or an unqualified-id that is used after an explicit member access operator (. or ->). Examples are S::x, this->f, and p->A::m. However, just class_mem in a context that is implicitly equivalent to this->class_mem is not a qualified name: The member access must be explicit.

Unqualified name: An unqualified-id that is not a qualified name. This is not a standard term but corresponds to names that undergo what the standard calls unqualified lookup.

Dependent name: A name that depends in some way on a template parameter. Certainly any qualified or unqualified name that explicitly contains a template parameter is dependent. Furthermore, a qualified name that is qualified by a member access operator (. or ->) is dependent if the type of the expression on the left of the access operator depends on a template parameter. In particular, b in this->b is a dependent name when it appears in a template. Finally, the identifier ident in a call of the form ident(x, y, z) is a dependent name if and only if any of the argument expressions has a type that depends on a template parameter.

Nondependent name: A name that is not a dependent name by the above description.
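To make the taxonomy concrete, here is a small sketch (my own illustration, not taken from the book) that marks which names in a template are dependent. The namespace demo, the type Tag and the function ident are hypothetical names chosen for the example. The dependent call ident(x) still picks up a function through ADL when the template is instantiated. The nondependent call ident(0) is bound when the template is parsed.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {
struct Tag {};
int ident(Tag) { return 1; }  // only reachable through ADL on demo::Tag
}

int ident(int) { return 2; }  // visible to ordinary lookup at the template definition

template <typename T>
struct Base {
    T b{};
};

template <typename T>
struct Derived : Base<T> {
    // T explicitly names a template parameter: dependent.
    std::size_t size_of_T() const { return sizeof(T); }

    // this->b: qualified by an explicit ->, and the type of *this depends on T: dependent.
    T get_b() const { return this->b; }

    // ident(x): unqualified, and the argument type T depends on a template parameter:
    // dependent, so ADL is repeated when the template is instantiated.
    int call_dependent(T x) const { return ident(x); }

    // ident(0): unqualified, no dependent argument: nondependent, bound at definition time.
    int call_nondependent() const { return ident(0); }
};
```

With T = demo::Tag, call_dependent() reaches demo::ident through ADL and returns 1. call_nondependent() always uses ::ident(int) and returns 2.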

And the definition from Chapter 10:

This leads to the concept of two-phase lookup: The first phase is the parsing of a template, and the second phase is its instantiation.

During the first phase, nondependent names are looked up while the template is being parsed using both the ordinary lookup rules and, if applicable, the rules for argument-dependent lookup (ADL). Unqualified dependent names (which are dependent because they look like the name of a function in a function call with dependent arguments) are also looked up that way, but the result of the lookup is not considered complete until an additional lookup is performed when the template is instantiated.

During the second phase, which occurs when templates are instantiated at a point called the point of instantiation (POI), dependent qualified names are looked up (with the template parameters replaced with the template arguments for that specific instantiation), and an additional ADL is performed for the unqualified dependent names.

To summarize: nondependent names are looked up in the first phase, qualified dependent names are looked up in the second phase, and unqualified dependent names are looked up in both phases. Some code to illustrate how this works:

#include <iostream>

template <typename T>
struct Base {
    typedef int I;
};

template <typename T>
struct Derived : Base<T> {
    void foo() {
        //typename Base<T>::I i = 1.024;
        I i = 1.024;
        std::cout << i << std::endl;
    }
};

template <>
struct Base<void> {
    //const static int I = 0;
    typedef double I;
};

int main() {
    Derived<bool> d1;
    d1.foo();
    Derived<void> d2;
    d2.foo();
    return 0;
}


Now look into Derived::foo(). I is a nondependent name, so it is looked up only in the first phase. During that phase, however, the compiler does not look into the dependent base class Base<T>, because it cannot know which specialization will be used. When instantiated as Derived<bool>, I is int; when instantiated as Derived<void>, I is double. So I has to be looked up in the second phase instead. We can write typename Base<T>::I i = 1.024; to delay the lookup, since I is now a qualified dependent name.
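Here is a minimal sketch of the fix described above, assuming the same Base/Derived pair as in the listing. foo() is replaced by a hypothetical value() member that returns i, so the result of the delayed lookup can be checked instead of printed.

```cpp
#include <cassert>

template <typename T>
struct Base {
    typedef int I;
};

// Explicit specialization, declared before Derived<void> is instantiated.
template <>
struct Base<void> {
    typedef double I;
};

template <typename T>
struct Derived : Base<T> {
    // typename Base<T>::I is a qualified dependent name: it is looked up only at
    // instantiation, once the actual Base<T> (primary template or Base<void>) is known.
    typename Base<T>::I value() const {
        typename Base<T>::I i = 1.024;  // int for Derived<bool> (truncated to 1), double for Derived<void>
        return i;
    }
};
```

This reproduces the "1 and 1.024" behaviour with a conforming compiler, without relying on VC++'s delayed parsing.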

Unfortunately, two-phase lookup (C++03 standard) is not fully supported in VC++, even in VC++2013. It compiles the code above without errors and gives the result you would most likely expect (output 1 and 1.024). gcc-4.6 gives errors like:

temp1.cpp: In member function ‘void Derived<T>::foo()’:
temp1.cpp:12:9: error: ‘I’ was not declared in this scope
temp1.cpp:12:11: error: expected ‘;’ before ‘i’
temp1.cpp:13:22: error: ‘i’ was not declared in this scope


Another code snippet:

#ifdef _USE_STRUCT
/* ADL of dependent names in two-phase lookup only works
 * for types that have associated namespaces, e.g. class types. */
struct Int {
    Int(int) { }
};
#else
typedef int Int;
#endif

template <typename T>
void f(T i) {
    g(i);
}

void g(Int i) {
}

int main() {
    f(Int(1024));
    return 0;
}


When the compiler sees f(), g() has not been declared yet. If f() were a nontemplate function, this code would not compile. But f() is a function template, and g is an unqualified dependent name, because its argument i has the dependent type T. So the compiler performs ADL again at the point of instantiation, in the second phase, and can find the declaration of g() there. Note that a user-defined type like Int is required here: int is a fundamental type with no associated namespaces, so ADL finds nothing.
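The same mechanism can be shown without the preprocessor switch. In this sketch, the namespace lib and the member v of Int are assumptions for illustration. g() is declared only after the template f(), yet the call compiles. The argument's class type makes lib an associated namespace, so ADL at the point of instantiation finds lib::g.

```cpp
#include <cassert>

namespace lib {
struct Int {
    explicit Int(int x) : v(x) {}
    int v;
};
}  // namespace lib

template <typename T>
int f(T i) {
    // g is an unqualified dependent name: nothing named g is visible here,
    // but ADL at the point of instantiation searches T's associated namespaces.
    return g(i);
}

namespace lib {
// Declared after f(); found only because lib is an associated namespace of lib::Int.
int g(Int i) { return i.v * 2; }
}  // namespace lib
```

Changing lib::Int to plain int should make gcc reject the call in the same way as the temp2.cpp output, since int has no associated namespaces.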

VC++2013 still compiles this code without complaint, and there are some hints that two-phase lookup will not be supported in the upcoming VC++2015 release either. gcc claims full support for two-phase lookup since gcc-4.7. I used gcc-4.8, and the error output looks like:

temp2.cpp: In instantiation of ‘void f(T) [with T = int]’:
temp2.cpp:20:16:   required from here
temp2.cpp:13:8: error: ‘g’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
     g(i);
        ^
temp2.cpp:16:6: note: ‘void g(Int)’ declared here, later in the translation unit
 void g(Int i) {
      ^


The code compiles fine with the user-defined type Int (using the -D_USE_STRUCT switch).

Pre/Post-main Function Call Implementation in C

Posted on February 13, 2014 by gonwan

In C++, a pre/post-main function call can be implemented using a global class instance, whose constructor and destructor are invoked automatically before and after the main function. C has no such mechanism, but there is a glib implementation that can help. You may want to read my previous post about the CRT sections of MSVC. I copied the code, did some renaming, and saved it as ctor.h:

#include <stdlib.h>
#if defined (_MSC_VER)
#if (_MSC_VER >= 1500)
/* Visual Studio 2008 and later have __pragma */
#define HAS_CONSTRUCTORS
#define DEFINE_CONSTRUCTOR(_func) \
    static void _func(void); \
    static int _func ## _wrapper(void) { _func(); return 0; } \
    __pragma(section(".CRT$XCU",read)) \
    __declspec(allocate(".CRT$XCU")) static int (* _array ## _func)(void) = _func ## _wrapper;
#define DEFINE_DESTRUCTOR(_func) \
    static void _func(void); \
    static int _func ## _constructor(void) { atexit (_func); return 0; } \
    __pragma(section(".CRT$XCU",read)) \
    __declspec(allocate(".CRT$XCU")) static int (* _array ## _func)(void) = _func ## _constructor;
#elif (_MSC_VER >= 1400)
/* Visual Studio 2005 */
#define HAS_CONSTRUCTORS
#pragma section(".CRT$XCU",read)
#define DEFINE_CONSTRUCTOR(_func) \
    static void _func(void); \
    static int _func ## _wrapper(void) { _func(); return 0; } \
    __declspec(allocate(".CRT$XCU")) static int (* _array ## _func)(void) = _func ## _wrapper;
#define DEFINE_DESTRUCTOR(_func) \
    static void _func(void); \
    static int _func ## _constructor(void) { atexit (_func); return 0; } \
    __declspec(allocate(".CRT$XCU")) static int (* _array ## _func)(void) = _func ## _constructor;
#else
/* Visual Studio 2003 and earlier versions should use #pragma data_seg() to define pre/post-main functions. */
#error Pre/Post-main function not supported on your version of Visual Studio.
#endif
#elif (__GNUC__ > 2) || (__GNUC__ == 2 && __GNUC_MINOR__ >= 7)
#define HAS_CONSTRUCTORS
#define DEFINE_CONSTRUCTOR(_func) static void __attribute__((constructor)) _func (void);
#define DEFINE_DESTRUCTOR(_func) static void __attribute__((destructor)) _func (void);
#else
/* not supported */
#endif


One limitation of the glib code is the lack of support for VS2003 and earlier versions. For those, #pragma data_seg() can be used to implement the same functionality:

/*
 * cl ctor.c
 * gcc ctor.c -o ctor
 */
#include "ctor.h"
#include <stdio.h>

#ifdef HAS_CONSTRUCTORS
DEFINE_CONSTRUCTOR(before)
DEFINE_DESTRUCTOR(after)
#else
#ifdef _MSC_VER
static void before(void);
static void after(void);
#pragma data_seg(".CRT$XCU")
static void (*msc_ctor)(void) = before;
#pragma data_seg(".CRT$XPU")
static void (*msc_dtor)(void) = after;
#pragma data_seg()
#endif
#endif

void before()
{
    printf("before main\n");
}

void after()
{
    printf("after main\n");
}

int main()
{
    printf("in main\n");
    return 0;
}


Output from msvc/gcc:

before main
in main
after main
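On gcc and clang, the DEFINE_CONSTRUCTOR branch boils down to __attribute__((constructor)). The sketch below uses that attribute directly; the function and variable names are hypothetical. Like glib's MSVC DEFINE_DESTRUCTOR, it registers the post-main function with atexit() from inside the pre-main function. It assumes a GCC-compatible compiler.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

static int g_step = 0;         // event counter, zero before any code runs
static int g_before_step = 0;  // value of g_step recorded by before_main()

static void after_main(void) {
    // Called during exit() processing, after main() returns.
    std::puts("after main");
}

// Same form as the gcc branch of DEFINE_CONSTRUCTOR: runs before main().
static void before_main(void) __attribute__((constructor));
static void before_main(void) {
    g_before_step = ++g_step;
    // Like glib's MSVC DEFINE_DESTRUCTOR: register the post-main hook with atexit().
    std::atexit(after_main);
}
```

By the time main() starts, before_main() has already run once, and "after main" is printed when the program exits.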


MSVC CRT Initialization

Posted on February 13, 2014 by gonwan

This post takes a more detailed look at the MSDN article CRT Initialization. First, some content quoted from it:

The CRT obtains the list of function pointers from the Visual C++ compiler. When the compiler sees a global initializer, it generates a dynamic initializer in the .CRT$XCU section (where CRT is the section name and XCU is the group name). To obtain a list of those dynamic initializers run the command dumpbin /all main.obj, and then search the .CRT$XCU section (when main.cpp is compiled as a C++ file, not a C file).

The CRT defines two pointers:
– __xc_a in .CRT$XCA
– __xc_z in .CRT$XCZ

Both groups do not have any other symbols defined except __xc_a and __xc_z. Now, when the linker reads various .CRT groups, it combines them in one section and orders them alphabetically. This means that the user-defined global initializers (which the Visual C++ compiler puts in .CRT$XCU) will always come after .CRT$XCA and before .CRT$XCZ.

So, the CRT library uses both __xc_a and __xc_z to determine the start and end of the global initializers list because of the way in which they are laid out in memory after the image is loaded.

Let’s use the VS debugger to investigate the CRT implementation further. I’m using VS2010, and a global instance of class A is declared and initialized:

#include <iostream>

class A
{
public:
    A();
    ~A();
};

A::A()
{
    std::cout << "in A::A()" << std::endl;
}

A::~A()
{
    std::cout << "in A::~A()" << std::endl;
}

A a;


Now set breakpoints in the constructor and destructor, and start debugging. I tried all exe/dll and dynamic/static CRT combinations and looked at the call stacks:

1) exe with crt dynamic linked:
  crtexe.c: (w)mainCRTStartup()
    +--> crtexe.c: __tmainCRTStartup()
           +--> crt0dat.c: _initterm()
2) exe with crt static linked:
  crt0.c: _tmainCRTStartup()
    +--> crt0.c: __tmainCRTStartup()
           +--> crt0dat.c: _cinit()
                  +--> crt0dat.c: _initterm()
3) dll with crt dynamic linked:
  crtdll.c: _DllMainCRTStartup()
    +--> crtdll.c: __DllMainCRTStartup()
           +--> crtdll.c: _CRT_INIT()
                  +--> crt0dat.c: _initterm()
4) dll with crt static linked:
  dllcrt0.c: _DllMainCRTStartup()
    +--> dllcrt0.c: __DllMainCRTStartup()
           +--> dllcrt0.c: _CRT_INIT()
                  +--> crt0dat.c: _cinit()
                         +--> crt0dat.c: _initterm()


_initterm is defined as follows. It walks through the function pointers between __xc_a and __xc_z mentioned above:

// crt0dat.c
void __cdecl _initterm (
        _PVFV * pfbegin,
        _PVFV * pfend
        )
{
        /*
         * walk the table of function pointers from the bottom up, until
         * the end is encountered.  Do not skip the first entry.  The initial
         * value of pfbegin points to the first valid entry.  Do not try to
         * execute what pfend points to.  Only entries before pfend are valid.
         */
        while ( pfbegin < pfend )
        {
            /*
             * if current table entry is non-NULL, call thru it.
             */
            if ( *pfbegin != NULL )
                (**pfbegin)();
            ++pfbegin;
        }
}
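To make the loop easier to experiment with, here is a portable model of it (a sketch, not the CRT source). The hypothetical g_table stands in for the range the linker builds from .CRT$XCA, .CRT$XCU and .CRT$XCZ after ordering the groups alphabetically. Its first and last slots play the roles of __xc_a and __xc_z.

```cpp
#include <cassert>

// Same shape as the CRT's _PVFV: pointer to a function taking and returning nothing.
typedef void (*PVFV)(void);

// Portable model of the _initterm loop: call every non-NULL entry in [pfbegin, pfend).
// The entry pfend points to is never called, just like __xc_z.
void initterm_model(PVFV *pfbegin, PVFV *pfend) {
    while (pfbegin < pfend) {
        if (*pfbegin != nullptr)
            (**pfbegin)();
        ++pfbegin;
    }
}

static int g_calls = 0;
static void init_a(void) { g_calls += 1; }
static void init_b(void) { g_calls += 10; }

// Hypothetical merged table: [0] ~ __xc_a (.CRT$XCA), [1..2] ~ user entries (.CRT$XCU),
// [3] ~ __xc_z (.CRT$XCZ). In this model the marker slots hold NULL and are skipped.
static PVFV g_table[] = { nullptr, init_a, init_b, nullptr };
```

Calling initterm_model(g_table, g_table + 3) mirrors _initterm(__xc_a, __xc_z): both user entries run, in table order.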


__xc_a, __xc_z and other section groups are defined as:

// crt0dat.c
/*
 * pointers to initialization sections
 */
extern _CRTALLOC(".CRT$XIA") _PIFV __xi_a[];
extern _CRTALLOC(".CRT$XIZ") _PIFV __xi_z[];    /* C initializers */
extern _CRTALLOC(".CRT$XCA") _PVFV __xc_a[];
extern _CRTALLOC(".CRT$XCZ") _PVFV __xc_z[];    /* C++ initializers */
extern _CRTALLOC(".CRT$XPA") _PVFV __xp_a[];
extern _CRTALLOC(".CRT$XPZ") _PVFV __xp_z[];    /* C pre-terminators */
extern _CRTALLOC(".CRT$XTA") _PVFV __xt_a[];
extern _CRTALLOC(".CRT$XTZ") _PVFV __xt_z[];    /* C terminators */
// sect_attribs.h
#define _CRTALLOC(x) __declspec(allocate(x))


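gcc uses a similar technique for pre/post-main code. On ELF targets, the .init and .fini sections hold startup/shutdown code. The tables of constructor/destructor function pointers live in .ctors/.dtors or, on newer toolchains, in .init_array/.fini_array.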
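On ELF systems the table approach can be reproduced directly. This sketch assumes gcc or clang on an ELF target such as Linux. It places a function pointer in the .init_array section, whose entries the startup code calls before main(), much like _initterm walks __xc_a..__xc_z. The names are hypothetical, and the fallback branch for non-ELF targets uses the constructor attribute instead.

```cpp
#include <cassert>

static int g_init_array_ran = 0;

static void mark_init(void) { g_init_array_ran = 1; }

#if defined(__GNUC__) && defined(__ELF__)
// A pointer placed in .init_array is called by the ELF startup code before main().
// 'used' stops the compiler from discarding this otherwise unreferenced variable.
static void (*g_init_entry)(void) __attribute__((section(".init_array"), used)) = mark_init;
#elif defined(__GNUC__)
// Non-ELF fallback: let the compiler emit the table entry itself.
static void mark_init_ctor(void) __attribute__((constructor));
static void mark_init_ctor(void) { mark_init(); }
#endif
```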

Compiler Intrinsic Functions

Posted on October 30, 2013 by gonwan

Copied from Wikipedia:

An intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler. Typically, it substitutes a sequence of automatically generated instructions for the original function call, similar to an inline function. Unlike an inline function though, the compiler has an intimate knowledge of the intrinsic function and can therefore better integrate it and optimize it for the situation. This is also called builtin function in many languages.

Here is a code snippet to check the generated code with intrinsics enabled and disabled:

/*
 * # gcc -S intrinsic.c -o intrinsic.s
 * # gcc -S -fno-builtin intrinsic.c -o intrinsic2.s
 * # cl /c /Oi intrinsic.c /FAs /Faintrinsic.asm
 * # cl /c intrinsic.c /FAs /Faintrinsic2.asm
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char *c = "Hello World!";
char c2[16];

int main(int argc, char *argv[])
{
    int a = abs(argc);
    memcpy(c2, c, 12);
    printf("%d,%s\n", a, c2);
    return 0;
}


Generated assembly:

main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $32, %esp
    movl    8(%ebp), %eax
    sarl    $31, %eax
    movl    %eax, %edx
    xorl    8(%ebp), %edx
    movl    %edx, 28(%esp)
    subl    %eax, 28(%esp)
    movl    c, %eax
    movl    %eax, %edx
    movl    $c2, %eax
    movl    (%edx), %ecx
    movl    %ecx, (%eax)
    movl    4(%edx), %ecx
    movl    %ecx, 4(%eax)
    movl    8(%edx), %edx
    movl    %edx, 8(%eax)
    movl    $.LC1, %eax
    movl    $c2, 8(%esp)
    movl    28(%esp), %edx
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    printf
    movl    $0, %eax
    leave
    ret


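Only printf() is called; there are no calls to abs() or memcpy(), because they are intrinsics, as listed here in gcc’s online documentation. abs() is expanded inline into the sarl/xorl/subl sequence, and the 12-byte memcpy() into three 4-byte movl copies.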
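The inlined abs() sequence can be read back as C++. This is a sketch of what the sarl/xorl/subl instructions compute, not gcc’s actual builtin implementation. The function name branchless_abs is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reading of the inlined abs() sequence:
//   sarl $31      -> mask = x >> 31      (0 if x >= 0, -1 if x < 0)
//   xorl x        -> x ^ mask            (x, or ~x for negatives)
//   subl mask     -> (x ^ mask) - mask   (x, or ~x + 1 == -x)
// Assumes arithmetic right shift of negative values (what gcc and MSVC do;
// guaranteed by the C++ standard only since C++20). INT32_MIN overflows, as abs() does.
int32_t branchless_abs(int32_t x) {
    int32_t mask = x >> 31;
    return (x ^ mask) - mask;
}
```

Because the sequence has no branch, the compiler can inline it without disturbing the instruction pipeline.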

Intrinsics can be explicitly disabled. For instance, CRT intrinsics must be disabled for kernel development. Add the -fno-builtin flag for gcc, or remove the /Oi switch for MSVC. Only the gcc output is shown here:

main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $32, %esp
    movl    8(%ebp), %eax
    movl    %eax, (%esp)
    call    abs
    movl    %eax, 28(%esp)
    movl    c, %eax
    movl    %eax, %edx
    movl    $c2, %eax
    movl    $12, 8(%esp)
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    memcpy
    movl    $.LC1, %eax
    movl    $c2, 8(%esp)
    movl    28(%esp), %edx
    movl    %edx, 4(%esp)
    movl    %eax, (%esp)
    call    printf
    movl    $0, %eax
    leave
    ret


Now there _are_ calls to abs() and memcpy(). The general MSVC intrinsics are listed here.

Intrinsics are easier to use than inline assembly, and in most cases they are used to increase performance. Both gcc and MSVC provide intrinsics for Intel’s MMX, SSE and SSE2 instruction sets. A code snippet using MMX:

/*
 * # gcc -O2 -S -mmmx intrinsic_mmx.c -o intrinsic_mmx.s
 * # cl /O2 /c intrinsic_mmx.c /FAs /Faintrinsic_mmx.asm
 */
#include <stdio.h>
#include <mmintrin.h>

int main()
{
    __m64 m1, m2, m3;
    int out1, out2;
    int in1[] = { 222, 111 };
    int in2[] = { 444, 333 };
#if 0
    m1 = _mm_setr_pi32(in1[0], in1[1]);
    m2 = _mm_setr_pi32(in2[0], in2[1]);
#else
    m1 = *(__m64 *)in1;
    m2 = *(__m64 *)in2;
#endif
    m3 = _mm_add_pi32(m1, m2); 
    out1 = _mm_cvtsi64_si32(m3);
    m3  = _mm_srli_si64(m3, 32);
    out2 = _mm_cvtsi64_si32(m3);
    _mm_empty();
    printf("out1=%d,out2=%d\n", out1, out2);
    return 0;
}


Assembly looks like:

main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $16, %esp
    movq    .LC1, %mm0
    paddd   .LC2, %mm0
    movd    %mm0, 8(%esp)
    psrlq   $32, %mm0
    movd    %mm0, 12(%esp)
    emms
    movl    $.LC0, 4(%esp)
    movl    $1, (%esp)
    call    __printf_chk
    xorl    %eax, %eax
    leave
    ret


This time you see MMX registers and instructions. The -mmmx flag is required to build with gcc. MSVC generates similar code. A reference for these instruction sets is available on Intel’s website.
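MMX code only runs on x86, so here is a portable scalar model of what the snippet computes. The helper names are hypothetical; each one mirrors one intrinsic. The model shows why paddd differs from a plain 64-bit add: the two 32-bit lanes are added independently, with no carry between them.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value viewed as two 32-bit lanes, like an __m64 used with the *_pi32 intrinsics.
struct M64 { uint64_t bits; };

// Like _mm_setr_pi32(lane0, lane1): lane0 goes to the low 32 bits.
M64 setr_pi32(int32_t lane0, int32_t lane1) {
    return M64{ (uint64_t)(uint32_t)lane0 | ((uint64_t)(uint32_t)lane1 << 32) };
}

// Like _mm_add_pi32 (paddd): each lane is added separately and wraps around,
// so no carry propagates from the low lane into the high lane.
M64 add_pi32(M64 a, M64 b) {
    uint32_t lo = (uint32_t)a.bits + (uint32_t)b.bits;
    uint32_t hi = (uint32_t)(a.bits >> 32) + (uint32_t)(b.bits >> 32);
    return M64{ (uint64_t)lo | ((uint64_t)hi << 32) };
}

// Like _mm_srli_si64 (psrlq): logical right shift of the whole 64-bit value;
// counts above 63 clear the register. count is assumed non-negative.
M64 srli_si64(M64 a, int count) { return M64{ count > 63 ? 0 : a.bits >> count }; }

// Like _mm_cvtsi64_si32 (movd): the low 32 bits as an int.
int32_t cvtsi64_si32(M64 a) { return (int32_t)(uint32_t)a.bits; }
```

With the same inputs as the MMX snippet ({ 222, 111 } and { 444, 333 }), out1 is 666 and out2 is 444.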

A simple benchmark using SSE is available here.

MSVC Inline Assembly

Posted on October 23, 2013 by gonwan

MSVC’s inline assembly is easier to use than gcc’s version; I think it is easier to write correct code than wrong code with it. Again, a simple add function is used to illustrate:

int add1(int a, int b)
{
    return a + b;
}


The corresponding inline version:

int add2(int a, int b)
{
    __asm {
        mov eax, a;
        add eax, b;
    }
}


The __asm keyword introduces an inline assembly block. According to MSDN, there is also an asm keyword, which is not recommended:

Visual C++ support for the Standard C++ asm keyword is limited to the fact that the compiler will not generate an error on the keyword. However, an asm block will not generate any meaningful code. Use __asm instead of asm.

Symbols in C/C++ code can be used directly in inline assembly, which is much more convenient than in gcc. Unlike gcc, there is also no need to load parameters into registers before using them. MSVC gets this right even when optimization is enabled.
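For comparison, here is roughly what the add function looks like in gcc’s extended asm. This is a sketch assuming gcc or clang on x86/x86-64; other targets fall back to plain C++, and the name add_gcc is hypothetical. The operand constraints are what bind a and b to registers, which is the extra work MSVC’s syntax avoids.

```cpp
#include <cassert>

int add_gcc(int a, int b) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result;
    // "=r"(result): output in a register chosen by the compiler (operand %0).
    // "0"(a): a is loaded into that same register before the asm runs.
    // "r"(b): b in another register (operand %2). "cc": the flags are modified.
    __asm__("addl %2, %0" : "=r"(result) : "0"(a), "r"(b) : "cc");
    return result;
#else
    return a + b;  // portable fallback for other compilers/targets
#endif
}
```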

NOTE: Inline assembly is not supported on the Itanium and x64 processors.

Let’s see the generated code:

# cl /c /FA testasm_windows.c


Output:

PUBLIC _add2
_TEXT SEGMENT
_a$ = 8
_b$ = 12
_add2 PROC
 push ebp
 mov ebp, esp
 mov eax, DWORD PTR _a$[ebp]
 add eax, DWORD PTR _b$[ebp]
 pop ebp
 ret 0
_add2 ENDP
_TEXT ENDS


The parameters a and b are located at [ebp+8] and [ebp+12], as the symbols _a$ and _b$ show. What happens if registers other than the scratch registers are used?

int add3(int a, int b)
{
    __asm {
        mov ebx, a;
        add ebx, b;
        mov eax, ebx;
    }
}


Output assembly code:

PUBLIC _add3
_TEXT SEGMENT
_a$ = 8
_b$ = 12
_add3 PROC
 push ebp
 mov ebp, esp
 push ebx
 mov ebx, DWORD PTR _a$[ebp]
 add ebx, DWORD PTR _b$[ebp]
 mov eax, ebx
 pop ebx
 pop ebp
 ret 0
_add3 ENDP
_TEXT ENDS


As you see, MSVC automatically preserves ebx for us. From MSDN:

When using __asm to write assembly language in C/C++ functions, you don’t need to preserve the EAX, EBX, ECX, EDX, ESI, or EDI registers.

Let’s see what happens when the stdcall calling convention is used:

int __stdcall add4(int a, int b)
{
    __asm {
        mov eax, a;
        add eax, b;
    }
}


Output:

PUBLIC _add4@8
_TEXT SEGMENT
_a$ = 8
_b$ = 12
_add4@8 PROC
 push ebp
 mov ebp, esp
 mov eax, DWORD PTR _a$[ebp]
 add eax, DWORD PTR _b$[ebp]
 pop ebp
 ret 8
_add4@8 ENDP
_TEXT ENDS


In stdcall, the callee cleans up the stack, hence the ret 8 at the end. The function name is decorated as _add4@8, where 8 is the number of bytes of arguments.

MSVC also supports the fastcall calling convention. However, as MSDN mentions, it can cause register conflicts with __asm blocks and is not recommended. I just tested it here; the code happens to work :)

int __fastcall add5(int a, int b)
{
    __asm {
        mov eax, a;
        add eax, b;
    }
}


Output:

PUBLIC @add5@8
_TEXT SEGMENT
_b$ = -8
_a$ = -4
@add5@8 PROC
 push ebp
 mov ebp, esp
 sub esp, 8
 mov DWORD PTR _b$[ebp], edx
 mov DWORD PTR _a$[ebp], ecx
 mov eax, DWORD PTR _a$[ebp]
 add eax, DWORD PTR _b$[ebp]
 mov esp, ebp
 pop ebp
 ret 0
@add5@8 ENDP
_TEXT ENDS


With fastcall, the parameters are passed in ecx and edx, but here they are saved to the stack first. Presumably this is because the __asm block refers to a and b as memory operands. So we seem to get no benefit from this calling convention here. The function name is decorated as @add5@8.
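The decorations seen in these listings (_add2, _add4@8, @add5@8) follow a simple rule for C names on 32-bit x86. The sketch below encodes it. The enum and function names are hypothetical, and it assumes every argument occupies a multiple of 4 bytes on the stack. C++ names are mangled differently and are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class CallConv { Cdecl, Stdcall, Fastcall };

// arg_sizes holds sizeof() of each parameter; each is rounded up to a 4-byte stack slot.
std::string decorate(const std::string &name, CallConv cc,
                     const std::vector<std::size_t> &arg_sizes) {
    std::size_t bytes = 0;
    for (std::size_t s : arg_sizes)
        bytes += (s + 3) / 4 * 4;
    switch (cc) {
    case CallConv::Cdecl:    return "_" + name;                                // _add2
    case CallConv::Stdcall:  return "_" + name + "@" + std::to_string(bytes);  // _add4@8
    case CallConv::Fastcall: return "@" + name + "@" + std::to_string(bytes);  // @add5@8
    }
    return name;
}
```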

Finally, we can use __declspec(naked) to tell MSVC that we want to write our own prolog/epilog code sequences:

__declspec(naked) int __cdecl add6(int a, int b)
{
    __asm {
        push ebp;
        mov ebp, esp;
        mov eax, a;
        add eax, b;
        pop ebp;
        ret;
    }
}


Output:

PUBLIC _add6
_TEXT SEGMENT
_a$ = 8
_b$ = 12
_add6 PROC
 push ebp
 mov ebp, esp
 mov eax, DWORD PTR _a$[ebp]
 add eax, DWORD PTR _b$[ebp]
 pop ebp
 ret 0
_add6 ENDP
_TEXT ENDS


The output looks like the normal prolog/epilog, but it is our own hand-written code: with __declspec(naked), MSVC does not generate a duplicate prolog/epilog.

Windows XP Targeting with C++ in Visual Studio 2012

Posted on April 8, 2013 by gonwan

I just downloaded and tried Visual Studio 2012 (with Update 2, version 11.0.60315.01). Windows XP targeting is available (it was actually already available in Update 1).

The executable generated by the original VS2012 toolchain does not run under Windows XP; an error message box is shown instead.

In Update 1, the static and dynamic link libraries for the CRT, STL and MFC have been updated in-place to add runtime support for Windows XP and Windows Server 2003. The runtime version was upgraded from 11.0.50727.1 to 11.0.51106.1.

Apart from the library update, there’s no real difference between selecting the “v110” and “v110_xp” toolchains. I wrote a simple HelloWorld application and compared the two generated binaries:

# hexdump hello_vs2012.exe > vs2012.txt
# hexdump hello_vs2012_xp.exe > vs2012_xp.txt
# diff -ru vs2012.txt vs2012_xp.txt


And the output:

--- vs2012.txt 2013-04-08 17:25:32.253623916 +0800
+++ vs2012_xp.txt 2013-04-08 17:25:41.321624132 +0800
@@ -12,11 +12,11 @@
 00000b0 1544 65e8 d30d 6526 382e 65e9 d3b4 6526
 00000c0 382e 65ea d3b4 6526 6952 6863 d3b5 6526
 00000d0 0000 0000 0000 0000 4550 0000 014c 0004
-00000e0 3d72 5162 0000 0000 0000 0000 00e0 0102
+00000e0 3d82 5162 0000 0000 0000 0000 00e0 0102
 00000f0 010b 000b 7e00 0001 fa00 0000 0000 0000
 0000100 614e 0000 1000 0000 9000 0001 0000 0040
-0000110 1000 0000 0200 0000 0006 0000 0000 0000
-0000120 0006 0000 0000 0000 a000 0002 0400 0000
+0000110 1000 0000 0200 0000 0005 0001 0000 0000
+0000120 0005 0001 0000 0000 a000 0002 0400 0000
 0000130 0000 0000 0003 8140 0000 0010 1000 0000
 0000140 0000 0010 1000 0000 0000 0000 0010 0000
 0000150 0000 0000 0000 0000 0b44 0002 0028 0000


The first difference is the timestamp of the two binaries. The other two differences are the “Operating System Version” and “Subsystem Version” fields of the PE optional header: 5.1 for Windows XP, versus 6.0 for Windows Vista and later. That is all. So we can easily build a Windows XP binary from the command line with just one additional linker switch:

# cl hello.cpp /link /subsystem:console,5.01

I also built a simple MFC application (dynamically linked to MFC) targeting Windows XP in VS2012. It runs fine under Windows XP with the MFC DLLs copied into the same directory. Since VS2010, SxS (side-by-side) assemblies are no longer used for these runtime libraries: all you need to do is copy the dependent DLLs into the application directory and run it.

Reference: http://blogs.msdn.com/b/vcblog/archive/2012/10/08/10357555.aspx

C++ Class Layout Using MSVC

Posted on September 20, 2010 by gonwan

This article was originally inspired by this one: http://www.openrce.org/articles/full_view/23. The undocumented MSVC compiler switches for dumping class layouts are /d1reportSingleClassLayout<classname> and /d1reportAllClassLayout.

A simple example:

class CBase {
    int a;
public:
    virtual void foo() { }
};

class CDerived1: public CBase {
    int a1;
public:
    virtual void foo1() { }
};

class CDerived2: virtual public CBase {
    int a2;
public:
    virtual void foo() { }
    virtual void foo2() { }
};

The dumped layout:

class CBase size(8):
        +---
 0      | {vfptr}
 4      | a
        +---

CBase::$vftable@:
        | &CBase_meta
        |  0
 0      | &CBase::foo

CBase::foo this adjustor: 0


class CDerived1 size(12):
        +---
        | +--- (base class CBase)
 0      | | {vfptr}
 4      | | a
        | +---
 8      | a1
        +---

CDerived1::$vftable@:
        | &CDerived1_meta
        |  0
 0      | &CBase::foo
 1      | &CDerived1::foo1

CDerived1::foo1 this adjustor: 0


class CDerived2 size(20):
        +---
 0      | {vfptr}
 4      | {vbptr}
 8      | a2
        +---
        +--- (virtual base CBase)
12      | {vfptr}
16      | a
        +---

CDerived2::$vftable@CDerived2@:
        | &CDerived2_meta
        |  0
 0      | &CDerived2::foo2

CDerived2::$vbtable@:
 0      | -4
 1      | 8 (CDerived2d(CDerived2+4)CBase)

CDerived2::$vftable@CBase@:
        | -12
 0      | &CDerived2::foo

CDerived2::foo this adjustor: 12
CDerived2::foo2 this adjustor: 0

vbi:       class  offset o.vbptr  o.vbte fVtorDisp
           CBase      12       4       4 0

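As you can see, virtual inheritance adds an extra vbptr (virtual base table pointer) to the class layout. The virtual base class is placed in a separate section at the end of the object, and the vbptr points to a vbtable that records where that section is: CDerived2's vbptr sits at offset 4 and entry 1 of its vbtable is 8, so CBase starts at 4 + 8 = 12. As a result, an object using virtual inheritance is larger than one using non-virtual inheritance (20 bytes for CDerived2 versus 12 for CDerived1).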

Now, here is a complex example:

class CBase1 {
    int a1;
public:
    virtual void foo1() { }
};

class CBase2 : public virtual CBase1 {
    int a2;
public:
    virtual void foo2() { }
};

class CBase3 : public virtual CBase1 {
    int a3;
public:
    virtual void foo3() { }
};

class CBase4 : public CBase1 {
    int a4;
public:
    virtual void foo4() { }
};

class CDerive: public CBase2, public CBase3, public CBase4 {
    int b;
public:
    virtual void bar() { }
};

class CBase1 {

int a1;

public:

virtual void foo1() { }

};

class CBase2 : public virtual CBase1 {

int a2;

public:

virtual void foo2() { }

};

class CBase3 : public virtual CBase1 {

int a3;

public:

virtual void foo3() { }

};

class CBase4 : public CBase1 {

int a4;

public:

virtual void foo4() { }

};

class CDerive: public CBase2, public CBase3, public CBase4 {

int b;

public:

virtual void bar() { }

};

The dumped layout:

class CBase1 size(8):
        +---
 0      | {vfptr}
 4      | a1
        +---

CBase1::$vftable@:
        | &CBase1_meta
        |  0
 0      | &CBase1::foo1
CBase1::foo1 this adjustor: 0


class CBase2 size(20):
        +---
 0      | {vfptr}
 4      | {vbptr}
 8      | a2
        +---
        +--- (virtual base CBase1)
12      | {vfptr}
16      | a1
        +---

CBase2::$vftable@CBase2@:
        | &CBase2_meta
        |  0
 0      | &CBase2::foo2

CBase2::$vbtable@:
 0      | -4
 1      | 8 (CBase2d(CBase2+4)CBase1)

CBase2::$vftable@CBase1@:
        | -12
 0      | &CBase1::foo1

CBase2::foo2 this adjustor: 0

vbi:       class  offset o.vbptr  o.vbte fVtorDisp
          CBase1      12       4       4 0


class CBase3 size(20):
        +---
 0      | {vfptr}
 4      | {vbptr}
 8      | a3
        +---
        +--- (virtual base CBase1)
12      | {vfptr}
16      | a1
        +---

CBase3::$vftable@CBase3@:
        | &CBase3_meta
        |  0
 0      | &CBase3::foo3

CBase3::$vbtable@:
 0      | -4
 1      | 8 (CBase3d(CBase3+4)CBase1)

CBase3::$vftable@CBase1@:
        | -12
 0      | &CBase1::foo1

CBase3::foo3 this adjustor: 0

vbi:       class  offset o.vbptr  o.vbte fVtorDisp
          CBase1      12       4       4 0


class CBase4 size(12):
        +---
        | +--- (base class CBase1)
 0      | | {vfptr}
 4      | | a1
        | +---
 8      | a4
        +---

CBase4::$vftable@:
        | &CBase4_meta
        |  0
 0      | &CBase1::foo1
 1      | &CBase4::foo4

CBase4::foo4 this adjustor: 0


class CDerive size(48):
        +---
        | +--- (base class CBase2)
 0      | | {vfptr}
 4      | | {vbptr}
 8      | | a2
        | +---
        | +--- (base class CBase3)
12      | | {vfptr}
16      | | {vbptr}
20      | | a3
        | +---
        | +--- (base class CBase4)
        | | +--- (base class CBase1)
24      | | | {vfptr}
28      | | | a1
        | | +---
32      | | a4
        | +---
36      | b
        +---
        +--- (virtual base CBase1)
40      | {vfptr}
44      | a1
        +---

CDerive::$vftable@CBase2@:
        | &CDerive_meta
        |  0
 0      | &CBase2::foo2
 1      | &CDerive::bar

CDerive::$vftable@CBase3@:
        | -12
 0      | &CBase3::foo3

CDerive::$vftable@:
        | -24
 0      | &CBase1::foo1
 1      | &CBase4::foo4

CDerive::$vbtable@CBase2@:
 0      | -4
 1      | 36 (CDerived(CBase2+4)CBase1)

CDerive::$vbtable@CBase3@:
 0      | -4
 1      | 24 (CDerived(CBase3+4)CBase1)

CDerive::$vftable@CBase1@:
        | -40
 0      | &CBase1::foo1

CDerive::func5 this adjustor: 0

vbi:       class  offset o.vbptr  o.vbte fVtorDisp
          CBase1      40       4       4 0

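The layout of CDerive is more complicated. It contains three base classes, one field (b) and one virtual base section. The first two base classes (CBase2 and CBase3) each have their own vbptr, and both lead to the same virtual base section at offset 40 (4 + 36 and 16 + 24). CBase4 derives from CBase1 non-virtually, so its CBase1 is embedded directly at offset 24, and CDerive ends up with two separate CBase1 subobjects.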

Building Apache Web Server with Visual Studio 2005

Posted on September 8, 2009 by gonwan

1. Source

a) apache 2.2.13: http://www.apache.org/dist/httpd/httpd-2.2.13-win32-src.zip
b) zlib 1.2.3 (for mod_deflate): http://www.zlib.net/zlib-1.2.3.tar.gz
c) openssl 0.9.8k (for mod_ssl): http://www.openssl.org/source/openssl-0.9.8k.tar.gz

2. Tools

a) ActivePerl: http://aspn.activestate.com/ASPN/Downloads/ActivePerl/
b) awk & patch tools: http://gnuwin32.sourceforge.net/packages.html

3. Steps

a) Set up the Perl environment: add %Perl%\bin to %PATH%.
b) Also add the awk and patch tools to %PATH%.
c) Decompress the Apache source code to %Apache% (D:\Apache, for example).
d) Decompress zlib into a srclib subdirectory named zlib.
e) Decompress OpenSSL into a srclib subdirectory named openssl.
f) Now the source tree should look like:

%Apache%
　　|
　　+ srclib
　　|   |
　　|   + apr
　　|   |
　　|   + openssl
　　|   |
　　|   + zlib
　　|   |
　　|   + ...
　　|
　　+ ...

g) Patch zlib:
Download the patch from: http://www.apache.org/dist/httpd/binaries/win32/patches_applied/zlib-1.2.3-vc32-2005-rcver.patch. This patch contains minor fixes and enables generation of *.pdb files.
Copy the patch file into the zlib subdirectory, switch to that directory in cmd.exe and run the command:

# patch -p0 < zlib-1.2.3-vc32-2005-rcver.patch

h) Patch openssl:
Download the patch from: http://www.apache.org/dist/httpd/binaries/win32/patches_applied/openssl-0.9.8k-vc32.patch. This patch corrects a link issue with zlib and enables generation of *.pdb files.
Copy the patch file into the openssl subdirectory, switch to that directory in cmd.exe and run the command:

# patch -p0 < openssl-0.9.8k-vc32.patch

i) Build zlib:

# nmake -f win32Makefile.msc

j) Build openssl:

# perl Configure no-rc5 no-idea enable-mdc2 enable-zlib VC-WIN32 -I../zlib -L../zlib
# msdo_masm.bat
# nmake -f msntdll.mak

k) Patch Apache:
There is an issue in the Makefile.win used to build Apache 2.2.13: https://issues.apache.org/bugzilla/show_bug.cgi?id=47659. Download the patch against the branch into the %Apache% directory and run the command:

# patch -p0 < r799070_branch_makefile_fix.diff

l) Build Apache from the command line:
Now you can build Apache with:

# nmake -f Makefile.win _apache[d|r]

And install Apache by:

# nmake -f Makefile.win install[d|r]

m) Build Apache using Visual Studio 2005:
There is also a flaw in how Visual Studio 2005 converts *.dsp files to *.vcproj files. We must run a Perl script to fix it first:

# perl srclib\apr\build\cvtdsp.pl -2005

Now everything is ready. In Visual Studio 2005, open Apache.dsw and convert all *.dsp files to *.vcproj files. Then build the “BuildBin” project. The “InstallBin” project installs the output of all the projects in the Apache solution.

4. Debugging with Visual Studio 2005

It is quite simple. After building the “InstallBin” project, open the property pages of the “httpd” project, switch to the “Debugging” tab, and change the Command value to the httpd binary in your install directory. Now add breakpoints and press F5 to start tracing or debugging.

5. Reference

a) Compiling Apache for Microsoft Windows
b) Apache 2.2.9 command line build with the Windows 2008 SDK