Note: this paper is a DRAFT. This is a major revision of a previously posted draft, which corrects many errors and introduces some better techniques, and explains them better. Caveat lector.

NEW: The techniques below have been used to implement a fully-conforming C-subset library.


DRAFT 1998-10-04

Standard C++ C-Library Subset Header Strategy

Copyright 1997,1998 by Nathan C. Myers All Rights Reserved.

Abstract:
This is a report intended for Standard C++ implementers. It describes an approach to producing the headers required for the C subset of C++ when (for any reason) a set of underlying C headers must be used to provide most of the necessary declarations.

The C++ standard places requirements on the contents of the C library and the C headers, as seen from C++ programs, that differ from the C Standard. These requirements can be tricky to meet under an external constraint that underlying C headers must be used as-is, without change or duplication, and more so under the additional constraint that programs must retain access to non-standard names defined in those headers. This paper describes an architecture that satisfies all the Draft requirements with a minimum of duplication and extraneous apparatus. It is meant for those who, for whatever reason, are unable to implement the C library subset directly in C++.

This approach depends on some "inside information" about the set of standard and non-standard files that can be included by the standard "#include <>" directives. This means that fully generic C++ headers, independent of the underlying C headers, are impossible; however, the dependency is far less than if the contents were duplicated, and all the information needed can be obtained by inspection of the target C headers, either manually or via a script.

A key complication in any approach to this problem arises from Koenig Lookup effects. Briefly, a using-declaration for a type defined in another namespace is not equivalent to an actual type definition. Consider:


  namespace A {  // some namespace
    struct U {};
  }
  namespace std {
    struct T {};
    void f(const T&);
    using A::U;  // or typedef A::U U;
    void g(const U&);
  }
  int main() {
    f(std::T());  // OK
    g(std::U());  // error
  }

Because std::U is only an alias for A::U, the name 'g' is looked up only in namespace A and the global scope, and std::g is never found. This implies that any struct definitions specified to appear in namespace std, and any functions that depend on them, must actually be defined in the std namespace, and not simply be aliased there.
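
A minimal sketch of the workaround this implies, under stated assumptions: the namespaces `c_side` and `lib` are hypothetical stand-ins for the namespace holding the C declarations and for std (a test program may not add declarations to std itself), and the type is defined in `lib` by derivation rather than aliased there.

```cpp
#include <cassert>

// Hypothetical names throughout: c_side stands in for the namespace that
// holds the C declarations, lib stands in for std.
namespace c_side {
  struct U { int value; };
}
namespace lib {
  // A real definition in lib, not a using-declaration or typedef, so that
  // lib becomes an associated namespace of the type.
  struct U : c_side::U { };
  inline int g(const U& u) { return u.value + 1; }
}

// The unqualified call compiles because lookup for g now searches lib.
inline int call_g() {
  lib::U u;
  u.value = 41;
  return g(u);
}

inline void koenig_usage() {
  assert(call_g() == 42);
}
```

Replacing the derived struct with `using c_side::U;` should bring back the lookup failure shown in the example above.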

Requirements

The following is a list of the requirements assumed for the design of the solution presented. The following points may require some reflection to understand completely:

Architecture

Assume that there is already a set of C headers <NAME.h> in a known directory. These headers are "C++ clean", in that they conform to the common subset of the core ANSI C and C++ language syntax specifications. (I.e., they are not "K&R" C declarations.) This is the common (though not universal) status quo throughout the industry.

The C++ implementation must provide its own directory or directories to be searched in place of, or at least before, the C headers directory. For each of the <NAME.h> forms found in the Standard C library it must place two "shadow" headers in this directory: "NAME.h" and "cNAME", which in turn include the actual C headers. In addition, it must provide a shadow header <NAME.h> for each non-standard file (transitively) included by any of the C headers. (These non-standard headers can be identified and generated by a configuration script. Such a script has been written.)

These shadow headers provide a place to correct the differences between the C and C++ definitions of the C library names. Corrections include undefining C macros, wrapping declarations in namespaces, promoting names to global scope, as well as various bits of appalling macro surgery for special cases such as va_start, strchr, qsort and FILE.

A noticeable but strictly limited amount of preprocessor apparatus is needed to nest declarations in a namespace without causing sub-included declarations to be further nested in sub-namespaces. Unrestrained use of sub-includes inside ``extern "C"'' blocks in the C headers can interfere with this.

The prototypical headers look like this:



  /* inherited C header foo.h, in C header directory */

  #ifndef _INCLUDED_FOO_  /* ordinary include guard */
  #define _INCLUDED_FOO_

  extern int foo_this(const char*); /* ordinary C */
  extern int foo_that(const char*);

  #endif /* _INCLUDED_FOO_ */



  // C++ header <cfoo>

  #ifndef _INCLUDED_CPP_CFOO_  /* ordinary include guard */
  #define _INCLUDED_CPP_CFOO_

  namespace _C_Swamp {
    extern "C" {
  #   define _IN_C_SWAMP_
  #   include "/usr/include/foo.h"  /* or #include_next <foo.h> */
    }
    namespace _C_Shadow { }  // placeholder
  } // close namespace ::_C_Swamp::

  # undef foo_this
  # undef foo_that

  namespace std {

    // Adopt C names into std::
    using ::_C_Swamp::foo_this;
    using ::_C_Swamp::foo_that;
    // ... and others

  } // close namespace std::

  #undef  _IN_C_SWAMP_

  #endif /* _INCLUDED_CPP_CFOO_ */


  // C++ "shadow" header <foo.h>

  #ifndef _INCLUDED_CPP_FOO_H_
  # undef _SHADOW_NAME
  # define _SHADOW_NAME <cfoo> /* substitute here */
  # include <generic_shadow.h>
  # undef _SHADOW_NAME

  #ifndef _IN_C_SWAMP_ 
    using std::foo_this;
    using std::foo_that;
  #define _INCLUDED_CPP_FOO_H_
  #endif

  #endif /* _INCLUDED_CPP_FOO_H_ */


Where the file <generic_shadow.h> used for all the standard ".h" headers looks like this:


  // <generic_shadow.h>

  #ifdef _IN_C_SWAMP_  /* sub-included by a C header */

      // get out of the "swamp"
    } // close extern "C"
  }   // close namespace _C_Swamp::

  # undef _IN_C_SWAMP_

  # include _SHADOW_NAME

  // dive back into the "swamp"
  namespace _C_Swamp {
    extern "C" {
  #   define _IN_C_SWAMP_

  #else /* not _IN_C_SWAMP_:  directly included by user program */

  # include _SHADOW_NAME

    // expose global C names, including non-standard ones, but shadow
    //   some names and types with the std:: C++ version.
    using namespace ::_C_Swamp::_C_Shadow;

  #endif /* _IN_C_SWAMP_ */
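
The effect of the "get out, include, dive back in" sequence can be sketched in a single file. This is an illustration only: `swamp` and `lib` are hypothetical stand-ins for _C_Swamp and std, the functions are invented, and the sub-include is simulated by writing its contents in place.

```cpp
#include <cassert>

namespace swamp {
  extern "C" {
    inline int c_first(void) { return 1; }   // declared before the sub-include
    // A sub-include is reached here, so step out of the swamp first.
  }  // close extern "C"
}    // close namespace swamp

// What the sub-included C++ header declares. Because the swamp was closed,
// this is ::lib, not ::swamp::lib.
namespace lib {
  inline int from_sub_header() { return 7; }
}

namespace swamp {  // dive back in for the rest of the outer C header
  extern "C" {
    inline int c_second(void) { return 2; }
  }
}

inline void reentry_usage() {
  assert(::lib::from_sub_header() == 7);
  assert(swamp::c_first() + swamp::c_second() == 3);
}
```

Without the closing and reopening braces, the inner namespace would be declared as swamp::lib, and names meant for the outer namespace would end up nested one level too deep.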


Finally, non-standard headers sub-included by standard headers need to be shadowed; the pattern is like this, for the common example sys/types.h:

  // shadow header sys/types.h
  #ifndef  _INCLUDED_CPP_SYS_TYPES_H_
  
  # ifdef _IN_C_SWAMP_  /* sub-included by a C header */
  #  include </usr/include/sys/types.h>
  # else
  
      namespace _C_Swamp { namespace _C_Shadow { } }
      using namespace ::_C_Swamp::_C_Shadow;
      namespace _C_Swamp {
        extern "C" {
  #       define _IN_C_SWAMP_
  #       include </usr/include/sys/types.h>
        } // close extern "C"
      }   // close namespace _C_Swamp::
  #   undef _IN_C_SWAMP_   /* no longer in the swamp */
  #   define _INCLUDED_CPP_SYS_TYPES_H_
  
  # endif /* _IN_C_SWAMP_ */
  #endif /* _INCLUDED_CPP_SYS_TYPES_H_ */
  

It may not be easy to produce and maintain the list of files which require this treatment, because #if directives interfere with automating the process. However, you can be conservative by providing a shadow for anything that might be included by a standard header. I have a shell script which produces such a list, available on request. If you can arrange that the files can only be found via the shadow, you can at least detect reliably when you have missed one.
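
As a rough illustration of the conservative approach (this is not the script mentioned above, and the function name `included_headers` is hypothetical), a scanner can simply ignore all conditionals and report every header that any #include directive names:

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Collect every header named by an #include directive in `text`.
// Conditionals (#if, #ifdef) are deliberately ignored, so the result is a
// superset of what any one configuration would actually include.
inline std::set<std::string> included_headers(const std::string& text) {
  std::set<std::string> names;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::string::size_type i = line.find_first_not_of(" \t");
    if (i == std::string::npos || line[i] != '#') continue;
    i = line.find_first_not_of(" \t", i + 1);
    if (i == std::string::npos || line.compare(i, 7, "include") != 0) continue;
    const std::string::size_type open = line.find_first_of("<\"", i + 7);
    if (open == std::string::npos) continue;   // e.g. #include SOME_MACRO
    const char closer = (line[open] == '<') ? '>' : '"';
    const std::string::size_type close = line.find(closer, open + 1);
    if (close == std::string::npos) continue;
    names.insert(line.substr(open + 1, close - open - 1));
  }
  return names;
}

inline void scan_usage() {
  const std::string header =
      "#ifndef _FOO_\n"
      "#  include <sys/types.h>\n"
      "#ifdef __BAR__\n"
      "#include \"bar_private.h\"\n"
      "#endif\n"
      "extern int foo(void);\n"
      "#endif\n";
  const std::set<std::string> found = included_headers(header);
  assert(found.size() == 2);
  assert(found.count("sys/types.h") == 1);
  assert(found.count("bar_private.h") == 1);
}
```

Running such a scan over each standard header, and then over each header it reports, would give the transitive list. Directives whose target is itself a macro are skipped here and would need manual inspection.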

Special Cases

The C library defines four kinds of names. Each requires a slightly different treatment.

[Required macros:]
The C headers define many macros that resolve to simple literal constants usable in "#if" conditionals. All of these have ALL_UPPER_CASE names, and require no special treatment unless the value for C++ must be different than for C. They define others which resolve to a regular "constant expression" or simply an "expression". The expansions may refer to names defined only in the namespace where the header is being included (::_C_Swamp in the examples), so the expression must be captured at that point so that the macro can be redefined to refer to a correctly-scoped value.

The headers also define a variety of function macros. These are macros which require arguments, but do not necessarily resolve to a function call or expression. Examples include assert() and va_start(). These must remain macros in C++ headers, and so may also be left alone. In the case of the va_* macros, however, reimplementation using inline template functions underneath may result in better error messages for users.
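
The capture step for a constant-expression macro might look like the following sketch. All names are hypothetical: `swamp` stands in for _C_Swamp, and `MY_LIMIT` and `hidden_unit` stand in for a macro and a non-standard name that a C header might define.

```cpp
#include <cassert>

namespace swamp {
  enum { hidden_unit = 64 };           // non-standard name from the C header
# define MY_LIMIT (hidden_unit * 2)    // the macro as the C header defines it
  // Capture the value here, where hidden_unit is still visible unqualified.
  enum { captured_MY_LIMIT = MY_LIMIT };
}
# undef MY_LIMIT
# define MY_LIMIT (::swamp::captured_MY_LIMIT)   // now valid in any scope

namespace elsewhere {
  // An array bound requires a constant expression, so this checks that the
  // redefined macro still is one outside the swamp.
  inline int limit() {
    static char buffer[MY_LIMIT];
    return static_cast<int>(sizeof buffer);
  }
}

inline void macro_capture_usage() {
  assert(elsewhere::limit() == 128);
}
```

An enumerator keeps the value usable as a constant expression, but not in "#if" conditionals; macros needed there have to remain literal constants.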

[Functions:]
Any or all of the names specified as regular functions may actually be defined as macros in the C headers, though in every case a function of the same name must be provided with the same semantics, accessible by "undefining" the name or mentioning it in parentheses. In C++ any name not actually specified as a macro must not be a macro, so all the C functions must be handled specially. In some cases the name is a macro for performance reasons, so its semantics must be captured in the body of an inline function, which then replaces the C function. In most cases, however, it is sufficient to #undef any macro present and expose the regular C definition. Here is an example using atoi. This text would appear in the <cstdlib> header.


  #ifndef _INCLUDED_CPP_CSTDLIB_
  #define _INCLUDED_CPP_CSTDLIB_
  namespace _C_Swamp {
    extern "C" {
  #   define _IN_C_SWAMP_
  #   include "/usr/include/stdlib.h"  /* or #include_next <stdlib.h> */
    }
  } // _C_Swamp::
  # undef atoi

  namespace std {
    // eliminate forbidden macros
    // ...

    // import declarations to C++ std:: namespace:
    using ::_C_Swamp::atoi;
    // ...

  } // std::
  #endif /* _INCLUDED_CPP_CSTDLIB_ */

If a function is defined inline to replace a macro, the definition must actually replace the C header version to prevent conflicts. Another reason that some functions in the C headers must be replaced is that in C++ they have a new interface; examples include strchr() and qsort(). Others must be replaced because they must be overloaded with other functions of the same name, and thus cannot be extern "C". Finally, some must be replaced because a type used in their interface is wrong, or would be defined in the wrong namespace, causing name lookup anomalies. Examples include functions in <time.h> and <wchar.h>.

Here is an example for the <string.h> function strchr():

  namespace _C_Swamp {
    extern "C" {
  # include "/usr/include/string.h"  /* or #include_next <string.h> */
    }
  }
  # undef strchr

  namespace std {
    // the C++ definitions
    inline char const* strchr(const char* s, int c)
      { return ::_C_Swamp::strchr(s,c); }
    inline char*       strchr(      char* s, int c)
      { return ::_C_Swamp::strchr(s,c); }
    // actually there are much better definitions possible,
    //   using template techniques.
  }
  namespace _C_Swamp {
    namespace _C_Shadow {
      using std::strchr;  // after "using namespace _C_Swamp::_C_Shadow;",
                          //   finds the std:: version.
    }
  }
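
To show what the two overloads buy the user, here is a small self-contained sketch. It assumes a hypothetical namespace `lib` in place of std and forwards to whatever strchr the platform's <string.h> declares at global scope.

```cpp
#include <cassert>
#include <string.h>

namespace lib {  // hypothetical stand-in for std
  inline const char* strchr(const char* s, int c) { return ::strchr(s, c); }
  inline char*       strchr(char* s, int c)       { return ::strchr(s, c); }
}

inline void strchr_usage() {
  char buffer[] = "a=b";
  const char* fixed = "a=b";

  char* p = lib::strchr(buffer, '=');        // mutable in, mutable out
  const char* q = lib::strchr(fixed, '=');   // const in, const out
  // char* bad = lib::strchr(fixed, '=');    // rejected at compile time, unlike C

  *p = ':';
  assert(buffer[1] == ':');
  assert(q == fixed + 1);
}
```

The C interface would have returned a writable pointer into the const string; the overloaded pair preserves constness in both directions.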

When the operation of a macro is not obvious (unlike in the case of strchr) and for performance reasons it cannot simply be #undef'd, it may be necessary to capture the definition in an inline function before #undef'ing it. Note that the macro is likely to contain references to nonstandard names visible only in the same namespace.

  namespace _C_Swamp {
    extern "C" {
  #   include "/usr/include/stdio.h"  /* or #include_next <stdio.h> */
    }
    // capture macro definition
    inline int _CPP_getchar() { return getchar(); }
  }
  #undef getchar

  namespace std {
    // the C++ definition
    inline int getchar() { return _C_Swamp::_CPP_getchar(); }
  }
  namespace _C_Swamp {
    namespace _C_Shadow {
      using std::getchar;
    }
  }

Structs defined in C headers must be replaced with the same name in the namespace std, as discussed below. The functions which use them must be declared to take the std:: member type, not the C header type, so that Koenig lookup will find them. This typically requires defining inline forwarding functions in the std:: namespace to call the C implementation, and casts to convert pointers to the types involved.

[Expression macros:]
The C standard defines macro names that resemble global variables, such as stdin and errno. In C++ (unlike C) errno is required to be a macro. These might be defined to evaluate to an expression involving namespace-scoped names. For example, stdin may resolve to "&__iob[0]". In C++, it must become a reference to the same value, and the definition must appear in a scope where "__iob" is visible. The errno variable often resolves to a function call. These expressions can generally be captured in a no-argument inline function.
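
A sketch of capturing such expressions in no-argument inline functions follows. Every name is hypothetical: `swamp` stands in for _C_Swamp, `MY_STDIN` and `MY_ERRNO` for the header's macros, and `iob_table` and `error_location` for the non-standard names they expand to. For simplicity the sketch keeps both names as macros and only re-routes their expansions.

```cpp
#include <cassert>

namespace swamp {
  struct stream { int fd; };
  inline stream* iob_table() {
    static stream table[3] = { {0}, {1}, {2} };
    return table;
  }
  inline int* error_location() { static int value = 0; return &value; }

  // As the C header might define them: both refer to names visible only here.
# define MY_STDIN (&iob_table()[0])
# define MY_ERRNO (*error_location())

  // Capture each expansion while those names are still in scope.
  inline stream* captured_stdin() { return MY_STDIN; }
  inline int&    captured_errno() { return MY_ERRNO; }
}
# undef MY_STDIN
# undef MY_ERRNO
# define MY_STDIN (::swamp::captured_stdin())
# define MY_ERRNO (::swamp::captured_errno())   // still a macro, still an lvalue

namespace elsewhere {
  inline void expression_usage() {
    MY_ERRNO = 7;                  // assignable from any scope
    assert(MY_ERRNO == 7);
    assert(MY_STDIN->fd == 0);     // same object the C macro named
  }
}
```

Returning a reference from the errno capture is what keeps the redefined macro a modifiable lvalue.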

[Types:]
The C library defines types. Some of these, such as size_t, are aliases for built-in types. These aliases may be macros or typedefs. Those that are or may be macros must be changed to typedefs:


  namespace _C_Swamp {
    ...
  #ifdef size_t
    typedef size_t _Cplusplus_size_t;  // capture the macro's expansion
  #endif
  }
  #ifdef size_t
  # undef size_t
  namespace std {
    typedef _C_Swamp::_Cplusplus_size_t size_t;
  }
  #endif

Names specified as equivalent to built-in types might actually be macros; if so, they must first be converted to typedefs and then imported into the std:: namespace with a using-declaration or another typedef.

Some names are allowed to be typedefs (or macros) either for built-in types or for structs, at the implementors' option, such as fpos_t. Since a conforming program cannot depend on their being a struct, they need not be (re-)defined in namespace std, but might best be so anyhow, to reduce variability between platforms.

Types like struct tm and FILE, because of Koenig lookup, must actually be defined in the std namespace; it is not sufficient to bring them in via (e.g.) "using _C_Swamp::tm". Instead, corresponding types must be defined in namespace std. Note that even struct tags like "tm" might be macros in the C header, and thus must be #undef'ed.


  // C++ header <ctime>

  #ifndef _INCLUDED_CPP_CTIME_
  #define _INCLUDED_CPP_CTIME_

  namespace _C_Swamp {
    extern "C" {
  #   include "/usr/include/time.h"  /* or #include_next <time.h> */
    }
    namespace _Hidden {
      typedef tm _CLib_tm;
    }
  # undef asctime
  # undef tm
    namespace _Hidden {
      extern "C" char* asctime(const _Hidden::_CLib_tm*);
    }
  }
  namespace std {

    // the actual replacement std::tm
    struct tm : ::_C_Swamp::_Hidden::_CLib_tm { };

    inline char* asctime(const std::tm* tmp)
      { return ::_C_Swamp::_Hidden::asctime(tmp); }

  }
  namespace _C_Swamp {
    namespace _C_Shadow {
      using ::std::asctime;  // for possible access via "using namespace".
    }
  }
  #endif

Note that functions which return a tm* must be shadowed with a function that converts (via reinterpret_cast) the C struct tm* to a C++ tm*. This conversion is, strictly speaking, undefined, but works sensibly on most compilers. Where it doesn't, some other hack is needed.
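
The shape of such a shadowing function is sketched below with hypothetical names: `swamp::rec` plays the role of the C struct tm, `swamp::current` the role of a function like localtime(), and `lib` stands in for std. As noted above, the cast is not sanctioned by the standard, so this is an illustration of the hack rather than portable code.

```cpp
#include <cassert>

namespace swamp {
  struct rec { int year; };
  inline rec* current() { static rec r = { 1998 }; return &r; }
}
namespace lib {
  struct rec : ::swamp::rec { };        // the replacement type, adds no members

  inline rec* current() {
    // Strictly undefined: the object is really a swamp::rec, not a lib::rec.
    return reinterpret_cast<rec*>(::swamp::current());
  }
  inline int year_of(const rec* r) { return r->year; }
}

// Unqualified call: argument-dependent lookup finds lib::year_of because the
// pointer returned by the shadow has type lib::rec*.
inline int current_year() { return year_of(lib::current()); }

inline void shadow_usage() {
  assert(current_year() == 1998);
}
```

Passing a lib::rec* in the other direction needs no cast, since a pointer to the derived type converts implicitly to a pointer to its base.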

Complications

While this approach solves all "normal" header problems, pathological cases may remain. For example, some header filesets contain conditional definitions of some types, such as size_t. It is fairly common to define a macro for "errno" that (conditionally) resolves to an lvalue expression rather than a global variable; this might require special handling. The <stdarg.h> header might best be implemented with templates. Some traditionally open-coded functions (such as memcpy) may also require special handling (perhaps with templates) as might any standard functions with non-standard interfaces.

The most annoying problem headers are those which, when included with some non-standard macro turned on, produce different results than for a normal include. These require non-portable "hacks" to work around.

The techniques described here are (more-or-less) legal C++, but a preprocessor extension strongly recommended for implementors is "#include_next", as is found in GNU cccp; it begins searching for the named header after the directory where the file which contains the directive was found, and allows a shadow file "foo.h" to sub-include another "foo.h" without naming the full path to it.

Conclusion

While it is no picnic adapting a C library header fileset to work completely and conformantly with a C++ compiler, it is far less difficult than some have assumed. Most of the necessary adaptations are "boilerplate", and while messy are not particularly error-prone. Still, anyone who releases the C portion of their C++ library written actually in C++ will probably achieve a more satisfactory result.

Acknowledgements

Thanks to Steve Clamage, Bjarne Stroustrup, Erwin Unruh, and Sandra Whitman for generosity with ideas and experience that enabled me to write this article. Any errors remain my own. Corrections and ideas are most welcome, and will make a difference. Implementers who use the techniques described here are asked (politely) to acknowledge this paper in at least one header. You may reference the URL: <http://www.cantrip.org/cheaders.html>.

Email remarks about this page to ncm-nospam@cantrip.org.