This is a gcc plugin written to ease porting C software to Cosmopolitan Libc.
The general idea is to reduce manually changing the source code of any external
software when attempting to build it with Cosmopolitan Libc -- ideally, you
would need to customize only the build process, but make zero changes to the
source code.
Licensed under ISC License.
I ended up writing a gcc patch with the code from this plugin. My patch is also
licensed under ISC. The patched gcc does a lot less work than this plugin (and
avoids almost all of the counterexamples) because I avoid using the macro hack
and just patch the AST before the parser complains. (Per my current
understanding, gcc does not provide plugin access to the program AST during its
construction in the parsing process, which is why I wrote the patch instead.
However, plugins provide a sufficiently large surface to figure out what a
problem requires before diving into the depths of gcc internals.)
Note: this plugin has not yet been fully tested -- please check the compiled
.o file, generated ASM, or errors in your test suite to confirm the
correctness of the transformations. When in doubt, transform the code manually.
See the Counterexamples section for more details.
- Install the necessary gcc plugin headers (you need gcc to be able to use its plugin architecture)
- Clone this repository and run make
- Create a small shell script that uses /usr/bin/gcc with this plugin (i.e. add -O2 -fplugin=/location/of/portcosmo.so -include /location/of/tmpconst.h) and use that as CC when building software.
For building software with Cosmopolitan Libc+this plugin, you will need to use
this branch where
I've been trying to ensure I change as little of Cosmopolitan Libc as possible
in order to make this work. And it does work! This
branch of CPython
3.11.0rc1 builds with Cosmopolitan Libc, and I didn't have to modify any
switch statements.
Cosmopolitan Libc contains system-level constants (for example, signal numbers
like SIGABRT, or errno values) defined as follows:
extern const int SIGABRT;
#define SIGABRT ACTUALLY(SIGABRT)
This plugin activates upon finding a ACTUALLY( (note the space) within a
defined macro, and (re-)defines ACTUALLY as follows:
#define ACTUALLY(X) __tmpcosmo_##X
and records the location in the source file every time a macro containing
ACTUALLY( is used. In tmpconst.h, there is a huge list of constants starting
with the __tmpcosmo_ prefix.
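The macro indirection described above can be sketched in a few lines of C. This is only an illustration of the token pasting, not the plugin itself: the value -961 is an arbitrary placeholder, the single constant stands in for the long list in tmpconst.h, and is_abort is a hypothetical helper added for the demonstration.

```c
/* Stand-in for one entry of tmpconst.h; the value is an arbitrary placeholder. */
static const int __tmpcosmo_SIGABRT = -961;

/* The redefinition described above: paste the prefix onto the name. */
#define ACTUALLY(X) __tmpcosmo_##X
#define SIGABRT ACTUALLY(SIGABRT)

/* Hypothetical helper: the comparison below expands to
   sig == __tmpcosmo_SIGABRT after preprocessing. */
int is_abort(int sig) {
  return sig == SIGABRT;
}
```

Because the argument of ACTUALLY is an operand of ##, it is pasted as-is rather than being expanded again, so SIGABRT ends up as the single identifier __tmpcosmo_SIGABRT.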
After every (valid) macro usage has been recorded, this plugin walks through the
entire AST of the source file to find each usage, and substitutes the
appropriate extern variable name in the location where the macro was used. It
does so via the below two components:
- ifswitch -- rearrange switch statements if the case labels would otherwise raise the "case label is not constant" error.
- initstruct -- update definitions of variables, structs, and arrays if their initialization would otherwise raise the "initializer element is not constant" error (can handle static and global variables).
The plugin errors out if the ACTUALLY macro was improperly used, or if it is
unable to confirm all the macro usage records were substituted successfully. At
the end of compilation, the plugin provides a note of how many substitutions
were made when compiling the file.
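To make the initstruct idea concrete, here is a hand-written sketch of one way a global initializer can be rewritten when the value is only known at run time. It is an assumption about the general shape of such a rewrite, not a dump of what the plugin emits; DEMO_SIGABRT, struct demo_handler and demo_handler_init are hypothetical names, and the constructor attribute is a GCC extension.

```c
/* Hypothetical run-time constant. In Cosmopolitan Libc the definition would
   live in the libc, not next to its use; it is defined here only so the
   example links on its own. */
extern const int DEMO_SIGABRT;
const int DEMO_SIGABRT = 6;

struct demo_handler {
  int sig;
  const char *name;
};

/* Original source would be:
     struct demo_handler demo_handler = { DEMO_SIGABRT, "abort" };
   which fails with "initializer element is not constant" when the
   value is not known at compile time. */

/* Rewritten by hand: use a placeholder in the static initializer ... */
struct demo_handler demo_handler = { 0, "abort" };

/* ... and fill in the real value before main() runs. */
__attribute__((constructor))
static void demo_handler_init(void) {
  demo_handler.sig = DEMO_SIGABRT;
}
```

The same pattern applies when transforming code manually: keep every truly constant field in the initializer and move only the run-time values into a function that runs at startup.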
There might be other ways to check for such incorrect statements, but any
method to rearrange these switch statements would need to incorporate a C
preprocessor and parser, and any source code transformations would need to
remain valid even if ifdefs are mixed within the C source code.
Mixing ifdefs is a quite common occurrence in switch statements -- often
times you see handlers for errno having a bunch of ifdefs (and
fallthroughs!) to allow for different kinds of errno values based on the
operating system.
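As an illustration of why fallthroughs matter, the sketch below rewrites a small errno-style switch by hand so that the labels no longer need to be compile-time constants. The DEMO_ names and values are made up, and the goto-based layout is just one way to preserve the original control flow; the plugin's actual output may look different.

```c
/* Hypothetical errno-like constants whose values are only known at run time. */
const int DEMO_EINTR = 4;
const int DEMO_EAGAIN = 11;
const int DEMO_ENOMEM = 12;

enum { DEMO_RETRY = 1, DEMO_FAIL = 2 };

// Original source (DEMO_HAVE_EAGAIN is a hypothetical feature macro):
//
//   switch (err) {
//     case DEMO_EINTR:
//   #ifdef DEMO_HAVE_EAGAIN
//     case DEMO_EAGAIN:
//   #endif
//       flags |= DEMO_RETRY;
//       /* fallthrough */
//     case DEMO_ENOMEM:
//       flags |= DEMO_FAIL;
//       break;
//     default:
//       break;
//   }

/* Hand-written equivalent, after preprocessing with DEMO_HAVE_EAGAIN defined:
   comparisons pick the entry point, labels keep the fallthrough intact. */
int demo_classify(int err) {
  int flags = 0;
  if (err == DEMO_EINTR) goto on_eintr;
  if (err == DEMO_EAGAIN) goto on_eagain;
  if (err == DEMO_ENOMEM) goto on_enomem;
  goto on_default;
on_eintr:
on_eagain:
  flags |= DEMO_RETRY;
  /* falls through, exactly like the original switch */
on_enomem:
  flags |= DEMO_FAIL;
  return flags;
on_default:
  return flags;
}
```

A plain if/else chain would be enough when every case ends in break or return, but it would silently drop the fallthrough from the first group into the DEMO_ENOMEM handling.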
The best place to handle these statements is after the preprocessor has done
its work, so that the focus can be solely on the AST. gcc comes in with a
battle-tested C preprocessor, parser, decent optimizations, and plugin support,
so why not a gcc plugin?
While this plugin can traverse through the code AST and modify almost all uses of the macro, there are a few cases where it may not be able to do so:
- Using gcc -O0, i.e. if you disable all optimizations, then gcc will not perform constant-folding and will error out with "case label is not constant" on source code like
case __tmpcosmo_SIGABRT:
This can likely be fixed, it's just a matter of enabling the right optimization
flag in gcc. Better yet: we can figure out how to use __tmpcosmo_SIGABRT as
a macro that can be defined during runtime, instead of a static const int in
tmpconst.h, which would circumvent this problem. Edit: I ended up
patching gcc with the code from this plugin, so this problem is avoided.
- case labels with ranges, something like:
case SIGABRT ... 0:
Yes, I know it's possible to make this work, but I haven't seen any real-life C code that does something like this yet.
- constant-folding algebra:
static const int e = SIGABRT;
/* few lines later... */
func(e);
Under gcc's optimization flags, e will be constant-folded, and its value
will be used everywhere instead. The plugin has not recorded all the locations
where e could have been used, so it just bails out when seeing a declaration
like this. Edit: I ended up
patching gcc with the code from this plugin, so this problem is avoided.
int x = SIGABRT+42;
if(j < SIGABRT+42)
case SIGABRT+42:
for(int i=SIGABRT-1; i < 0; ++i)
Under gcc's optimization flags, all of the above statements will have been
constant-folded, and even though the plugin has recorded where the macro was
used, it does not know what expression was simplified, so it bails out if it was
unable to substitute a constant in any expression. Edit: I ended up
patching gcc with the code from this plugin, so this problem is avoided.
- magical things like Duff's device -- I don't know if any C code uses Duff's device with SIGABRT, would be fun to find out. Edit: I ended up patching gcc with the code from this plugin, so this problem is avoided.
- substituting the incorrect location due to a bad pick of constant: Suppose we have some code which uses a lot of integer constants, and some of them are on the same line where one of our macro substitutions was recorded. The plugin will then likely substitute the constant at the wrong location. See the below example:
/* suppose tmpconst.h has the below value */
static const int __tmpcosmo_SIGABRT = -961;
/* and your code has something like */
func(-961, SIGABRT);
/* the macro will modify it to */
func(-961, __tmpcosmo_SIGABRT);
/* and record the location of the modification */
/* but gcc will constant-fold it to */
func(-961, -961);
/* the AST will be INCORRECTLY transformed into */
func(SIGABRT, -961);
/* whereas the second param should actually be transformed */
func(-961, SIGABRT);
It might be possible to fix this via a hash-table of some sort, because we can just check the function call/expression at a marked location to confirm that it does not have the constant we just substituted anymore (i.e. our substitution actually fixed the macro use and not some other constant in the source code).
This can also be fixed if we had more precise location checking. At present, if your source code has a function call like
func(27, __tmpcosmo_SIGABRT);
In terms of line information, we only know that the CALL_EXPR with func
starts on line 42 (and also its end sometimes) -- we do not know the location of
the individual parameters 27 and -961, which would be useful to match
with the location we have saved from when the macro was used.
Edit: I ended up patching gcc with the code from this plugin, so this problem is avoided in most situations (I haven't found an example of this problem in real-life code yet). It can still happen if you're initializing a struct or writing a switch case with the clashing values, but my current belief is that the latter is quite rare (a switch whose options include both errno constants and other unrelated negative values), and the former is still uncommon, and would be caught by a simple test. Either way, the fix is the same as always: use different constants, or do the AST patching by hand.
- The gcc Internals documentation -- this document, along with the gcc headers for plugin writers, provides everything you need to know about what plugins can do.
- History of C
- C99 switch constraints and semantics -- see page 92, $\S 6.8.4.2$
- C11 switch constraints and semantics -- see page 149, $\S 6.8.4.2$
- C17 final draft switch constraints and semantics -- see page 108, $\S 6.8.4.2$
- Assert Rewriting in gcc
- An Introduction to gcc and gcc's plugins
- LWN: Randomizing Structure Layout
- Source code of the randstruct plugin used in the Linux kernel -- this is to understand how much can be done at the gcc plugin level.
- gcc OpenMP Runtime Wiki -- need to understand how pragmas can be used to alert a plugin