How to create a lightweight C code sandbox?

Since the C standard is much too broad to be allowed, you would need to go the other way around: specify the minimum subset of C which you need, and try to implement that. Even ANSI C is already too complicated and allows unwanted behaviour.

The aspect of C which is most problematic are the pointers: the C language requires pointer arithmitic, and those are not checked. For example:

char a[100];
printf("%p %p\n", a[10], 10[a]);

will both print the same address. Since a[10] == 10[a] == *(10 + a) == *(a + 10).

All these pointer accesses cannot be checked at compile time. That’s the same complexity as asking the compiler for ‘all bugs in a program’ which would require solving the halting problem.

Since you want this function to be able to run in the same process (potentially in a different thread) you share memory between your application and the ‘safe’ module since that’s the whole point of having a thread: share data for faster access. However, this also means that both threads can read and write the same memory.

And since you cannot prove compile time where pointers end up, you have to do that at runtime. Which means that code like ‘a[10]’ has to be translated to something like ‘get_byte(a + 10)’ at which point I wouldn’t call it C anymore.

Google Native Client

So if that’s true, how does google do it then? Well, in contrast to the requirements here (cross-platform (including embedded systems)), Google concentrates on x86, which has in additional to paging with page protections also segment registers. Which allows it to create a sandbox where another thread does not share the same memory in the same way: the sandbox is by segmentation limited to changing only its own memory range. Furthermore:

  • a list of safe x86 assembly constructs is assembled
  • gcc is changed to emit those safe constructs
  • this list is constructed in a way that is verifiable.
  • after loading a module, this verification is done

So this is platform specific and is not a ‘simple’ solution, although a working one. Read more at their research paper.

Conclusion

So whatever route you go, you need to start out with something new which is verifiable and
only then you can start by adapting an existing a compiler or generating a new one. However, trying to mimic ANSI C requires one to think about the pointer problem. Google modelled their sandbox not on ANSI C but on a subset of x86, which allowed them to use existing compilers to a great extend with the disadvantage of being tied to x86.

Leave a Comment