Compile-Time Embedding with #include

Although executing code at compile-time is not a new idea by any means, the Zig programming language popularised the term comptime, and has drawn a lot of attention back to the topic by making it easier to use than in other languages.

I personally haven’t taken the time to dabble in Zig yet - I’m already busy with Rust and OCaml - but curious as I am, I particularly wanted to know if files could be accessed at compile-time in the language I still know best: C++.

Turns out, the answer is… yes! Although, to my surprise, it’s not through some constexpr version of std::ifstream.

Some Context

Instead, we can turn to a new feature in C23 and C++26: the #embed preprocessor directive. It allows file data to be embedded into a program at compile-time like so:

constexpr unsigned char foo[] =
{
#embed "foo.dat"
};

and can even initialise the members of a struct directly, with each embedded byte supplying one value!

struct T
{
    double a, b, c;
    struct { double e, f, g; } x;
    double h, i, j;
};
T x =
{
// well-formed if the directive produces nine or fewer values
#embed "s.dat"
};

There are also optional embed parameters (limit, prefix, suffix and if_empty) for even more control; all in all, this is a welcome alternative to manually converting file data into source code. Plus, it gives compilers far more room to optimise than a huge hand-written initialiser list!

However, although #embed is a nice solution to embedding files for the time being, it is still a preprocessor directive, which we in C++ would rather avoid.

The “proper” C++ alternative, std::embed, was proposed by the same author as #embed, JeanHeyd Meneide, back in 2018, yet it still hasn’t been standardised - although perhaps this shouldn’t come as a surprise considering C++ is far from the easiest language to design and implement new features for.

Although there’s a lot more to be said about these features, I want to focus this article on something… a bit less useful (obsolete even) - another, rather fun way to embed data. A method without #embed, std::embed, special conversion tools, or manually copying bytes by hand!

The Power of the Preprocessor

The #include directive is a relic of C’s past that C++ is only recently beginning to escape from, with popular compilers slowly but surely implementing C++20’s new modules system. After all, C’s old include system is widely considered to be glorified copy-paste, leaving C++ programmers to manage header guards, namespaces, and build times manually.

However, because #include effectively just copies a file’s contents directly into a C++ file - regardless of the file’s type - we can exploit this to load more than just header files, and at compile-time!

But there is an obvious drawback: the content that is pasted into our C++ source file has to be valid C++ code. Therefore, a reliable way to make this work for arbitrary data is to include it as a C++ string literal. This is because C strings are nothing more than char arrays (plus a terminating NUL added by the compiler), where each char represents a single byte, making them a suitable container for binary-encoded data.

For instance, the following program encodes some binary data:

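#include <iostream>
#include <cstdint>
#include <cstring>

constexpr const char data[] = "\x64\0\0\0\0\0\x20\x40\x31\x6A\x6E\x74\x30\x31\0\0";

struct Character {
    std::uint32_t hp;
    float speed;
    char id[8];
};

int main()
{
    // Assume little endian, a 4-byte IEEE-754 float and no padding in Character.
    static_assert(sizeof(Character) <= sizeof(data), "data is too short for a Character");

    // Copy the bytes out with std::memcpy: reinterpret_cast-ing the char array
    // to Character* would break alignment and strict-aliasing rules.
    Character character;
    std::memcpy(&character, data, sizeof character);

    std::cout
        << "Character ID: "    << character.id    << "\n"
        << "Character HP: "    << character.hp    << "\n"
        << "Character Speed: " << character.speed << "\n";
}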


Expected output:

Character ID: 1jnt01
Character HP: 100
Character Speed: 2.5

So, seeing as that works, let’s #include a file in place of the string:

constexpr const char data[] =
    #include "1jnt01.bin"
;

Fairly straightforward, although because there is no way (that I could find, anyway) to wrap the include in a string from the C++ file itself, we need to add the string-literal delimiters to the binary file directly. As a result, it’s important that we format the data as a raw string literal, since then we don’t need to worry about stray " characters in the middle of the data ending our string early, or about backslashes being treated as escape sequences.

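More specifically, we should use a delimiter that is highly unlikely to show up randomly elsewhere in the data, giving a prefix like R"CEmbd(. Notice that this prefix is 8 characters/bytes long, which keeps the payload at a 64-bit-aligned offset within the .bin file itself. This in turn gives a suffix of 7 bytes, )CEmbd", for a total of 15 bytes of boilerplate - not particularly expensive. Note, however, that the prefix does nothing for the alignment of the resulting array in memory: the delimiters never become part of the data, and a char array is only guaranteed 1-byte alignment, so add alignas(8) to the declaration if the code reading it depends on alignment.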

To better illustrate what I’m on about, let’s construct a simple example with a file hello.bin that contains the following content:

R"CEmbd(Hello, "world"!)CEmbd"

which we include in a program like so:

constexpr const char data[] =
    #include "hello.bin"
;

which, after the preprocessor includes the file, will result in:

constexpr const char data[] =
    R"CEmbd(Hello, "world"!)CEmbd"
;

which is equivalent to:

constexpr const char data[] = "Hello, \"world\"!";

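And that’s it! We have successfully embedded data from a non-C++ file into our C++ code. This method should work for most binary data that includes the proper raw string wrapping, whether edited to include it or designed as some kind of special data format (wouldn’t that be interesting!). The caveats are that the data must never contain the closing sequence )CEmbd", and that compilers may warn about, or quietly alter, certain bytes (for example NUL characters, carriage returns, or byte sequences that are invalid in the source encoding), so it’s worth checking that sizeof(data) - 1 matches the size of the original data.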


Why?

#embed is already making its way into C++26, and std::embed may eventually be added as well, rendering this method obsolete for anyone on the latest standard. Not to mention, the major drawback of using #include to embed data is that you need to edit existing files to add the raw string wrapper.

That being said, for anyone on a version before C++26 who doesn’t want to convert their file to C++ source code every time they swap or edit it, this is one possible solution. And don’t get me wrong - this isn’t some revolutionary discovery. I wouldn’t be surprised if some developer somewhere at some point in history has genuinely relied on this hack out of necessity. Still, for developers who do have access to #embed, it is clearly the better option. Ergonomics aside, its compile-time performance benefits make it a true no-brainer.

Either way, this was an interesting idea to test out, and I’m honestly surprised it worked so smoothly in the first place. C++ is full of strange features (largely unintentionally so, I imagine), and it’s fun to see how they can be used in creative, sometimes mildly useful ways. With more compile-time features coming soon™ to C++, including reflection, I very much look forward to seeing what else becomes possible!
