Using panic metadata to recover source code information from Rust binaries

2023/12/08

🔗 The contents of this article were originally published as a Mastodon thread at @cxiao@infosec.exchange.

Introduction

If you’ve ever looked at the strings inside a Rust binary, you may have noticed that many of them are paths to Rust source files (with the .rs extension). These paths are used to print diagnostic messages when the program panics, such as the following:

thread 'main' panicked at 'oh no!', src\main.rs:314:5

The above message includes both the source file path src\main.rs and the exact line and column in the source code where the panic occurred. All of this information is embedded in Rust binaries by default, and it is recoverable statically!
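At runtime, that message is built from the same embedded data. As a minimal sketch (the helper `panic_location_of` and the `LAST_PANIC_LOCATION` static are made up for illustration), a custom panic hook can read the file, line, and column of a panic through the hook argument's `location()` method, which hands back a `core::panic::Location` (re-exported as `std::panic::Location`):

```rust
use std::panic;
use std::sync::Mutex;

/// File, line, and column reported by the most recent caught panic.
static LAST_PANIC_LOCATION: Mutex<Option<(String, u32, u32)>> = Mutex::new(None);

/// Runs `f`, catches a panic if one happens, and returns the location the
/// panic reported. Returns `None` if `f` did not panic.
fn panic_location_of<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<(String, u32, u32)> {
    // Temporarily replace the default hook (the one that prints the message).
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(|info| {
        // `location()` exposes the embedded core::panic::Location data.
        if let Some(loc) = info.location() {
            *LAST_PANIC_LOCATION.lock().unwrap() =
                Some((loc.file().to_string(), loc.line(), loc.column()));
        }
    }));
    let result = panic::catch_unwind(f);
    // Restore whatever hook was installed before.
    panic::set_hook(previous_hook);
    match result {
        Ok(()) => None,
        Err(_) => LAST_PANIC_LOCATION.lock().unwrap().take(),
    }
}
```

Calling `panic_location_of(|| panic!("oh no!"))` returns the path of the calling source file together with the line and column of the `panic!` invocation; the previous hook is put back afterwards.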

Examining these paths helps with separating user code from library code, as well as with understanding functionality. This is especially nice because Rust’s standard library and the majority of third-party Rust libraries are open-source, so you can use the panic strings to find the relevant location in the source code, and use that to aid in reversing.

Extracting panic metadata

The type that contains this location information is core::panic::Location, which has the following definition. It consists of a string slice reference (&str), and two unsigned 32-bit integers (u32). The string slice reference &str represents a view of a string, and is made up of two components: a pointer, and a length.

// core::panic::Location
pub struct Location<'a> {
    file: &'a str,
    line: u32,
    col: u32,
}
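To make the pointer-plus-length description of `&str` concrete, here is a small sketch (the helper names are invented for illustration) that pulls both components out of a string slice. On current compilers a `&str` occupies exactly two pointer-sized words, although the language does not promise the order of those two words:

```rust
use std::mem::size_of;

/// Returns the two components of a string slice: the address of its first
/// byte and its length in bytes (UTF-8 encoded).
fn str_components(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

/// A `&str` is a "fat" pointer: one word for the address and one for the
/// length. This holds on current compilers; the order of the two words is
/// not something the language guarantees.
fn str_ref_is_two_words() -> bool {
    size_of::<&str>() == 2 * size_of::<usize>()
}
```

For the path string examined below, `str_components` reports a length of 34 bytes.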

Using Binary Ninja, let’s look at a place in a Rust binary where one of the source file path strings is referenced. The data there is actually a core::panic::Location struct embedded in the binary.

[Screenshot: Binary Ninja showing a pointer to the string “library\std\src\sys\windows\mod.rs”, followed by the bytes 22 00 00 00 00 00 00 00, then fc 00 00 00, then 11 00 00 00.]

We can see the following pieces of data, and we can match them against the fields in core::panic::Location:

A pointer with the value 0x18003adb0, which is the address where the source file path string resides. This is the pointer component of the string slice file.
A sequence of bytes with the value 22 00 00 00 00 00 00 00, which is the little-endian 64-bit integer value 0x22. This is the length component of the string slice file. Note how the length of the path string library\std\src\sys\windows\mod.rs is 34 (0x22) bytes (when encoded with UTF-8, which is always the encoding used by &str).
A sequence of bytes with the value fc 00 00 00, which is the little-endian 32-bit integer value 0xfc (252). This is the unsigned 32-bit integer line.
A sequence of bytes with the value 11 00 00 00, which is the little-endian 32-bit integer value 0x11 (17). This is the unsigned 32-bit integer col.
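This decoding is mechanical enough to script. The sketch below assumes the layout observed in this particular binary (8-byte pointer, 8-byte length, 4-byte line, 4-byte column, all little-endian); `RawLocation` and `parse_location` are hypothetical names, not part of any tool's API:

```rust
/// A Location record decoded with the layout observed in this binary.
#[derive(Debug, PartialEq)]
struct RawLocation {
    file_ptr: u64, // address of the path string
    file_len: u64, // length of the path string in bytes
    line: u32,
    col: u32,
}

/// Size of the record under the assumed layout: 8 + 8 + 4 + 4 bytes.
const RAW_LOCATION_SIZE: usize = 24;

fn le_u64(b: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[..8]);
    u64::from_le_bytes(buf)
}

fn le_u32(b: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[..4]);
    u32::from_le_bytes(buf)
}

/// Decodes one record from the start of `bytes`, or `None` if it is too short.
fn parse_location(bytes: &[u8]) -> Option<RawLocation> {
    if bytes.len() < RAW_LOCATION_SIZE {
        return None;
    }
    Some(RawLocation {
        file_ptr: le_u64(&bytes[0..8]),
        file_len: le_u64(&bytes[8..16]),
        line: le_u32(&bytes[16..20]),
        col: le_u32(&bytes[20..24]),
    })
}
```

Given the 24 bytes from the screenshot (with the pointer 0x18003adb0 stored as b0 ad 03 80 01 00 00 00), it returns a length of 34, line 252, and column 17.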

⚠️ Caution: Type layouts in compiled Rust binaries are not stable

In this case, the order of the fields in the compiled binary matched the order in the definition of core::panic::Location. However, you cannot rely on this always being the case; the Rust compiler is free to reorder these fields however it wants. Therefore, you must examine the data in your particular binary and deduce the layout of core::panic::Location from it!

For more details on what guarantees the compiler makes (or more importantly, doesn’t make) about type layouts, see the section on the default representation in the Type Layout chapter of The Rust Reference.
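One way to do that deduction is to decode a candidate record under several plausible field orders and keep only the orders that produce sane values. The sketch below tries just two hypothetical orders and applies simple checks (pointer inside a caller-supplied address range such as the read-only data section, a modest string length, a non-zero line); the candidate list, the thresholds, and all names are assumptions you would adjust per binary:

```rust
use std::ops::Range;

/// Hypothetical field orders for a 24-byte Location record on a 64-bit
/// little-endian target. Not an exhaustive list of what rustc may emit.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Layout {
    /// file pointer, file length, line, col (the order seen in this post)
    FileFirst,
    /// line, col, file pointer, file length
    FileLast,
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(buf)
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[off..off + 4]);
    u32::from_le_bytes(buf)
}

/// Decodes (file pointer, file length, line, col) under `layout`.
fn decode(b: &[u8; 24], layout: Layout) -> (u64, u64, u32, u32) {
    match layout {
        Layout::FileFirst => (read_u64(b, 0), read_u64(b, 8), read_u32(b, 16), read_u32(b, 20)),
        Layout::FileLast => (read_u64(b, 8), read_u64(b, 16), read_u32(b, 0), read_u32(b, 4)),
    }
}

/// Returns the layouts under which the record looks sane: the pointer falls in
/// `data_range`, the length is a plausible path length, and the line is non-zero.
fn plausible_layouts(b: &[u8; 24], data_range: Range<u64>) -> Vec<Layout> {
    [Layout::FileFirst, Layout::FileLast]
        .iter()
        .copied()
        .filter(|&layout| {
            let (ptr, len, line, _col) = decode(b, layout);
            data_range.contains(&ptr) && len > 0 && len < 4096 && line > 0
        })
        .collect()
}
```

For the record in this post, only `Layout::FileFirst` survives: read the other way round, the "pointer" would be 0x22, which points nowhere useful.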

Now that we know the layout of core::panic::Location in our binary, let’s define a new type in Binary Ninja which we can apply to the binary. My type definition for this binary is as follows:

struct core::panic::Location
{
    struct RustStringSlice
    {
        char* address;
        int64_t length;
    } file;
    uint32_t line;
    uint32_t col;
};

Applying this new type definition to the sequence of data gives a nice readable form, which shows the line number and column number at a glance:

struct core::panic::Location panic_location_"library\std\src\sys\windows\mod.rs" = 
{
    struct RustStringSlice file = 
    {
        char* address = data_18003adb0 {"library\std\src\sys\windows\mod.rs"}
        int64_t length = 0x22
    }
    uint32_t line = 0xfc
    uint32_t col = 0x11
}

Finding the original library source code

Because the Rust standard library is open-source, we can actually go read the source code at the place that this core::panic::Location data points to. The Rust compiler and standard library live in the same Git repository (rust-lang/rust on GitHub) and are released together; the last piece of information we need to find the source code here is the Git commit ID of the Rust compiler / standard library version that was used to create this binary.

We can find it by examining other strings in this binary: some of them are paths with the rustc commit ID embedded inside, like this one:

/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225\\library\\alloc\\src\\vec\\mod.rs
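Pulling the commit ID out of such a string, and turning it into a link to the matching file on GitHub, is easy to automate. A minimal sketch, assuming the commit always appears as 40 hex digits right after `/rustc/`; `rustc_commit` and `github_source_url` are illustrative names:

```rust
/// Finds a rustc commit hash in a string such as
/// "/rustc/<40 hex digits>\library\alloc\src\vec\mod.rs".
fn rustc_commit(s: &str) -> Option<&str> {
    let start = s.find("/rustc/")? + "/rustc/".len();
    // `get` returns None if the string is too short.
    let hash = s.get(start..start + 40)?;
    if hash.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(hash)
    } else {
        None
    }
}

/// Builds a GitHub link to `file` at `line` in rust-lang/rust at `commit`.
/// Backslashes from Windows-style paths become forward slashes.
fn github_source_url(commit: &str, file: &str, line: u32) -> String {
    format!(
        "https://github.com/rust-lang/rust/blob/{}/{}#L{}",
        commit,
        file.replace('\\', "/"),
        line
    )
}
```

For the commit above and line 252 of library\std\src\sys\windows\mod.rs, this produces https://github.com/rust-lang/rust/blob/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys/windows/mod.rs#L252.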

We can now look up the exact location in the source code: rust-lang/rust commit 8ede3aae, File library\std\src\sys\windows\mod.rs, Line 252 (0xfc), Column 17 (0x11):

fn fill_utf16_buf<F1, F2, T>(mut f1: F1, f2: F2) -> crate::io::Result<T> {
[...]
            if k == n && c::GetLastError() == c::ERROR_INSUFFICIENT_BUFFER {
                n = n.saturating_mul(2).min(c::DWORD::MAX as usize);
            } else if k > n {
                n = k;
            } else if k == n {
                // It is impossible to reach this point.
                // On success, k is the returned string length excluding the null.
                // On failure, k is the required buffer length including the null.
                // Therefore k never equals n.
                unreachable!(); // ⬅️ This is the location from our binary!
            } else {
[...]

This is indeed a place where the code can panic! The unreachable!() macro is used to mark locations in the code that should never be reached, in cases where reachability cannot be automatically determined by the compiler. The documentation for unreachable!() says:

This will always panic! because unreachable! is just a shorthand for panic! with a fixed, specific message.

Compare the above snippet of source code with the decompiler’s output, at one of the locations where this particular core::panic::Location struct is referenced. We can see the following arguments being passed to the function sub_18001ee90, which is the entry point to the panic handling logic (notice how the branch where that function is called is also noreturn):

The fixed error message for the unreachable!() macro (internal error: entered unreachable code)
The length of that error message string (0x28, or 40, bytes)
The address of that core::panic::Location struct.

} else {
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        sub_18001ee90("internal error: entered unreachable code", 0x28, &panic_location_"library\std\src\sys\windows\mod.rs")
        noreturn
    }
    uint64_t n_6 = n * 2
    if (n_6 u>= 0xffffffff) {
        n_6 = 0xffffffff
    }
    n_2 = n_6
[...]
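You can confirm the fixed message and its length from the source side too. The sketch below (helper names are invented) triggers `unreachable!()` inside `std::panic::catch_unwind`, with the default hook temporarily silenced, and reads back the panic payload:

```rust
use std::any::Any;
use std::panic;

/// Extracts the text of a panic payload. Fixed-message panics normally carry a
/// `&'static str`; formatted messages carry a `String`.
fn payload_message(payload: &(dyn Any + Send)) -> Option<String> {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        Some(s.to_string())
    } else {
        payload.downcast_ref::<String>().cloned()
    }
}

/// Triggers `unreachable!()` and returns the message it panicked with.
fn unreachable_message() -> Option<String> {
    // Silence the default hook so nothing is printed to stderr.
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(|_| {}));
    let result: std::thread::Result<()> = panic::catch_unwind(|| unreachable!());
    panic::set_hook(previous_hook);
    match result {
        Ok(()) => None,
        Err(payload) => payload_message(&*payload),
    }
}
```

The recovered message is `internal error: entered unreachable code`, which is 40 (0x28) bytes long, matching the length argument in the decompiled call.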

The information passed in the arguments is used to construct the message that the program emits when it panics, which in this case will look something like the following:

thread 'main' panicked at 'internal error: entered unreachable code', library\std\src\sys\windows\mod.rs:252:17
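The location argument never appears in the Rust source at the panic site. Panic entry points in core, such as `core::panicking::panic`, are marked `#[track_caller]`, so the compiler passes the caller's `&'static Location` as a hidden argument, while the `&'static str` message accounts for the pointer and length arguments. That is plausibly what sub_18001ee90 receives here. The same mechanism is available to ordinary code, as this sketch with a made-up function `called_from` shows:

```rust
use std::panic::Location;

/// With `#[track_caller]`, the compiler passes the call site's location to this
/// function as a hidden argument, and `Location::caller()` reads it back.
#[track_caller]
fn called_from() -> &'static Location<'static> {
    Location::caller()
}

/// Formats a location the way the panic message above does: file:line:col.
fn describe(loc: &Location) -> String {
    format!("{}:{}:{}", loc.file(), loc.line(), loc.column())
}
```

Each call site of `called_from` gets its own location value, which the compiler typically emits as constant data, just like the struct examined earlier.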

We can also see several other features which appear in the original source code at this location:

The check of the GetLastError() result against the ERROR_INSUFFICIENT_BUFFER error code.
The saturating multiplication of the variable n.
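To line up the decompiled clamp with the source, here is the buffer-growth step written both ways (both function names are made up). `next_buffer_len` follows the source line; `next_buffer_len_decompiled` follows the decompiler's plain multiply and unsigned comparison, using a wrapping multiply as an assumption about overflow behaviour. The two agree for every n up to u32::MAX, since doubling such a value cannot overflow 64 bits, which may be why the compiled code needs no separate saturation check:

```rust
/// Follows the source: double `n` without overflowing (saturating), then clamp
/// to the largest value a Windows DWORD (a u32) can hold.
fn next_buffer_len(n: usize) -> usize {
    n.saturating_mul(2).min(u32::MAX as usize)
}

/// Follows the decompiler output: a plain (here: wrapping) multiply, then an
/// unsigned comparison against 0xffffffff.
fn next_buffer_len_decompiled(n: u64) -> u64 {
    let mut n_6 = n.wrapping_mul(2);
    if n_6 >= 0xffff_ffff {
        n_6 = 0xffff_ffff;
    }
    n_6
}
```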

Rust binaries without panic metadata

Note that this location information is embedded by default. With the default panic behaviour, a panic prints a message (including this location information, and a backtrace if enabled) and then unwinds the stack, cleaning up memory by running destructors along the way. You can read more about the details of the default panic implementation in the Rust standard library in the Rust Compiler Development Guide.

Setting panic = "abort" in a build profile in the Cargo.toml build configuration file makes the program abort immediately on panic instead of unwinding. On its own, however, this does not strip the location information: the panic hook still runs and prints the file, line, and column before the process aborts, so the core::panic::Location data stays in the binary. Removing it takes more work, for example compiling with the unstable -Z location-detail=none flag on a nightly toolchain.

Developers can also provide their own custom panic handlers; you can learn more about this in my previous post about the panic handlers used by the Rust code in the Windows kernel: https://infosec.exchange/@cxiao/110500609127711155

Of course, a lot of Rust malware out there doesn’t even bother taking the more basic step of stripping symbols, much less this panic information…

Next steps

You can take this a step further and write a script for your reverse engineering tool of choice that automatically extracts all of this panic location metadata. I wrote a quick Binary Ninja script that extracts all panic location metadata, finds the locations in the code where that metadata is referenced, and then annotates each location with a tag that displays the extracted source file information. I ran it on RustBucket, a piece of Rust malware targeting macOS systems, to generate these tags.
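That script isn't reproduced here, but the core of the idea fits in a tool-agnostic sketch: walk a dump of a data section, decode every 8-byte-aligned 24-byte window with the layout from this post, and keep the windows whose pointer and length land on valid UTF-8 ending in `.rs`. Everything here (the names, the alignment, the `.rs` filter, the assumption that the dump is loaded at address `base`) is an assumption for illustration; finding the code that references each record and creating tags would go through your disassembler's own API:

```rust
/// A panic location record found while scanning a section dump.
#[derive(Debug, PartialEq)]
struct FoundLocation {
    /// Virtual address of the 24-byte record itself.
    record_addr: u64,
    file: String,
    line: u32,
    col: u32,
}

fn read_u64(b: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(buf)
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[off..off + 4]);
    u32::from_le_bytes(buf)
}

/// Returns the bytes that `ptr`/`len` refer to, if they lie entirely inside
/// `data` (assumed to be loaded at virtual address `base`).
fn slice_at(data: &[u8], base: u64, ptr: u64, len: u64) -> Option<&[u8]> {
    let start = ptr.checked_sub(base)?;
    let end = start.checked_add(len)?;
    if end > data.len() as u64 {
        return None;
    }
    Some(&data[start as usize..end as usize])
}

/// Scans 8-byte-aligned offsets for records laid out as (ptr, len, line, col),
/// little-endian, keeping those whose string is valid UTF-8 ending in ".rs".
fn scan_locations(data: &[u8], base: u64) -> Vec<FoundLocation> {
    let mut found = Vec::new();
    let mut off = 0;
    while off + 24 <= data.len() {
        let ptr = read_u64(data, off);
        let len = read_u64(data, off + 8);
        let line = read_u32(data, off + 16);
        let col = read_u32(data, off + 20);
        let file = slice_at(data, base, ptr, len)
            .and_then(|bytes| std::str::from_utf8(bytes).ok())
            .filter(|s| s.ends_with(".rs"));
        if let Some(file) = file {
            if line > 0 {
                found.push(FoundLocation {
                    record_addr: base + off as u64,
                    file: file.to_string(),
                    line,
                    col,
                });
            }
        }
        off += 8;
    }
    found
}
```

In a real binary the path strings and the Location records usually sit in the same read-only section, so passing that whole section with its load address as `base` is a reasonable starting point.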

That’s all! I will release the metadata extraction script for this at some point.