Using panic metadata to recover source code information from Rust binaries
2023/12/08
The contents of this article were originally published as a Mastodon thread at @cxiao@infosec.exchange.
Introduction
If you’ve ever looked inside the strings of a Rust binary, you may have noticed that many of these strings are paths to Rust source files (.rs extension). These are used when printing diagnostic messages when the program panics, such as the following message:
thread 'main' panicked at 'oh no!', src\main.rs:314:5
The above message includes both a source file path, src\main.rs, and the exact line and column in the source code where the panic occurred. All of this information is embedded in Rust binaries by default, and is recoverable statically!
Examining these can be useful in separating user from library code, as well as in understanding functionality. This is especially nice because Rust’s standard library and the majority of third-party Rust libraries are open-source, so you can use the panic strings to find the relevant location in the source code, and use that to aid in reversing.
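As a small illustration of where this data comes from, the sketch below reads the same file, line, and column values at runtime through the standard library's std::panic::Location::caller(). The helper name here is made up for this example, and the printed path depends on how the file is compiled.

```rust
use std::panic::Location;

/// Returns the source location of whoever called this function.
/// `#[track_caller]` makes `Location::caller()` report the call site
/// instead of a location inside this function's body.
#[track_caller]
fn here() -> &'static Location<'static> {
    Location::caller()
}

fn main() {
    let loc = here();
    // These are the same three pieces of data that appear in a panic message.
    println!("{}:{}:{}", loc.file(), loc.line(), loc.column());
}
```

The compiler has to store the file path, line, and column somewhere in the binary for this call to work, which is why they show up in the strings.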
Extracting panic metadata
The type that contains this location information is core::panic::Location, which has the following definition. It consists of a string slice reference (&str), and two unsigned 32-bit integers (u32). The string slice reference &str represents a view of a string, and is made up of two components: a pointer, and a length.
pub struct core::panic::Location<'a> {
file: &'a str,
line: u32,
col: u32,
}
Using Binary Ninja, let’s look inside a Rust binary at a place where one of the source file path strings is referenced. What we find there is actually a core::panic::Location struct, embedded inside the binary.
We can see the following pieces of data, and we can match them against the fields in core::panic::Location:
- A pointer with the value 0x18003adb0, which is the address where the source file path string resides. This is the pointer component of the string slice file.
- A sequence of bytes with the value 22 00 00 00 00 00 00 00, which is the little-endian 64-bit integer value 0x22. This is the length component of the string slice file. Note how the length of the path string library\std\src\sys\windows\mod.rs is 34 (0x22) bytes (when encoded with UTF-8, which is always the encoding used by &str).
- A sequence of bytes with the value fc 00 00 00, which is the little-endian 32-bit integer value 0xfc. This is the unsigned 32-bit integer line.
- A sequence of bytes with the value 11 00 00 00, which is the little-endian 32-bit integer value 0x11. This is the unsigned 32-bit integer col.
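The byte-by-byte reading above can be reproduced in a few lines. This is a minimal sketch that assumes the field order observed in this particular binary (pointer, length, line, column, all little-endian on a 64-bit target). The RawLocation type and parse_location function are hypothetical names, and the pointer bytes are written out from the value 0x18003adb0.

```rust
use std::convert::TryInto;

/// Hypothetical mirror of the layout observed in this particular binary.
#[derive(Debug, PartialEq)]
struct RawLocation {
    file_ptr: u64,
    file_len: u64,
    line: u32,
    col: u32,
}

/// Decodes the first 24 bytes as little-endian pointer, length, line, column.
/// Returns `None` when fewer than 24 bytes are supplied.
fn parse_location(bytes: &[u8]) -> Option<RawLocation> {
    if bytes.len() < 24 {
        return None;
    }
    Some(RawLocation {
        file_ptr: u64::from_le_bytes(bytes[0..8].try_into().ok()?),
        file_len: u64::from_le_bytes(bytes[8..16].try_into().ok()?),
        line: u32::from_le_bytes(bytes[16..20].try_into().ok()?),
        col: u32::from_le_bytes(bytes[20..24].try_into().ok()?),
    })
}

fn main() {
    let raw: [u8; 24] = [
        0xb0, 0xad, 0x03, 0x80, 0x01, 0x00, 0x00, 0x00, // pointer to the path string
        0x22, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // length of the path string
        0xfc, 0x00, 0x00, 0x00, // line
        0x11, 0x00, 0x00, 0x00, // col
    ];
    let loc = parse_location(&raw).expect("24 bytes were supplied");
    println!(
        "ptr={:#x} len={} line={} col={}",
        loc.file_ptr, loc.file_len, loc.line, loc.col
    );
}
```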
⚠️ Caution: Type layouts in compiled Rust binaries are not stable
In this case, the order of the fields in the compiled binary matched the definition of core::panic::Location. However, you cannot rely on this always being the case; the Rust compiler is free to reorder these fields however it wants. Therefore, you must do the work of examining the data in your particular binary, and deducing from that what the layout of core::panic::Location in your binary is!
For more details on what guarantees the compiler makes (or more importantly, doesn’t make) about type layouts, see this section in the Type Layout chapter in The Rust Reference.
Now that we know the layout of core::panic::Location in our binary, let’s define a new type in Binary Ninja which we can apply to the binary. My type definition for this binary is as follows:
struct core::panic::Location
{
struct RustStringSlice file
{
char* address;
int64_t length;
};
uint32_t line;
uint32_t col;
};
The screenshot shows this new type definition applied to the sequence of data, in a nice readable form which shows the line number and column number at a glance.
struct core::panic::Location panic_location_"library\std\src\sys\windows\mod.rs" =
{
struct RustStringSlice file =
{
char* address = data_18003adb0 {"library\std\src\sys\windows\mod.rs"}
int64_t length = 0x22
}
uint32_t line = 0xfc
uint32_t col = 0x11
}
Finding the original library source code
Because the Rust standard library is open-source, we can actually go read the source code at the place that this core::panic::Location data points to. The Rust compiler and standard library live in the same Git repository (rust-lang/rust on GitHub) and are released together; the last piece of information we need to find the source code here is the Git commit ID of the Rust compiler / standard library version that was used to create this binary.
We can find this by examining some of the other strings in this binary. Some of them are paths like the following, which have the rustc commit ID embedded inside them:
/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225\\library\\alloc\\src\\vec\\mod.rs
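The commit ID can also be pulled out of such a path mechanically and combined with the file and line from the location data. The following is a minimal sketch; rustc_commit and source_url are hypothetical helper names, and the URL is built in GitHub's usual blob/<commit>/<path>#L<line> permalink form.

```rust
/// Returns the 40-character hex commit ID that follows "/rustc/", if present.
fn rustc_commit(path: &str) -> Option<&str> {
    let start = path.find("/rustc/")? + "/rustc/".len();
    let candidate = path.get(start..start + 40)?;
    if candidate.bytes().all(|b| b.is_ascii_hexdigit()) {
        Some(candidate)
    } else {
        None
    }
}

/// Builds a GitHub permalink into rust-lang/rust for a file and line at a
/// given commit. Windows-style separators are converted to forward slashes.
fn source_url(commit: &str, file: &str, line: u32) -> String {
    let file = file.replace('\\', "/");
    format!(
        "https://github.com/rust-lang/rust/blob/{}/{}#L{}",
        commit, file, line
    )
}

fn main() {
    let embedded = r"/rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225\library\alloc\src\vec\mod.rs";
    if let Some(commit) = rustc_commit(embedded) {
        // File and line come from the core::panic::Location data examined earlier.
        println!("{}", source_url(commit, r"library\std\src\sys\windows\mod.rs", 252));
    }
}
```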
We can now look up the exact location in the source code:
rust-lang/rust commit 8ede3aae, File library\std\src\sys\windows\mod.rs, Line 252 (0xfc), Column 17 (0x11):
fn fill_utf16_buf<F1, F2, T>(mut f1: F1, f2: F2) -> crate::io::Result<T> {
[...]
if k == n && c::GetLastError() == c::ERROR_INSUFFICIENT_BUFFER {
n = n.saturating_mul(2).min(c::DWORD::MAX as usize);
} else if k > n {
n = k;
} else if k == n {
// It is impossible to reach this point.
// On success, k is the returned string length excluding the null.
// On failure, k is the required buffer length including the null.
// Therefore k never equals n.
unreachable!(); // ⬅️ This is the location from our binary!
} else {
[...]
This is indeed a place where the code can panic! The unreachable!() macro is used to mark locations in the code that should never be reached, in cases where reachability cannot be automatically determined by the compiler.
The documentation for unreachable!() says:
This will always panic! because unreachable! is just a shorthand for panic! with a fixed, specific message.
Compare the above snippet of source code with the decompiler’s output, at one of the locations where this particular core::panic::Location struct is referenced. We can see the following arguments all being passed into the function sub_18001ee90, which is the entry point to the panic handling logic (notice how the branch where that function is called is also noreturn).
- The fixed error message for the unreachable!() macro (internal error: entered unreachable code)
- The length of that error message string (0x28 characters)
- The address of that core::panic::Location struct.
} else {
if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
sub_18001ee90("internal error: entered unreachable code", 0x28, &panic_location_"library\std\src\sys\windows\mod.rs")
noreturn
}
uint64_t n_6 = n * 2
if (n_6 u>= 0xffffffff) {
n_6 = 0xffffffff
}
n_2 = n_6
[...]
The information passed in the arguments is used to construct the message that the program emits when it panics, which in this case will look something like the following:
thread 'main' panicked at 'internal error: entered unreachable code', library\std\src\sys\windows\mod.rs:252:17
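The fixed message and its length can be checked against a freshly compiled program. This is a minimal sketch, assuming the default unwinding panic strategy so that catch_unwind can intercept the panic; unreachable_message is a hypothetical helper name.

```rust
use std::panic;

/// Triggers `unreachable!()` inside `catch_unwind` and returns the panic
/// message carried in the panic payload.
fn unreachable_message() -> String {
    let result = panic::catch_unwind(|| {
        unreachable!();
    });
    let payload = result.expect_err("unreachable!() always panics");
    // A panic payload is usually either a `&'static str` or a `String`.
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        String::from("<non-string payload>")
    }
}

fn main() {
    let message = unreachable_message();
    println!("{} ({:#x} bytes)", message, message.len());
}
```

The default panic hook still runs here, so the program also prints the usual panic message, including the source location, to standard error.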
We can also see several other features which appear in the original source code at this location:
- The check of the GetLastError() result against the ERROR_INSUFFICIENT_BUFFER error code.
- The saturating multiplication of the variable n.
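To make the second correspondence concrete, the source expression n.saturating_mul(2).min(c::DWORD::MAX as usize) behaves like the doubling followed by the clamp to 0xffffffff in the decompiled code. The sketch below isolates that expression in a hypothetical function named grow, with u32::MAX standing in for c::DWORD::MAX, on the assumption that DWORD is a 32-bit unsigned integer.

```rust
/// Doubles `n` without overflowing, then caps the result at `u32::MAX`.
/// `u32::MAX` stands in for `c::DWORD::MAX` from the standard library source.
fn grow(n: usize) -> usize {
    n.saturating_mul(2).min(u32::MAX as usize)
}

fn main() {
    for n in [512usize, 0x8000_0000, usize::MAX].iter() {
        println!("{:#x} -> {:#x}", n, grow(*n));
    }
}
```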
Rust binaries without panic metadata
Note that the panic message shown above comes from the default panic behaviour, which unwinds the stack, cleans up memory, and collects information to show in the panic message and in backtraces. You can read more about the details of the default panic implementation in the Rust standard library in the Rust Compiler Dev book.
Developers can change this behaviour by putting panic = 'abort' when specifying the build profile in their Cargo.toml build configuration file; this will cause the program to abort on panic instead of unwinding. Be aware that this setting on its own is not guaranteed to remove the location information: the standard library’s panic handler still prints the panic message, including the location, before aborting, so the path strings can remain in the binary.
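For reference, a minimal fragment showing where that setting goes, assuming the release profile is the one being built:

```toml
# Cargo.toml
[profile.release]
panic = "abort"   # abort on panic instead of unwinding
```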
Developers can also provide their own custom panic handlers - you can learn more about this in my previous post about the panic handlers used by the Rust code in the Windows kernel, here: https://infosec.exchange/@cxiao/110500609127711155
Of course, a lot of Rust malware out there doesn’t even bother taking the more basic step of stripping symbols, much less this panic information…
Next steps
You can take this a step further and write a script that automatically extracts all of this panic location metadata, in your reverse engineering tool of choice. I wrote a quick Binary Ninja script that goes and extracts all panic location metadata, finds the locations in the code where they are referenced, and then annotates each location with a tag that displays the extracted source file information. Here are the tags generated by my script, run on a piece of Rust malware targeting macOS systems, RustBucket.
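To give a rough idea of what such an extraction can look like outside of any particular tool, here is a minimal sketch in plain Rust. It is not the Binary Ninja script described above. It assumes the { pointer, length, line, col } layout seen earlier, 8-byte alignment, and a flat image in which the virtual address minus a base address equals the buffer offset (a real loader would have to map sections). The names Found, scan_locations, try_parse, and sample_image are all hypothetical.

```rust
use std::convert::{TryFrom, TryInto};

/// A candidate location record recovered from raw image bytes.
#[derive(Debug, PartialEq)]
struct Found {
    offset: usize, // where the record sits in the buffer
    file: String,
    line: u32,
    col: u32,
}

/// Scans `image` (assumed to be loaded flat at `base`) for 24-byte records
/// that look like { pointer, length, line, col }.
fn scan_locations(image: &[u8], base: u64) -> Vec<Found> {
    let mut found = Vec::new();
    let mut off = 0usize;
    while off + 24 <= image.len() {
        if let Some(f) = try_parse(image, base, off) {
            found.push(f);
        }
        off += 8; // assume records are 8-byte aligned
    }
    found
}

/// Tries to interpret the 24 bytes at `off` as a location record.
fn try_parse(image: &[u8], base: u64, off: usize) -> Option<Found> {
    let ptr = u64::from_le_bytes(image[off..off + 8].try_into().ok()?);
    let len = u64::from_le_bytes(image[off + 8..off + 16].try_into().ok()?);
    let line = u32::from_le_bytes(image[off + 16..off + 20].try_into().ok()?);
    let col = u32::from_le_bytes(image[off + 20..off + 24].try_into().ok()?);
    // The pointer must land inside the image and the length must be plausible.
    let start = usize::try_from(ptr.checked_sub(base)?).ok()?;
    let len = usize::try_from(len).ok()?;
    if len == 0 || len > 512 || line == 0 {
        return None;
    }
    let end = start.checked_add(len)?;
    let text = std::str::from_utf8(image.get(start..end)?).ok()?;
    if !text.ends_with(".rs") {
        return None;
    }
    Some(Found { offset: off, file: text.to_string(), line, col })
}

/// Builds a tiny synthetic image: a path string, padding, then one record.
fn sample_image(base: u64) -> Vec<u8> {
    let path = br"library\std\src\sys\windows\mod.rs";
    let mut image = path.to_vec();
    image.resize(40, 0); // pad so the record starts 8-byte aligned
    image.extend_from_slice(&base.to_le_bytes()); // pointer to offset 0
    image.extend_from_slice(&(path.len() as u64).to_le_bytes());
    image.extend_from_slice(&252u32.to_le_bytes());
    image.extend_from_slice(&17u32.to_le_bytes());
    image
}

fn main() {
    let base = 0x1000;
    for f in scan_locations(&sample_image(base), base) {
        println!("{:#x}: {}:{}:{}", f.offset, f.file, f.line, f.col);
    }
}
```

The filters (plausible length, valid UTF-8, .rs suffix, non-zero line) are heuristics chosen for this sketch; a different binary may need a different field order or stricter checks.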
That’s all! I will release the metadata extraction script for this at some point.