Reverse Engineering Techniques - Detecting Open-Source Libraries and Their Functions

May 1, 2022

Automated Identification of Open Source Library Functions (Translation by gemini)

Many reverse engineering targets and CTF challenges are built on open-source libraries, such as LuaJIT's Lua interpreter, OpenSSL, zlib, and other common libraries. Much of this library code is complex, and analyzing it in detail adds a great deal of work. Moreover, much of the logic involves mathematics and cryptography; without that theoretical background, reverse-engineering these programs is virtually impossible. Identifying library functions is therefore crucial. This article introduces two methods for identifying open-source library functions.

Strategy

First, let’s discuss the general idea. Once we know which open-source library a program uses, we know that many functions in the program have essentially the same logic as the functions in that library. In other words, we have part of the program’s source code, but we don’t know which part of the binary corresponds to which piece of source. If we can match a function in the binary to its counterpart in the library, we can understand its purpose by consulting the library’s source code, function names (symbols), and development documentation. Both methods introduced in this article build on this idea.

1. Automated Analysis - BinDiff

BinDiff is an automated tool used to compare the similarity of function code in IDA databases. The higher the identified code similarity, the greater the likelihood that they implement the same logic. In other words, it is a tool for matching function names with function bodies in the program.
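To make the idea of similarity matching concrete, here is a toy sketch in Python. It pairs functions whose structural fingerprint (number of basic blocks, control-flow edges, and calls) is unique in both databases. BinDiff's real matching is considerably more elaborate; the function names and fingerprint values below are invented for illustration.

```python
from collections import defaultdict


def match_by_fingerprint(primary, secondary):
    """Pair functions whose structural fingerprint is unique on both sides.

    Each argument maps a function name to a hypothetical fingerprint tuple:
    (basic_blocks, cfg_edges, calls).
    """
    def index(funcs):
        buckets = defaultdict(list)
        for name, fingerprint in funcs.items():
            buckets[fingerprint].append(name)
        return buckets

    left, right = index(primary), index(secondary)
    matches = {}
    for fingerprint, names in left.items():
        candidates = right.get(fingerprint, [])
        # Accept only unambiguous pairs: exactly one function on each side.
        if len(names) == 1 and len(candidates) == 1:
            matches[names[0]] = candidates[0]
    return matches


# Invented example data: the stripped target and a library built with symbols.
target = {
    "sub_8049A10": (12, 17, 3),
    "sub_8049B80": (4, 4, 1),
    "sub_8049C00": (4, 4, 1),
}
library = {
    "mrbc_alloc": (12, 17, 3),
    "mrbc_free": (4, 4, 1),
    "mrbc_init": (30, 41, 9),
}

print(match_by_fingerprint(target, library))
```

Only sub_8049A10 is matched here. The two small functions share a fingerprint, so the sketch refuses to guess between them, which is also why very short functions tend to be the hardest to match automatically.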

Environment Setup

Java SE Development Kit (JDK): First, you need to install the JDK. The author uses JDK 11, which can be downloaded and installed from Oracle’s official website. Download link: https://www.oracle.com/java/technologies/downloads/

IDA Pro: BinDiff performs matching on databases generated by IDA. IDA Pro is a powerful reverse engineering tool that requires a license. The author uses IDA version 7.6. Official website: https://hex-rays.com/ida-pro/

BinDiff: Official download link: https://www.zynamics.com/software.html Note: During installation, you will be asked to select the path for IDA. Select the directory where your IDA executables (ida.exe/ida64.exe) are located.

Usage

To use BinDiff, you need a second .idb (or .i64) database to compare against.

Open the database of the program being analyzed with IDA.
Go to File -> BinDiff and select the database you want to compare against.
After waiting for a while, you can see the matching results in the Matched Functions tab.

Functions on the left that still carry IDA's default sub_XXXX names (that is, unnamed functions) can be identified from the names of the functions they are matched with.
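Not every match deserves to be trusted. The sketch below assumes the matched-function rows have been copied out by hand as (target name, library name, similarity, confidence) tuples and keeps only the high-scoring matches for functions that are still unnamed. The row format, the sample values, and the 0.9 thresholds are assumptions made for illustration, not BinDiff defaults.

```python
def confident_renames(matches, min_similarity=0.9, min_confidence=0.9):
    """Return {target_name: library_name} for trustworthy matches.

    `matches` is an iterable of hypothetical rows:
    (target_name, library_name, similarity, confidence).
    """
    renames = {}
    for target_name, library_name, similarity, confidence in matches:
        if not target_name.startswith("sub_"):
            continue  # the function already has a meaningful name
        if similarity >= min_similarity and confidence >= min_confidence:
            renames[target_name] = library_name
    return renames


# Invented sample rows.
rows = [
    ("sub_804A120", "mrbc_vm_run", 0.97, 0.99),
    ("sub_804B3F0", "mp_mul", 0.41, 0.55),
    ("main", "main", 1.00, 0.99),
]

print(confident_renames(rows))
```

Low-scoring matches such as the second row are better checked by hand against the library source before renaming anything.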

Example: Defcon Final 2021 - Barb Metal

This problem used the mrubyc library to implement a virtual machine and encrypted the VM code. The goal was to find vulnerabilities in the VM code to get the flag. (Source: https://github.com/o-o-overflow/dc2021f-barb-metal-public)

Step 1: Identifying Library Characteristics. Look for characteristic strings (as found in OpenSSL, for example), assertion messages added during compilation, and similar clues. These help pin down which specific library was used. By searching for strings in IDA, we found many source file paths pointing to:

LibTomMath (https://github.com/libtom/libtommath)
LibTomCrypt (https://github.com/libtom/libtomcrypt)
mrubyc (https://github.com/mrubyc/mrubyc)
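The same string search can be done outside IDA. The following is a minimal sketch that extracts printable ASCII strings from raw bytes, much like the strings utility, and groups those that mention a library name. The hint table and the sample bytes are assumptions for illustration; real binaries may embed different paths or none at all.

```python
import re

# Hypothetical hint table: substrings one might expect in embedded source paths.
LIBRARY_HINTS = {
    "LibTomMath": b"libtommath",
    "LibTomCrypt": b"libtomcrypt",
    "mrubyc": b"mrubyc",
}


def printable_strings(data, min_len=4):
    """Yield runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, data):
        yield match.group()


def detect_libraries(data):
    """Map each library hint to the strings in `data` that mention it."""
    hits = {}
    for raw in printable_strings(data):
        lowered = raw.lower()
        for library, needle in LIBRARY_HINTS.items():
            if needle in lowered:
                hits.setdefault(library, []).append(raw.decode("ascii"))
    return hits


# Invented bytes standing in for a binary's read-only data section.
blob = (
    b"\x00\x01src/libtommath/bn_mp_add.c\x00\xff"
    b"\x7fELF\x00../mrubyc/src/vm.c\x00abc\x00"
)

print(detect_libraries(blob))
```

For a real target, the bytes would come from reading the file in binary mode, for example open(path, "rb").read().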

Step 2: Compiling the Library. Compile the open-source library into a binary for the same platform and architecture as the target (in this case, Linux x86). Note: if the target is 32-bit, make sure the library is built as 32-bit too. Also, convert static libraries (.a) to shared libraries (.so) for easier use with BinDiff: a static archive is a collection of separate object files, whereas a shared library only requires one IDA database.
By modifying the Makefile (e.g., adding -shared -fPIC to CFLAGS and using gcc -shared for linking), we generated libmrubyc.so, libtomcrypt.so, and libtommath.so. We then imported these into IDA to generate databases for BinDiff analysis.
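As an alternative to editing the Makefile, an existing static archive can be relinked into a shared object. The sketch below only builds the gcc command line and does not run it. It assumes GNU gcc and ld, and that the objects in the archive were already compiled as position-independent code for the right word size; the file names are placeholders.

```python
def shared_lib_command(archive, output, bits=32):
    """Build a gcc command that relinks a static archive as a shared object.

    Assumes the archive's objects were compiled with -fPIC and the same -m flag.
    """
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    return [
        "gcc", f"-m{bits}", "-shared", "-o", output,
        # --whole-archive pulls in every object, not just referenced ones.
        "-Wl,--whole-archive", archive, "-Wl,--no-whole-archive",
    ]


cmd = shared_lib_command("libtommath.a", "libtommath.so")
print(" ".join(cmd))
```

The --whole-archive option matters here: without it the linker would drop every object that nothing references, and the resulting library would be nearly empty.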

However, for LibTomCrypt and LibTomMath, BinDiff turned out to be of limited use.

2. Manual Analysis - Source Code Comparison

If automated tools like BinDiff are not effective, we must manually compare the source code to analyze each function’s purpose.

Using LibTom as an example again: the strings in the binary can give us source directory and file information, specific lines of code, or assertion messages. Sometimes a function even has its own name written directly inside it.

However, in many CTF problems, these characteristics are removed. In these cases, the function’s internal logic becomes the focus. No matter how the name or source changes, the logic remains the same.

Example: Manual Logic Analysis

If the program has no searchable info, start from main. By browsing the pseudocode, we see the program prints a key, encrypts data.txt into enc.dat, and splits it with the string FLAG.

Looking for constant values in the code, we might find something like 0x312, 0x212, etc. Searching for these values online might reveal that they belong to the Nintendo DS encryption algorithm. By comparing with that source code, we can map:

sub_B91F -> apply_keycode
sub_B7BC -> crypt_64bit_up
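Constant hunting can also be partly automated. The sketch below scans raw bytes for a few well-known 32-bit constants stored in little-endian order, as immediates appear in x86 code. The table is a small illustrative sample rather than a complete signature set, and it deliberately uses long, distinctive values: short constants such as 0x312 would match far too often to be searched blindly.

```python
import struct

# A small sample of widely documented 32-bit constants.
KNOWN_CONSTANTS = {
    0x67452301: "MD5 / SHA-1 initial state",
    0x9E3779B9: "TEA / XTEA delta",
    0xEDB88320: "CRC-32 reflected polynomial",
    0x6A09E667: "SHA-256 initial state",
    0x243F6A88: "Blowfish P-array (digits of pi)",
}


def find_constants(data):
    """Return sorted (offset, value, label) hits for little-endian constants."""
    hits = []
    for value, label in KNOWN_CONSTANTS.items():
        needle = struct.pack("<I", value)
        start = data.find(needle)
        while start != -1:
            hits.append((start, value, label))
            start = data.find(needle, start + 1)
    return sorted(hits)


# Invented bytes standing in for a code section.
code = (
    b"\x90\x90" + struct.pack("<I", 0x9E3779B9)
    + b"\xc3" + struct.pack("<I", 0xEDB88320)
)

for offset, value, label in find_constants(code):
    print(f"{offset:#x}: {value:#010x} {label}")
```

A hit only suggests a candidate algorithm; the surrounding logic still has to be compared against the source code to confirm it.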

The rest of the analysis follows the same pattern: identify the library, find its critical variables (like those in the MIRACL library), and manually rename functions based on their logic. For example, constants like 0x80000000 often appear in specific math functions such as convert.
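Once a mapping has been worked out by hand, applying it in IDA can be scripted. The sketch below generates a short IDAPython script as text, so it runs without IDA installed; the output would then be pasted into or loaded by IDA. It assumes each function still has its default sub_XXXX name, from which the address is recovered, and that idc.set_name is available as in IDA 7.x.

```python
def rename_script(mapping):
    """Emit IDAPython source that renames sub_XXXX functions per `mapping`."""
    lines = ["import idc", ""]
    for old, new in sorted(mapping.items()):
        if not old.startswith("sub_"):
            raise ValueError(f"expected a sub_XXXX name, got {old!r}")
        address = int(old[4:], 16)  # the default name encodes the address
        lines.append(f"idc.set_name({address:#x}, {new!r})")
    return "\n".join(lines)


script = rename_script({
    "sub_B91F": "apply_keycode",
    "sub_B7BC": "crypt_64bit_up",
})
print(script)
```

Generating the script as text keeps the hand-made mapping in one reviewable place before anything in the database is changed.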