ARCHIVE

Reverse Engineering Techniques - Detect Open Source Libraries and Their Functions


Automated Identification of Open Source Library Functions (Translated by Gemini)

Many reverse engineering projects and CTF challenges involve programs built on open-source libraries such as LuaJIT (an embedded Lua implementation), OpenSSL, and zlib. Many of these libraries have very complex logic, and analyzing it in detail during reverse engineering adds a great deal of work. Much of that logic also involves mathematics and cryptography, and without the underlying theory it is extremely difficult to reverse-engineer. Identifying library functions is therefore crucial. This article introduces two methods for identifying open-source library functions.

Strategy

First, the general idea. Once we know which open-source library a program uses, the logic of many of its functions is essentially the same as in the library. In other words, we already have part of the program's source code; we just don't know which part of the binary corresponds to which piece of source. If we can find the corresponding function, we can learn its purpose from the library's source code, function names (symbols), and documentation. Both methods in this article build on this idea.

1. Automated Analysis - BinDiff

BinDiff is an automated tool that compares two IDA databases and measures how similar their functions are. The higher the similarity between two functions, the more likely it is that they implement the same logic. In practice, this lets us match the named functions of a library build against the unnamed function bodies in the target program.
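BinDiff's real matching is far more elaborate and relies on call-graph and control-flow-graph structure. As a rough illustration of the underlying idea only, the sketch below describes each function by a few structural counts and pairs every unnamed function with the most similar named one. The feature set, the scoring formula, and all function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuncFeatures:
    name: str
    basic_blocks: int
    edges: int
    calls: int

def similarity(a: FuncFeatures, b: FuncFeatures) -> float:
    """Score in [0, 1]; 1.0 means the feature counts are identical."""
    pairs = [(a.basic_blocks, b.basic_blocks), (a.edges, b.edges), (a.calls, b.calls)]
    score = 0.0
    for u, v in pairs:
        top = max(u, v)
        score += 1.0 if top == 0 else min(u, v) / top  # ratio of the smaller to the larger count
    return score / len(pairs)

def best_matches(targets, library):
    """For each target function, pick the library function with the highest score."""
    result = {}
    for t in targets:
        best = max(library, key=lambda lib: similarity(t, lib))
        result[t.name] = (best.name, round(similarity(t, best), 3))
    return result

# Hypothetical feature counts for a named library build and an unnamed target.
library = [FuncFeatures("lib_func_a", 12, 17, 3), FuncFeatures("lib_func_b", 4, 4, 0)]
targets = [FuncFeatures("sub_1000", 12, 16, 3), FuncFeatures("sub_2000", 4, 5, 0)]
print(best_matches(targets, library))
```

Simple count ratios like these break down quickly once the compiler or optimization level differs, which is one reason real tools combine many graph-based heuristics.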

Environment Setup

Java SE Development Kit (JDK): First, install the JDK. The author uses JDK 11, which can be downloaded from Oracle's official website: https://www.oracle.com/java/technologies/downloads/

IDA Pro: BinDiff performs its matching on databases generated by IDA. IDA Pro is a powerful reverse engineering tool that requires a license; the author uses IDA 7.6. Official website: https://hex-rays.com/ida-pro/

BinDiff: Official download link: https://www.zynamics.com/software.html

Note: During installation, you will be asked for the IDA path. Select the directory that contains ida.exe/ida64.exe.

Usage

To use BinDiff, you need a second .idb (or .i64) database to compare against.

  1. Open the database of the program being analyzed with IDA.
  2. Go to File -> BinDiff and select the database you want to compare against.
  3. When the comparison finishes, the results appear in the Matched Functions tab.

Functions on the left that IDA left unnamed (sub_XXXX) are paired with named functions from the other database, which tells us what they are.
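Not every match deserves to be trusted. A minimal sketch, assuming the matched pairs have already been exported as (address, library name, similarity, confidence) tuples (a hypothetical format, not BinDiff's own result layout): keep only high-scoring matches, and only rename functions that still carry IDA's default sub_ names. The thresholds are arbitrary starting points.

```python
def rename_plan(matches, current_names, min_similarity=0.8, min_confidence=0.5):
    """Return {address: new_name} for matches worth applying.

    matches: iterable of (address, library_name, similarity, confidence)
    current_names: {address: name currently shown in IDA}
    """
    plan = {}
    for addr, lib_name, sim, conf in matches:
        if sim < min_similarity or conf < min_confidence:
            continue  # weak match: leave it for manual review
        if not current_names.get(addr, "").startswith("sub_"):
            continue  # already named by hand; do not overwrite
        plan[addr] = lib_name
    return plan

# Hypothetical exported matches and current IDA names.
matches = [
    (0x1000, "lib_func_a", 0.97, 0.90),
    (0x2000, "lib_func_b", 0.55, 0.40),
    (0x3000, "lib_func_c", 0.92, 0.80),
]
names = {0x1000: "sub_1000", 0x2000: "sub_2000", 0x3000: "my_handler"}
print(rename_plan(matches, names))
```

Inside IDA, each entry of the resulting plan could then be applied with IDAPython's `idc.set_name(ea, name)`.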

Example: Defcon Final 2021 - Barb Metal

This challenge uses the mrubyc library to implement a virtual machine, and the VM code is encrypted. The goal is to find vulnerabilities in the VM code to get the flag. (Source: https://github.com/o-o-overflow/dc2021f-barb-metal-public)

Step 1: Identify the library. Look for characteristic clues, such as the strings embedded in OpenSSL or assertion messages added at compile time; they reveal which specific library was used. Searching strings in IDA, we found many source-path traces pointing to mrubyc, LibTomCrypt, and LibTomMath.
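The same string search can be done outside IDA. A minimal sketch that extracts printable ASCII runs (like the `strings` tool) and reports which ones contain given library markers; the marker list and the sample bytes are made up for illustration.

```python
import re

def printable_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII of at least min_len bytes."""
    for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
        yield m.start(), m.group().decode("ascii")

def library_hints(data: bytes, markers):
    """Return {marker: [strings containing it]} for each marker found (case-insensitive)."""
    hits = {}
    for _, s in printable_strings(data):
        for marker in markers:
            if marker.lower() in s.lower():
                hits.setdefault(marker, []).append(s)
    return hits

# Made-up bytes standing in for a target binary (normally: open(path, "rb").read()).
sample = b"\x00\x01/home/user/mrubyc/src/vm.c\x00\x7f\x02assert failed\x00libtomcrypt\x00"
print(library_hints(sample, ["mrubyc", "tomcrypt", "tommath"]))
```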

Step 2: Compile the library. Build the open-source library for the same architecture as the target (in this case, Linux x86); if the target is 32-bit, compile for 32-bit as well (for example, with gcc -m32). It also helps to build shared libraries (.so) instead of static ones (.a): a static library is an archive of many object files, each of which would need its own IDA database, while a shared library needs only one.

By modifying the Makefiles (adding -fPIC to CFLAGS and linking with gcc -shared), we produced libmrubyc.so, libtomcrypt.so, and libtommath.so, then loaded them into IDA to create databases for BinDiff analysis.
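If only a static archive is available, GNU ld's `--whole-archive` option can pull every object file from it into a single shared library, provided the objects were compiled with -fPIC. A minimal sketch that builds (and optionally runs) that command; the archive and output names are placeholders, and -m32 assumes a 32-bit target with multilib support installed.

```python
import shlex
import subprocess

def shared_from_static(archive, output, m32=True, run=False):
    """Build a gcc command that links every object in `archive` into the shared object `output`."""
    cmd = ["gcc"]
    if m32:
        cmd.append("-m32")  # match a 32-bit target binary
    cmd += [
        "-shared", "-o", output,
        "-Wl,--whole-archive", archive, "-Wl,--no-whole-archive",  # keep all members, not just referenced ones
    ]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if linking fails
    return cmd

# Dry run: print the command instead of executing it.
print(shlex.join(shared_from_static("libtommath.a", "libtommath.so")))
```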

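Each library was loaded into IDA separately to generate its own .idb, and each was then selected in BinDiff for analysis.

For LibTomCrypt and LibTomMath, however, BinDiff turned out to be of little use.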

2. Manual Analysis - Source Code Comparison

If automated tools like BinDiff don't work well, we have to compare against the source code by hand to work out what each function does.

Taking LibTom as an example again: the binary still contains source directory and file names, the specific source lines involved, and assertion messages. Some functions even embed their own names as strings.
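Source paths recovered this way show how the library is laid out on disk. (With glibc's assert, the file name is a string but the line number is passed as a separate integer argument to `__assert_fail`, so the line usually sits next to the path reference in the pseudocode rather than inside the string.) A sketch that groups source-file paths found in extracted strings by directory; the sample strings are hypothetical, and the regex may need adjusting for the target.

```python
import re
from collections import defaultdict

# Paths ending in a C/C++ source or header extension; "\" is allowed for Windows-style paths.
SOURCE_PATH = re.compile(r"[\w./\\-]+\.(?:c|h|cc|cpp)\b")

def source_files_by_dir(strings):
    """Group source-file paths found in extracted strings by parent directory."""
    groups = defaultdict(set)
    for s in strings:
        for path in SOURCE_PATH.findall(s):
            path = path.replace("\\", "/")
            directory, _, filename = path.rpartition("/")
            groups[directory or "."].add(filename)
    return {d: sorted(files) for d, files in groups.items()}

# Hypothetical strings, as they might come out of a string scan of the target.
strings = [
    "libA/src/core/vm.c",
    "Assertion failed: libA/src/core/alloc.c",
    "libB\\hash\\sha.c",
    "no path here",
]
print(source_files_by_dir(strings))
```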

In many CTF challenges, however, these clues are removed. The function's internal logic then becomes the focus: however the names and strings are changed, the logic itself stays the same.

Example: Manual Logic Analysis

If the program has no searchable information, start from main. Browsing the pseudocode, we see that the program prints a key, encrypts data.txt into enc.dat, and uses the string FLAG as a separator in the output file.

Looking for constants, we find values such as 0x312 and 0x212. Searching for them online shows that they belong to the Nintendo DS encryption algorithm, and comparing against its source code lets us map:
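Constants can also be searched for directly in the raw bytes. A minimal sketch that reports where the little-endian 32-bit encoding of each candidate constant occurs. Because compilers often scale table offsets by the element size, it is worth searching for both 0x312 and 0x312 * 4. The input bytes here are synthetic.

```python
import struct

def find_dword_constants(data: bytes, constants):
    """Return {constant: [offsets]} where its little-endian 32-bit encoding occurs in data."""
    found = {}
    for c in constants:
        needle = struct.pack("<I", c)
        offsets, start = [], 0
        while (pos := data.find(needle, start)) != -1:
            offsets.append(pos)
            start = pos + 1  # allow overlapping hits
        if offsets:
            found[c] = offsets
    return found

# Synthetic bytes containing 0x312 * 4 (0xC48) at offset 3.
data = b"\x90" * 3 + struct.pack("<I", 0x312 * 4) + b"\x90"
print(find_dword_constants(data, [0x212, 0x312, 0x212 * 4, 0x312 * 4]))
```

Raw byte hits include false positives in data sections; IDA's immediate-value search restricts results to instruction operands.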

  • sub_B91F -> apply_keycode
  • sub_B7BC -> crypt_64bit_up

The rest of the analysis follows the same pattern: identify the library, find distinctive values (as we do for the MIRACL library below), and rename functions manually based on their logic (for example, constants such as 0x80000000 often appear in specific math functions like convert).

On closer analysis of the source code and the program logic, the two datasets mentioned above turn out only to write their keys into the sbox; the large amount of data that follows is never used. Both datasets are just a distraction for the reverse engineer.

Returning to main, we find the encryption routine, sub_CB0. Decryption can be written with the library's crypt_64bit_down:

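```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

# sbox: the dumped key table (list of 32-bit ints); encbuf: the ciphertext as 32-bit words.
# Both are loaded in the complete solution script at the end of this article.

def crypt_64bit_down(x, y):
    # 16 Blowfish-style rounds, walking the round keys sbox[0x11] down to sbox[0x02]
    for i in range(0x11, 1, -1):
        z = sbox[i] ^ x
        x = sbox[0x012 + ((z >> 24) & 0xff)]
        x = (sbox[0x112 + ((z >> 16) & 0xff)] + x) & MASK
        x = sbox[0x212 + ((z >> 8) & 0xff)] ^ x
        x = (sbox[0x312 + ((z >> 0) & 0xff)] + x) & MASK
        x = y ^ x
        y = z
    # final whitening with the first two round keys
    x = x ^ sbox[1]
    y = y ^ sbox[0]
    return (x, y)

def byteswap32(a):
    # reverse the byte order of a 32-bit word
    return (a >> 8) & 0xFF00 | (a << 8) & 0xFF0000 | (a << 24) & 0xFF000000 | (a >> 24) & 0xFF

# decrypt in place, one 64-bit block (two words) at a time
for i in range(0, len(encbuf), 2):
    a, b = encbuf[i], encbuf[i+1]
    a, b = crypt_64bit_down(byteswap32(a), byteswap32(b))
    encbuf[i:i+2] = [byteswap32(a), byteswap32(b)]
```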
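The snippet above only covers decryption. A round-trip check helps confirm that it really inverts the encryption. The sketch below assumes the encryption direction (crypt_64bit_up) mirrors it in the usual Blowfish way: 18 round keys at sbox[0x00..0x11], followed by four 256-entry tables at 0x012, 0x112, 0x212, and 0x312 (the layout read off the indexing in the code). The up routine and the convention of passing the two words in memory order (p0, p1) with x loaded from p1 are assumptions of this sketch, so check them against the Nintendo DS source before relying on them. The sbox is random test data.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit unsigned arithmetic

def f(sbox, z):
    """Blowfish-style round function over the four 256-entry tables."""
    x = sbox[0x012 + ((z >> 24) & 0xFF)]
    x = (sbox[0x112 + ((z >> 16) & 0xFF)] + x) & MASK
    x = sbox[0x212 + ((z >> 8) & 0xFF)] ^ x
    return (sbox[0x312 + (z & 0xFF)] + x) & MASK

def crypt_64bit_up(sbox, p0, p1):
    """Assumed encryption direction: rounds with sbox[0x00..0x0F], whitening with 0x10/0x11."""
    x, y = p1, p0
    for i in range(0x10):
        z = sbox[i] ^ x
        x = f(sbox, z) ^ y
        y = z
    return x ^ sbox[0x10], y ^ sbox[0x11]

def crypt_64bit_down(sbox, p0, p1):
    """Decryption direction: same rounds as the snippet, keys 0x11 down to 0x02."""
    x, y = p1, p0
    for i in range(0x11, 1, -1):
        z = sbox[i] ^ x
        x = f(sbox, z) ^ y
        y = z
    return x ^ sbox[0x01], y ^ sbox[0x00]

rng = random.Random(1)
sbox = [rng.getrandbits(32) for _ in range(0x412)]  # random stand-in for the real key table
block = (0x01234567, 0x89ABCDEF)
print(crypt_64bit_down(sbox, *crypt_64bit_up(sbox, *block)) == block)
```

With this convention, feeding the output words of up straight back into down restores the original block; word order is exactly the detail to double-check against the real program.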

Next, back in main, we follow sub_2200 -> sub_1CB0. This function is quite large, but it contains some distinctive values we can search for.

A Google search yields results for the MIRACL library, whose source code is here: https://github.com/miracl/MIRACL/blob/master/source/mrcore.c

I also found a good article on Kanxue (https://bbs.pediy.com/thread-222568.htm) that specifically discusses how to identify MIRACL library functions. Unfortunately, this challenge removes the function characteristics that article relies on, so its approach doesn't apply here.

So we compiled MIRACL ourselves and loaded it into BinDiff, but the match rate was surprisingly low, too low to be of any use.

Our compiled .so file is still useful, though. Open it in IDA and first search for the distinctive value found earlier: it leads to a function named mirsys_basic. Then, using the symbol table of our build, we carry the names of the corresponding functions over to the program we are reversing.
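Finding which function contains a given address (for example, the place where the distinctive value is referenced) is a simple interval lookup over the symbol table. A sketch with a hypothetical, hand-written symbol list; in practice the (name, start, size) triples would come from IDA or from `nm -S` run on the .so.

```python
import bisect

def build_index(symbols):
    """symbols: iterable of (name, start, size). Returns (sorted starts, sorted entries)."""
    entries = sorted(symbols, key=lambda s: s[1])
    starts = [start for _, start, _ in entries]
    return starts, entries

def symbol_for(index, addr):
    """Return the name of the function whose [start, start + size) range contains addr, else None."""
    starts, entries = index
    i = bisect.bisect_right(starts, addr) - 1  # last function starting at or before addr
    if i >= 0:
        name, start, size = entries[i]
        if addr < start + size:
            return name
    return None

# Hypothetical symbols and addresses.
index = build_index([("func_a", 0x1000, 0x80), ("func_b", 0x1080, 0x200)])
print(symbol_for(index, 0x10F0))
```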

Next, we examine each function and rename it based on its conditional statements, the functions it calls, and its other logic. For example, when a function uses the value 0x80000000, we search for that constant directly in our build and compare the code of each candidate function.
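When several constants are involved, the candidates can be ranked by how many of them each library function shares with the target function. A minimal sketch, assuming the immediate values of each library function have already been collected into sets; the function names and constants below are placeholders.

```python
def candidates_by_constants(target_constants, library_constants):
    """Rank library functions by how many of the target's constants they also use.

    library_constants: {function_name: set of immediate values it uses}
    """
    target = set(target_constants)
    scored = []
    for name, consts in library_constants.items():
        shared = len(target & consts)
        if shared:
            scored.append((shared, name))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most shared constants first, then by name
    return [name for _, name in scored]

# Hypothetical constant sets extracted from a library build.
library = {
    "func_a": {0x80000000, 0xFFFF},
    "func_b": {0x80000000},
    "func_c": {0x10},
}
print(candidates_by_constants([0x80000000, 0xFFFF], library))
```

The ranking only narrows the search; each candidate still has to be confirmed by comparing its code.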

I guessed that this was the convert function, and so on. Following cross-references (the X key in IDA), I found two functions that were very similar, especially the one called with the argument 1.

On further inspection, I could conclude that the two functions were identical: both are mr_jsf. Functions such as copy, add, sub, subdiv, mr_addbit, and premult were identified next, and continuing in the same way, I identified the cinstr function.

![](<./assets/image (40).png>)

Here is a useful trick. We have already identified mr_berror, MIRACL's error routine. It is called from many functions, but each of them calls it only a few times (between one and five), each time with a specific error code. Using cross-references, we can list every call site in the program and the function it belongs to. The codes passed at those call sites (for example, 3 in this case) then tell us which function we are looking at.

For example, the complex function sub_6370 calls mr_berror twice, with the arguments 10 and 7.

In our compiled build, we then look for functions that call mr_berror twice, with the arguments 10 and 7.

Manual analysis confirms that it is the powmod function.

Following this pattern, we restore the remaining symbols.
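The mr_berror trick amounts to a fingerprint lookup: the multiset of error codes a function passes to mr_berror. A minimal sketch, assuming those codes have already been collected from the cross-references in both databases. Only the powmod entry (10 and 7) and sub_6370 come from the analysis above; func_x and func_y are hypothetical placeholders.

```python
from collections import defaultdict

def berror_signature(codes):
    """Order-independent fingerprint: sorted error codes (duplicates kept, so call counts matter)."""
    return tuple(sorted(codes))

def match_by_berror(target_calls, library_calls):
    """target_calls / library_calls: {function: [codes passed to mr_berror]}.

    Returns {target_function: [library functions with the same fingerprint]}.
    """
    by_sig = defaultdict(list)
    for name, codes in library_calls.items():
        by_sig[berror_signature(codes)].append(name)
    return {t: sorted(by_sig.get(berror_signature(c), [])) for t, c in target_calls.items()}

library_calls = {"powmod": [10, 7], "func_x": [7], "func_y": [3, 10]}
target_calls = {"sub_6370": [10, 7]}
print(match_by_berror(target_calls, library_calls))
```

Several library functions can share a fingerprint, so a match is a shortlist for manual confirmation rather than a final answer.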

The encryption logic is as follows:

The logic is very simple. Next, return to main and look at the sub_C06E function it calls.

After some searching, it is clearly the Base58 algorithm, but with a modified alphabet:

`9876432*Flag{n0T-EA5y=to+f1Nd}BCDGHJKLMPQRSUVWXYZbcehijkmp`
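Base58 treats its input as one big base-58 number written with a 58-symbol alphabet, so a custom alphabet only changes which symbol stands for which digit. That is why translating each custom symbol back to its standard counterpart and then decoding normally works. A dependency-free sketch; the leading-zero handling follows the common Bitcoin convention, which is an assumption about this challenge.

```python
STANDARD_ALPHABET = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
CUSTOM_ALPHABET = b"9876432*Flag{n0T-EA5y=to+f1Nd}BCDGHJKLMPQRSUVWXYZbcehijkmp"

def b58decode(data: bytes, alphabet: bytes = STANDARD_ALPHABET) -> bytes:
    """Decode Base58 text written with the given 58-symbol alphabet."""
    value = 0
    for ch in data:
        value = value * 58 + alphabet.index(ch)  # each symbol is one base-58 digit
    # Each leading "zero digit" (the alphabet's first symbol) encodes a leading 0x00 byte.
    pad = len(data) - len(data.lstrip(alphabet[:1]))
    body = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body

def b58encode(raw: bytes, alphabet: bytes = STANDARD_ALPHABET) -> bytes:
    """Encode bytes as Base58 text with the given alphabet."""
    value = int.from_bytes(raw, "big")
    out = bytearray()
    while value:
        value, rem = divmod(value, 58)
        out.append(alphabet[rem])
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return alphabet[:1] * pad + bytes(reversed(out))

encoded = b58encode(b"example", CUSTOM_ALPHABET)
# Decoding directly with the custom alphabet equals translating to the standard one first.
standard = encoded.translate(bytes.maketrans(CUSTOM_ALPHABET, STANDARD_ALPHABET))
print(b58decode(encoded, CUSTOM_ALPHABET), b58decode(standard))
```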

The following code recovers the key: undo the alphabet substitution, Base58-decode the result, and take the integer cube root of the resulting number.

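```python
import base58  # third-party: pip install base58
import gmpy2   # third-party: pip install gmpy2

STANDARD_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
CUSTOM_ALPHABET = b'9876432*Flag{n0T-EA5y=to+f1Nd}BCDGHJKLMPQRSUVWXYZbcehijkmp'

with open("./enc.dat.bak", "rb") as f:
    buf = f.read()

# enc.dat layout: ciphertext, the separator b"FLAG", then the encoded key
encrypted, key = buf.split(b"FLAG")

# map the custom alphabet back to the standard one, then Base58-decode
key = base58.b58decode(key.translate(bytes.maketrans(CUSTOM_ALPHABET, STANDARD_ALPHABET)))

# gmpy2.iroot(x, n) returns (root, is_exact); take the integer cube root
root, _ = gmpy2.iroot(int.from_bytes(key, "big"), 3)
print("Key: " + int(root).to_bytes(8, "big").decode())

# Key: N5f0cuS_
```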
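`gmpy2.iroot(x, n)` returns a (root, exact) pair. If gmpy2 is not available, the integer cube root can be computed in plain Python with a binary search, which works for arbitrarily large integers. A sketch; the round trip uses the key value printed above.

```python
def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n, found by binary search."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The cube root has at most ceil(bits / 3) bits; one extra bit keeps hi safely above it.
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid  # mid is still <= the root
        else:
            hi = mid - 1
    return lo

key_int = int.from_bytes(b"N5f0cuS_", "big")
print(icbrt(key_int ** 3).to_bytes(8, "big"))  # b'N5f0cuS_'
```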

Using a debugger, we changed the value of the key, let the program generate the sbox, and then dumped the sbox to obtain sbox.bin.
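The decryption routine indexes the table up to 0x312 + 0xFF = 0x411, so a usable dump must contain at least 0x412 32-bit words (0x1048 bytes). A small loader that checks this before decrypting, assuming the dump is little-endian as in the author's script:

```python
import struct

SBOX_WORDS = 0x412  # 0x12 round keys + four 0x100-entry tables (highest index used: 0x312 + 0xFF)

def load_sbox(buf: bytes):
    """Unpack a dumped table of little-endian 32-bit words and sanity-check its size."""
    if len(buf) < SBOX_WORDS * 4:
        raise ValueError(f"dump too short: {len(buf)} bytes, need {SBOX_WORDS * 4}")
    return [w for (w,) in struct.iter_unpack("<I", buf[: SBOX_WORDS * 4])]

# Usage: with open("sbox.bin", "rb") as f: sbox = load_sbox(f.read())
fake_dump = struct.pack("<%dI" % SBOX_WORDS, *range(SBOX_WORDS))
print(len(load_sbox(fake_dump)))
```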

The following is the author’s complete solution script:

```python
import struct

import base58  # third-party: pip install base58
import gmpy2   # third-party: pip install gmpy2

MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

STANDARD_ALPHABET = b'123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
CUSTOM_ALPHABET = b'9876432*Flag{n0T-EA5y=to+f1Nd}BCDGHJKLMPQRSUVWXYZbcehijkmp'

with open("./enc.dat.bak", "rb") as f:
    buf = f.read()

# enc.dat layout: ciphertext, the separator b"FLAG", then the encoded key
encrypted, key = buf.split(b"FLAG")

key = base58.b58decode(key.translate(bytes.maketrans(CUSTOM_ALPHABET, STANDARD_ALPHABET)))

# the integer cube root of the decoded value is the key
root, _ = gmpy2.iroot(int.from_bytes(key, "big"), 3)
print("Key: " + int(root).to_bytes(8, "big").decode())

# Key: N5f0cuS_

# sbox.bin: the table dumped from the debugger, as little-endian 32-bit words
with open("./sbox.bin", "rb") as f:
    sbox = [w for (w,) in struct.iter_unpack("<I", f.read())]

def crypt_64bit_down(x, y):
    # 16 Blowfish-style rounds over the round keys sbox[0x11] down to sbox[0x02]
    for i in range(0x11, 1, -1):
        z = sbox[i] ^ x
        x = sbox[0x012 + ((z >> 24) & 0xff)]
        x = (sbox[0x112 + ((z >> 16) & 0xff)] + x) & MASK
        x = sbox[0x212 + ((z >> 8) & 0xff)] ^ x
        x = (sbox[0x312 + ((z >> 0) & 0xff)] + x) & MASK
        x = y ^ x
        y = z
    # final whitening with the first two round keys
    return (x ^ sbox[1], y ^ sbox[0])

def byteswap32(a):
    # reverse the byte order of a 32-bit word
    return (a >> 8) & 0xFF00 | (a << 8) & 0xFF0000 | (a << 24) & 0xFF000000 | (a >> 24) & 0xFF

encbuf = [w for (w,) in struct.iter_unpack("<I", encrypted)]

# decrypt one 64-bit block (two words) at a time
for i in range(0, len(encbuf), 2):
    a, b = crypt_64bit_down(byteswap32(encbuf[i]), byteswap32(encbuf[i+1]))
    encbuf[i:i+2] = [byteswap32(a), byteswap32(b)]

# write each block back with its two words swapped
data = b''
for i in range(0, len(encbuf), 2):
    data += struct.pack("<I", encbuf[i+1])
    data += struct.pack("<I", encbuf[i])

print(data.decode())
```

Attachments

Contact me and I will see whether I can still find these challenges. Unfortunately, I have formatted my drives many times over the years without backups. (2026)