Security hardening of Android native code

This post is a continuation of my previous post on Frida detection. Here I explain the mechanisms I followed to harden the native code written for that Frida detection project.

Binaries produced from high-level languages such as Java are generally easier to analyse statically than those compiled from C or C++. Although such low-level languages are prone to buffer overflows, memory corruption and similar memory-related issues, security-sensitive applications still tend to write their most sensitive code in C or C++, both for performance and to make life harder for reverse engineers. This post describes 3 mechanisms, all built on open-source resources, that improve the security posture of native code.

Replace libc APIs with syscalls

As the name suggests, syscalls are system calls. They are generally not invoked directly but through wrapper functions in libc. The problem is that libc is a dynamic library that exports all file, memory and string related operations. Calls to such exported APIs are a common attack vector: they make dynamic instrumentation easier and help an attacker bypass protections used by an app. In simple terms, just by preloading (LD_PRELOAD) a custom-built library in front of libc, one can see which files are being checked, which strings are being compared and so on.

The Frida detection project uses both file and string operations. To mitigate these attack vectors, the libc dependency has to be removed, and a better alternative is to issue syscalls directly. This is not trivial, especially since it means embedding architecture-specific assembly instructions for each and every file operation. To our rescue comes MUSL, an MIT-licensed implementation of the standard C library targeting the Linux syscall API, suitable for use in a wide range of deployment environments.

For my GitHub project, the architecture-specific syscall headers (syscall_arch.h) are enough for my use case of replacing file-related operations with syscalls. The syscall_arch header exposes simple helpers such as __syscall0, __syscall1 and so on, where the numeric suffix is the number of arguments the syscall takes. To use them, I created a wrapper function for each file-related libc API such as open, read and close (as shown below). The first argument to each helper is the syscall number identifying the operation. For example, __NR_read identifies the file read operation.

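/* Thin wrappers over the __syscallN helpers from musl's syscall_arch.h.
   On failure the raw syscall returns a negative errno value; errno is not set. */

/* openat(2): open __path relative to the directory descriptor __dir_fd. */
static inline int my_openat(int __dir_fd, const void* __path, int __flags, int __mode){
    return (int)__syscall4(__NR_openat, __dir_fd, (long)__path, __flags, __mode);
}

/* read(2): read up to __count bytes from __fd into __buf. */
static inline ssize_t my_read(int __fd, void* __buf, size_t __count){
    return (ssize_t)__syscall3(__NR_read, __fd, (long)__buf, (long)__count);
}

/* lseek(2): reposition the file offset of __fd. */
static inline off_t my_lseek(int __fd, off_t __offset, int __whence){
    return (off_t)__syscall3(__NR_lseek, __fd, (long)__offset, __whence);
}

/* close(2): release the file descriptor. */
static inline int my_close(int __fd){
    return (int)__syscall1(__NR_close, __fd);
}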

This narrows a reverse engineer's options to either statically patching the code (which requires re-signing the app) or tampering dynamically at the instruction level. To make dynamic tampering even harder, the functions performing syscalls need to be inlined. Inlining multiplies the number of Supervisor Call (SVC) instructions in the text section, so there is no single wrapper left to hook. After inlining, there are 19 SVC instructions in the final binary.

Replace libc APIs with custom implementation

As discussed in the previous section, many libraries depend on libc for string operations (strcmp, strlen, strstr etc.) and some memory operations (memcmp, memset, memcpy). A custom implementation of these operations reduces the attack surface, because there is no exported symbol left to hook. To showcase this, I have used custom implementations of the string and memory APIs and inlined them in the GitHub project. Many open-source projects (Apple, Google, glibc) provide implementations of libc calls. Check for licensing issues before you use them in your project.
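As an illustration, a minimal sketch of such replacements could look like the following. These are not the implementations used in the project: the names my_strlen, my_strncmp, my_strstr and the FORCE_INLINE macro are hypothetical, the code assumes a GCC or Clang toolchain, and it favours clarity over speed.

```c
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical helper macro: force inlining so no single string routine
   remains as a convenient hook point. */
#define FORCE_INLINE static inline __attribute__((always_inline))

/* Length of a NUL-terminated string, without calling libc. */
FORCE_INLINE size_t my_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Compare at most n bytes; stops at the first difference or at a NUL. */
FORCE_INLINE int my_strncmp(const char *a, const char *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb) {
            return (int)ca - (int)cb;
        }
        if (ca == '\0') {
            break;
        }
    }
    return 0;
}

/* Naive substring search; returns a pointer into haystack or NULL. */
FORCE_INLINE const char *my_strstr(const char *haystack, const char *needle) {
    size_t nlen = my_strlen(needle);
    if (nlen == 0) {
        return haystack;
    }
    for (; *haystack != '\0'; haystack++) {
        if (my_strncmp(haystack, needle, nlen) == 0) {
            return haystack;
        }
    }
    return NULL;
}
```

One caveat: optimising compilers may recognise simple byte loops (memset and memcpy style loops in particular) and turn them back into libc calls. Building with -fno-builtin and checking the import table of the final library is a reasonable way to confirm that the dependency is really gone.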

Native code obfuscation using O-LLVM

The Obfuscator-LLVM (O-LLVM) project is an open-source fork of the LLVM compilation suite that aims to increase software security through code obfuscation. O-LLVM has multiple features such as bogus control flow, control-flow flattening and instruction substitution. String obfuscation is one important feature I miss in this tool. Though it is built on an outdated LLVM (v4.0), it serves my need. For convenience, I am pushing the O-LLVM built binaries here. The following changes are made in the CMakeLists.txt file to point to the clang of O-LLVM and to append the O-LLVM specific obfuscation flags (-sub for instruction substitution, -bcf for bogus control flow, -fla for control-flow flattening) to the CMake C and C++ flags.

set(OLLVM_PATH ${CMAKE_HOME_DIRECTORY}/../../../../../build/bin)
set(OLLVM_C_COMPILER ${OLLVM_PATH}/clang)
set(OLLVM_CXX_COMPILER ${OLLVM_PATH}/clang++)

set(OLLVM_C_FLAGS "-mllvm -sub -mllvm -bcf -mllvm -fla")

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OLLVM_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OLLVM_C_FLAGS}")
set(CMAKE_C_COMPILER ${OLLVM_C_COMPILER})
set(CMAKE_CXX_COMPILER ${OLLVM_CXX_COMPILER})

With these changes, the build produces the final obfuscated library. Unsurprisingly, the generated native library bloated from 18 KB to 117 KB. To gauge the impact of the obfuscator, I counted the SVC instructions again: the number rose from 19 to a whopping 135. The diagram below shows the super complex control flow graph created by the O-LLVM obfuscator for a single function, in comparison with the already complex graph before obfuscation.

To reverse this, one first has to reduce the control flow graph by removing the bogus blocks, and only then can meaningful static analysis begin. Security by obscurity is generally not considered a good idea, but in this world of growing reverse engineering expertise, strong obfuscation offers at least some, if not significant, resistance to static analysis.

The 3 mechanisms described above harden the native code significantly. When you think about productizing such code, keep in mind that syscalls, custom libc implementations and inlining, combined with obfuscation, all take a toll on binary size and performance. Use your judgement to decide which functions need to be inlined and which functions need more obfuscation, using the function annotation feature of O-LLVM. All 3 mechanisms are applied to the GitHub project created for Frida detection, and the obfuscated APK is provided for interested reverse engineers to analyse.
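As a rough sketch of per-function control, O-LLVM reads clang's annotate attribute with pass names such as "fla", "bcf" and "sub" (and "nofla", "nobcf", "nosub" to opt a function out). The OBFUSCATE macro and both functions below are hypothetical and only illustrate the idea. The attribute is guarded so that the file still compiles cleanly with a non-clang compiler, where it has no effect.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical convenience macro around clang's annotate attribute, which
   O-LLVM inspects to decide per function which passes to apply. */
#if defined(__clang__)
#define OBFUSCATE(pass) __attribute__((__annotate__((pass))))
#else
#define OBFUSCATE(pass)
#endif

/* Sensitive check: request control-flow flattening for this function. */
OBFUSCATE("fla")
int contains_marker(const char *buf, size_t len, char marker) {
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == marker) {
            return 1;
        }
    }
    return 0;
}

/* Hot, non-sensitive helper: opt out of flattening even when -mllvm -fla
   is passed globally, to limit the size and speed penalty. */
OBFUSCATE("nofla")
unsigned checksum(const unsigned char *buf, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

With annotations like these, the global flags in CMakeLists.txt can be trimmed and the heavier passes reserved for the few functions that actually carry detection logic. Check the O-LLVM documentation for the exact behaviour of the version you build.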
