File Structure of an APK

📌Why should you understand the file structure?

  • Understanding the file structure of an Android application is essential when reverse engineering. It helps you identify which files are worth analyzing and which ones can be safely ignored.

📌What is APK?

  • APK stands for “Android Package Kit.” It is the file format used to distribute and install applications on Android devices. An APK file contains all the necessary components of an Android app, including code, resources, assets, and the manifest file. Because an APK is a ZIP archive, we can easily unzip it and get its components, but most of them will still be in compiled or binary-encoded form (for example, the manifest is stored as binary XML).

  • Users can download and install APK files from various sources, such as the Google Play Store or third-party app stores.

📍Unzipping APK

  • APK files can be unpacked using the command:

    unzip file.apk
  • Unzipping an APK will reveal its components, which are described below:

  • Components

    • assets directory ⇒ Contains resources such as pictures, sounds, certificates, or external files.

    • com directory ⇒ It usually doesn't contain any interesting data for us.

    • lib directory

      • Contains the app's native libraries, grouped into one subdirectory per CPU architecture. Look into this directory to check if the application supports the x86 architecture, for example to run it on an x86 emulator. If there is a subdirectory containing a lib for x86, or if the app does not contain additional libraries at all, this is most likely the case.

      • The shared object (.so) files in these subdirectories contain C and C++ code compiled to native machine code, so they are processor-dependent. If your phone has an ARM CPU, the app needs directories like armeabi, armeabi-v7a, or arm64-v8a. (The developer uses shared object files to import functions from them.)

      📌 When you write an Android application, you use Java or Kotlin, which compiles into a classes.dex file. However, Java isn't always efficient for performance-critical tasks like rendering or 3D effects, so developers can also include C and C++ code, which gets compiled into these shared object files.

    • META-INF directory

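      • Contains the files related to the app's signing process. Every Android app needs to be signed. The three files below belong to the JAR-based (v1) signature scheme; newer signature schemes store their data in a separate signing block of the APK rather than in this directory.

      • MANIFEST.MF: Contains a list of names/hashes (usually SHA256 in Base64) for all the files of the APK.

      • CERT.SF: Contains a list of names/hashes of the corresponding entries in the MANIFEST.MF file.

      • CERT.RSA: Contains the signer's certificate (including the public key) and the signature of CERT.SF.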

    • res directory

      • Contains predefined application resources, like XML files that define a state list of colors, user interface layout, fonts, values, etc.

    • AndroidManifest.xml

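      • Contains meta-information about the application, stored as binary XML inside the APK.

      • It describes the application's package name, version, requested permissions, and components such as activities, services, broadcast receivers, and content providers.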

    • classes.dex file

      • This is the most important file, as it contains the compiled Java source code in Dalvik executable format, to be executed by the Android Runtime.

    • resources.arsc file

      • Contains the compiled resource table, such as the strings and color definitions used in the app (not usually important).

  • In AndroidManifest.xml, we look for the requested permissions and the exported components. From there, we can locate the classes that implement those key components and hunt for vulnerabilities in them.

  • We should also check the assets directory, as it may contain certificates or other data used by the app that isn’t visible in the decompiled Java code. If this directory is absent, we can skip this check. :)
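The MANIFEST.MF digests described under META-INF can be recomputed by hand. The following is a minimal sketch, assuming the APK carries a v1 (JAR) signature with SHA-256 digests; the function names `entry_digest`, `parse_manifest`, and `verify_manifest` are hypothetical helpers, not part of any Android tool.

```python
import base64
import hashlib
import zipfile


def entry_digest(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the form used in MANIFEST.MF."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def parse_manifest(text: str) -> dict[str, str]:
    """Map each 'Name:' entry to its 'SHA-256-Digest:' value."""
    # JAR manifests wrap long lines; a continuation line starts with one space.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    digests, name = {}, None
    for line in unfolded.split("\n"):
        if line.startswith("Name: "):
            name = line[len("Name: "):]
        elif line.startswith("SHA-256-Digest: ") and name is not None:
            digests[name] = line[len("SHA-256-Digest: "):]
        elif not line:
            name = None  # a blank line ends the current entry
    return digests


def verify_manifest(apk) -> dict[str, bool]:
    """Compare the listed digests with digests recomputed from the archive."""
    with zipfile.ZipFile(apk) as z:
        listed = parse_manifest(z.read("META-INF/MANIFEST.MF").decode("utf-8"))
        present = set(z.namelist())
        return {
            name: entry_digest(z.read(name)) == digest
            for name, digest in listed.items()
            if name in present
        }
```

Calling `verify_manifest("file.apk")` returns one boolean per listed file. This only checks file hashes against MANIFEST.MF; it does not validate CERT.SF or the signature in CERT.RSA.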

We usually care about assets, lib, META-INF, AndroidManifest.xml, and classes.dex.
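Since an APK is a ZIP archive, the components above can also be inspected without unpacking anything to disk. Here is a small sketch using Python's standard `zipfile` module; `summarize_apk` is a hypothetical helper, and the `supports_x86` flag simply encodes the rule of thumb from the lib directory notes (an x86 subdirectory, or no native libraries at all).

```python
import zipfile
from collections import Counter


def summarize_apk(apk) -> dict:
    """Summarize the top-level layout of an APK (path or file-like object)."""
    with zipfile.ZipFile(apk) as z:
        names = z.namelist()

    # Count entries per top-level component (assets, lib, res, META-INF, ...).
    top_level = Counter(name.split("/", 1)[0] for name in names)

    # Native libraries live in lib/<abi>/<name>.so, so collect the <abi> part.
    abis = sorted({
        parts[1]
        for parts in (name.split("/") for name in names)
        if len(parts) >= 3 and parts[0] == "lib"
    })

    # classes.dex, classes2.dex, ... sit in the root of the archive.
    dex_files = sorted(
        name for name in names
        if "/" not in name and name.startswith("classes") and name.endswith(".dex")
    )

    return {
        "top_level": dict(top_level),
        "abis": abis,
        "dex_files": dex_files,
        # Heuristic only: an x86 directory or no native code at all.
        "supports_x86": not abis or "x86" in abis,
    }
```

For example, `summarize_apk("file.apk")["abis"]` lists the architectures the app ships native code for.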


📌Dalvik (.dex)

On Android, applications are typically written in Java or Kotlin, but they do not run on a standard Java virtual machine. They were originally executed by the Dalvik virtual machine, designed to work efficiently on battery-powered devices; since Android 5.0, the Android Runtime (ART) has replaced it, while the file format stayed the same. The compiled Java code is converted into a different bytecode format called the Dalvik executable format. This format is independent of the CPU architecture and helps conserve resources and battery life on mobile devices.

The Dalvik executable format is stored in a compact binary file with the .dex extension. It contains the classes that are generated from the Java source code. If needed, the .dex file can be disassembled into a human-readable text representation such as smali.

One important limitation of the .dex format is that a single file can reference at most 65,536 methods (the 64K limit). If an application exceeds this limit, the build will produce multiple .dex files, named classes.dex, classes2.dex, and so on. Methods referenced from libraries and frameworks count toward this limit, so large dependencies may also lead to multiple .dex files.

Overall, the Dalvik bytecode and the .dex file format are crucial components of the Android platform, enabling efficient execution of Java-based applications regardless of the device's CPU architecture.
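The method limit described above can be checked directly, because the .dex header stores the number of method references. The sketch below reads a few header fields with Python's `struct` module; the offsets follow the published .dex header layout, while `read_dex_header` and the returned key names are hypothetical.

```python
import struct

DEX_HEADER_SIZE = 0x70      # the fixed-size header is 112 bytes long
MAX_METHOD_REFS = 0x10000   # 65,536 method references per .dex file


def read_dex_header(data: bytes) -> dict:
    """Parse selected fields from the header of a .dex file."""
    if len(data) < DEX_HEADER_SIZE or data[:4] != b"dex\n" or data[7:8] != b"\x00":
        raise ValueError("not a .dex file")

    version = data[4:7].decode("ascii")  # format version, e.g. "035"
    # Little-endian 32-bit fields: file_size, header_size, endian_tag.
    file_size, header_size, endian_tag = struct.unpack_from("<3I", data, 0x20)
    (method_ids_size,) = struct.unpack_from("<I", data, 0x58)
    (class_defs_size,) = struct.unpack_from("<I", data, 0x60)

    return {
        "version": version,
        "file_size": file_size,
        "header_size": header_size,
        "endian_tag": hex(endian_tag),
        "method_refs": method_ids_size,
        "class_defs": class_defs_size,
        "method_refs_left": MAX_METHOD_REFS - method_ids_size,
    }
```

A typical call is `read_dex_header(open("classes.dex", "rb").read())`. A `method_refs` value close to 65,536 explains why an app ships a classes2.dex next to classes.dex.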


📌classes.dex

  • The classes.dex files contain Java source code that has been compiled into Dalvik executable bytecode format.

  • The ghex tool can be used to view the raw hexadecimal data of the classes.dex file but cannot disassemble the bytecode into a readable format.

    • You can use the following command to inspect the file:

      ghex classes.dex 
    • In the output, we can see some header information, such as the dex magic bytes at the start of the file, and identify if the file is compressed or detect certain patterns.

  • To disassemble the Dalvik bytecode into a human-readable format, you should use a tool like dexdump, which ships with the Android SDK build-tools.
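If ghex is not at hand, the same raw view can be approximated in a few lines of Python. This is only an illustrative sketch of a hex viewer (the `hexdump` function is hypothetical), useful for looking at the first bytes of classes.dex.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex values, and printable ASCII per row."""
    pad = width * 3 - 1  # width of a full row of "xx xx ..." pairs
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as dots, as hex editors usually do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(pad)}  {text_part}")
    return "\n".join(lines)
```

For instance, `print(hexdump(open("classes.dex", "rb").read()[:64]))` shows the magic bytes `64 65 78 0a` ("dex" followed by a newline) at offset 0.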


📌App retrieval

  • How can we obtain the APK file?

    1. The program (for example, a bug bounty program) may provide the APK file directly.

    2. We can find the APK online on a site such as APKCombo.

    3. If only the package name is provided, here’s how to retrieve the APK:

      1. Extract it from the mobile device after installation from Google Play (using ADB tool or an APK extractor app).

      2. Use a Chrome extension (such as an APK downloader) to download the APK.

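  • Using the ADB tool, run the first two steps inside the adb shell and the last one on your computer to retrieve the APK file:

    # 1. List the installed packages (inside adb shell)
    pm list packages -3                      # third-party packages only
    pm list packages | grep -i <app_name>    # or search all packages by name

    # 2. Get the path of the installed APK (inside adb shell)
    pm path <package_name>

    # 3. Pull the APK to your computer (outside adb shell)
    adb pull <app_path>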
  • After running the above commands, the APK file will be on your computer.
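When this is done often, the step between `pm path` and `adb pull` can be scripted. The sketch below only parses the text printed by `pm path`, where each line has the form `package:<path>`, and builds the matching `adb pull` commands; `parse_pm_path` and `pull_commands` are hypothetical helper names, and the package path in the comment is a made-up example.

```python
def parse_pm_path(output: str) -> list[str]:
    """Extract APK paths from the output of `pm path <package_name>`."""
    prefix = "package:"
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            paths.append(line[len(prefix):])
    return paths


def pull_commands(output: str) -> list[list[str]]:
    """Build one `adb pull` command per APK path."""
    # Apps installed as split APKs print several lines, hence several pulls.
    return [["adb", "pull", path] for path in parse_pm_path(output)]


# Example input (hypothetical path):
#   package:/data/app/com.example.app/base.apk
```

Each returned command can be executed with `subprocess.run(cmd, check=True)` on a machine where adb is installed and a device is connected.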


📌Decompiling with apktool

  • APKTool is a popular open-source tool used to decode and reverse-engineer Android APK files. It allows developers and researchers to decode the APK's resources and manifest back into readable XML and to disassemble the Dalvik bytecode into smali, a human-readable, assembly-like representation. The output provides insights into how the app functions and which resources it uses, and it can be modified and rebuilt into a new APK.

  • Advantages:

    • Provides human-readable code in smali format, making it easier to understand and analyze.

    • Retains the original file structure and resources, making it suitable for modifying and recompiling the APK.

  • Usage:

    • To decompile the app, use the apktool with the d option:

      apktool d <app.apk>
    • APKTool will create a directory named after the APK, containing the decoded resources and the smali code.

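    • To rebuild the application after making changes, use the b option:

      apktool b <directory>
    • The rebuilt APK is written to the dist directory inside <directory> and must be signed again before it can be installed.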
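Once APKTool has decoded the app, AndroidManifest.xml is plain XML, so the permissions and exported components mentioned earlier can be listed automatically. The following is a rough sketch using Python's standard XML parser; `analyze_manifest` is a hypothetical helper, and treating a component with an intent filter but no explicit android:exported attribute as exported is a simplification of the behavior on older Android versions.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME = f"{{{ANDROID_NS}}}name"          # the android:name attribute
EXPORTED = f"{{{ANDROID_NS}}}exported"  # the android:exported attribute
COMPONENT_TAGS = ("activity", "activity-alias", "service", "receiver", "provider")


def analyze_manifest(source) -> dict:
    """List permissions and (likely) exported components of a decoded manifest."""
    root = ET.parse(source).getroot()
    permissions = [elem.get(NAME) for elem in root.iter("uses-permission")]

    exported = []
    for tag in COMPONENT_TAGS:
        for component in root.iter(tag):
            flag = component.get(EXPORTED)
            # Assumption: no explicit flag plus an intent filter means exported.
            implicit = flag is None and component.find("intent-filter") is not None
            if flag == "true" or implicit:
                exported.append((tag, component.get(NAME)))

    return {
        "package": root.get("package"),
        "permissions": permissions,
        "exported": exported,
    }
```

For example, `analyze_manifest("<directory>/AndroidManifest.xml")` gives a starting list of classes to review in the smali output.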

📍Unzip vs decompile

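  • Comparing the output of unzip with the output of APKTool, the directories fall into three groups:

  • Identical: these directories are the same in both outputs (typically assets and lib).

  • Identical but in a different structure: the content is the same, but it is placed elsewhere (for example, META-INF, which APKTool keeps under original/).

  • Other directories: these differ between APKTool and unzipping (for example, classes.dex becomes smali directories, and res and AndroidManifest.xml are decoded into readable XML).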

📌In summary: Unzipping an APK gives you access to its non-code assets, but the code and the manifest remain in compiled form, making them hard to understand. On the other hand, decompiling with APKTool provides deeper insights into the app's functionality and code by decoding the resources and converting the bytecode into smali, which is more human-readable.

