Python Autopsy Module Tutorial #2: The Data Source Ingest Module
In our last blog post, we built a basic Python Autopsy module that looked for big and round files. In our second post in the Autopsy: Python Module Series, we’re going to make two data source ingest modules. The first focuses on finding SQLite databases and parsing them, and the second focuses on running a command line tool on a disk image.
The main difference from the first tutorial, which focused on file ingest modules, is that these are data source ingest modules. Data source-ingest modules are given a reference to a data source and the module needs to find the files to analyze, whereas file-level ingest modules are given a reference to each file in the data source.
This post assumes you’ve read Part 1 of the series. That means that you should know why it is better to write an Autopsy module versus a stand-alone tool, and what you need to set up (Autopsy installed, text editor, etc.). We also assume you are still motivated by some cash prizes associated with OSDFCon 2015 module competition.
I’m also assuming you remember from Part 1 about the limitations (and benefits) of data source ingest modules. The most notable difference between them is that data source-ingest modules may not have access to carved files or files that are inside of ZIP files. For our example in this post, we are looking for a SQLite database with a specific name, and it will not be inside of a ZIP file, so data source ingest modules are the most efficient and will get us results faster.
The last assumption is that you know something about SQL queries. We have some example queries below and we don’t go into detail about how they work.
Making Your Module Folder
We’ll start by making our module folder. As we learned in Part 1, every Python module in Autopsy gets its own folder. To find out where you should put your Python module, launch Autopsy and choose the Tools -> Python Plugins menu item. That will open a subfolder in your AppData folder, such as “C:UsersJDoeAppDataRoamingAutopsypython_modules”.
Make a folder inside of there to store your module. Call it “DemoScript2”. Copy the dataSourcengestModule.py sample file from github into the this new folder and rename it to FindContactsDb.py.
Writing The Script
We are going to write a script that:
- Queries the backend database for files of a given name
- Opens the database
- Queries data from the database and makes an artifact for each row
Open the FindContactsDb.py script in your favorite text editor. The sample Autopsy Python modules all have TODO entries in them to let you know what you should change. The below steps jump from one TODO to the next.
- Factory Class Name: The first thing to do is rename the sample class name from “SampleJythonDataSourceIngestModuleFactory” to “ContactsDbIngestModuleFactory”. In the sample module, there are several uses of this class name, so you should search and replace for these strings.
- Name and Description: The next TODO entries are for names and descriptions. These are shown to users. For this example, we’ll name it “Contacts Db Analyzer”. The description can be anything you want. Note that Autopsy requires that modules have unique names, so don’t make it too generic.
- Ingest Module Class Name: The next thing to do is rename the ingest module class from “SampleJythonDataSourceIngestModule” to “ContactsDbIngestModule”. Our usual naming convention is that this class is the same as the factory class with “Factory” removed from the end. There are a couple of places where this name is used, so do a search and replace in your code.
- startUp() method: The startUp() method is where each module initializes. For our example, we don’t need to do anything special in here except save a reference to the passed in context object. This is used later on to see if the module has been cancelled.
- process() method: This is where we do our analysis and we’ll focus on this more in the next section.
That’s it. In the file-level ingest module, we had a shutdown() method, but we do not need that with data source-level ingest modules. When their process method is finished, it can shut itself down. The process() method will be called only once.
The process() Method
For this tutorial, you can start by deleting the contents of the existing process() method in the sample module. The full source code is linked to at the end of this blog and shows more detail about a fully fledged module. We’ll just cover the analytics in the blog.
Because data source-level ingest modules are not passed in specific files to analyze, nearly all of these types of modules will need to use the FileManager service to find relevant files. Check out the methods on that class to see the different ways that you can find files.
NOTE: See the end of this blog for an example of when you simply want to run a command line tool on a disk image instead of querying for files to analyze.
For our example, we want to find all files named “contacts.db”. There is a findFiles() method that does that. If we want to restrict it to a certain parent folder, there is another findFiles() method that does that. You can also use SQL syntax to match file patterns, such as “%.jpg” to find all files with a JPEG extension.
Our example needs these two lines to get the FileManager for the current case and to find the files.
fileManager = Case.getCurrentCase().getServices().getFileManager() files = fileManager.findFiles(dataSource, "contacts.db")
findFiles() returns a list of AbstractFile objects. This gives you access to the file’s metadata and content.
For our example, we are going to open these SQLite files. That means that we need to save them to disk. This is less than ideal because it wastes time writing the data to disk and then reading it back in, but it is the only option with many libraries. If you are doing some other type analysis on the content, then you do not need to write it to disk. You can read directly from the AbstractFile (see the sample modules for specific code to do this).
The ContentUtils class provides a utility to save file content to disk. We’ll make a path in the temp folder of our case directory. To prevent naming collisions, we’ll name the file based on its unique ID. The following two lines save the file to lclDbPath.
lclDbPath = os.path.join(Case.getCurrentCase().getTempDirectory(), str(file.getId()) + ".db") ContentUtils.writeToFile(file, File(lclDbPath))
Next, we need to open the SQLite database. We are going to use the Java JDBC infrastructure for this. JDBC is Java’s generic way of dealing with different types of databases. To open the database, we do this:
Class.forName("org.sqlite.JDBC").newInstance() dbConn = DriverManager.getConnection("jdbc:sqlite:%s" % lclDbPath)
With our connection in hand, we can do some queries. In our sample database, we have a single table named ‘contacts’, which has columns for name, email, and phone. We first start by querying for all rows in our simple table:
stmt = dbConn.createStatement() resultSet = stmt.executeQuery("SELECT * FROM contacts")
For each row, we are going to get the values for the name, e-mail, and phone number and make a TSK_CONTACT artifact. Recall from the first tutorial that posting artifacts to the blackboard allows modules to communicate with each other and also allows you to easily display data to the user. The TSK_CONTACT artifact is for storing contact information.
The basic approach in our example is to make an artifact of a given type (TSK_CONTACT) and have it be associated with the database it came from. We then make attributes for the name, email, and phone. The following code does this for each row in the database:
while resultSet.next(): # Make an artifact on the blackboard and give it attributes art = file.newArtifact(BlackboardArtifact.ARTIFACT_TYPE.TSK_CONTACT) name = resultSet.getString("name") art.addAttribute(BlackboardAttribute( BlackboardAttribute.ATTRIBUTE_TYPE.TSK_NAME_PERSON.getTypeID(), ContactsDbIngestModuleFactory.moduleName, name)) email = resultSet.getString("email") art.addAttribute(BlackboardAttribute( BlackboardAttribute.ATTRIBUTE_TYPE.TSK_EMAIL.getTypeID(), ContactsDbIngestModuleFactory.moduleName, email)) phone = resultSet.getString("phone") art.addAttribute(BlackboardAttribute( BlackboardAttribute.ATTRIBUTE_TYPE.TSK_PHONE_NUMBER.getTypeID(), ContactsDbIngestModuleFactory.moduleName, phone))
That’s it. We’ve just found the databases, queried them, and made artifacts for the user to see. There are some final things though. First, we should fire off an event so that the UI updates and refreshes with the new artifacts. We can fire just one event after each database is parsed (or you could fire one for each artifact — it’s up to you).
IngestServices.getInstance().fireModuleDataEvent( ModuleDataEvent(ContactsDbIngestModuleFactory.moduleName, BlackboardArtifact.ARTIFACT_TYPE.TSK_CONTACT, None))
And the final thing is to clean up. We should close the database connections and delete our temporary file.
stmt.close() dbConn.close() os.remove(lclDbPath)
Data source-level ingest modules can run for quite some time. Therefore, data source-level ingest modules should do some additional things that file-level ingest modules do not need to.
- Progress bars: Each data source-level ingest module will have its own progress bar in the lower right. A reference to it is passed into the process() method. You should update it to provide user feedback.
- Cancellation: A user could cancel ingest while your module is running. You should periodically check if that occurred so that you can bail out as soon as possible. You can do that with a check of:
Debugging and Development Tips
You can find the full file along with a small sample database on github. To use the database, add it as a logical file and run your module on it.
Whenever you have syntax errors or other errors in your script, you will get some form of dialog from Autopsy when you try to run ingest modules. If that happens, fix the problem and run ingest modules again. You don’t need to restart Autopsy each time!
The sample module has some log statements in there to help debug what is going on since we don’t know of better ways to debug the scripts while running in Autopsy.
While the above example outlined using the FileManager to find files to analyze, the other common use of data source-level ingest modules is to wrap a command line tool that takes a disk image as input.
A sample program (RunExe.py) that does that can be found on github. I’ll cover the big topics of that program in this section. There are more details in the script about error checking and such.
Finding The Executable
To write this kind of data source-level ingest module, put the executable in your module’s folder (the DemoScript2 folder we previously made). Use “__file__” to get the path to where your script is and then use some os.path methods to get to the executable in the same folder.
path_to_exe = os.path.join(os.path.dirname(os.path.abspath(__file__)), "img_stat.exe")
In our sample program, we do this and verify we can find it in the startup() method so that if we don’t, then ingest never starts.
Running The Executable
Data sources can be disk images, but they can also be a folder of files. We only want to run our executable on a disk image. So, verify that:
if not isinstance(dataSource, Image): self.log(Level.INFO, "Ignoring data source. Not an image") return IngestModule.ProcessResult.OK
You can get the path to the disk image using dataSource.getPaths().
Once you have the EXE and the disk image, you can use the various subprocess methods to run them.
Showing User The Results
After the command line tool runs, you have the option of either showing the user the raw output of the tool or parsing it into individual artifacts. Refer to previous sections of this tutorial and the previous tutorial for making artifacts. If you want to simply show the user the output of the tool, then save the output to the Reports folder in the Case directory:
reportPath = os.path.join(Case.getCurrentCase().getCaseDirectory(), "Reports", "img_stat-" + str(dataSource.getId()) + ".txt")
Then you can add the report to the case so that it shows up in the tree in the main UI panel.
Case.getCurrentCase().addReport(reportPath, "Run EXE", "img_stat output")
Data source-level ingest modules allow you to query for a subset of files by name or to run on an entire disk image. This tutorial has shown an example of both use cases and shown how to use SQLite in Jython. Now get coding for the OSDFCon challenge!
The next blog posting in this series will be about Report modules.
Want to learn more about Autopsy? Join us at one or more of the following events:
- Attend our workshop at the DFRWS USA Conference this week in Philadelphia, PA.
- Learn about Autopsy 3’s latest features at the HTCIA Conference Aug. 30 – Sept. 2 in Orlando, FL.
- Be sure to come to OSDFCon, our premier event, where we’ll be offering a Python-focused talk and a half-day developer workshop in the DC area in late October.