NOTE: You can find the full code here.

For some reason, one of my friends has been having issues with Firefox on Asahi Linux. The Discord tab keeps leaking memory, making the browser (and, really, the entire system) nearly unusable. As part of troubleshooting, I asked them to send me the dump file produced by the “Memory” tab in Firefox's developer tools. I didn't know what data it would contain, but I was expecting something a little more detailed than about:memory. Instead, what I got was a giant tree of opaque pointers. Still, I figured I'd write about it, since it's probably useful to someone.

Firefox includes a tool to parse and inspect these files, but it can only be built from within the Firefox source tree, and I've never managed to get that building. It doesn't help that the only existing third-party tool for this errored out for me, apparently because the snapshot file was too large. So I built my own.

The `.fxsnapshot` format

The .fxsnapshot format is just gzip-compressed binary data. Once decompressed, it's a sequence of messages, each preceded by its length. Each individual message is serialized with Google's Protocol Buffers, and the schema can be found in the Firefox source tree. The comments in CoreDump.proto handily describe the layout of the file, which I've reproduced below because I'm nice:

+-----------------------------------------------------------------------+
| Varint32: The size of the following `Metadata` message.               |
+-----------------------------------------------------------------------+
| message: The core dump `Metadata` message.                            |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message.                   |
+-----------------------------------------------------------------------+
| message: The first `Node` message. This is the root node.             |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message.                   |
+-----------------------------------------------------------------------+
| message: A `Node` message.                                            |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message.                   |
+-----------------------------------------------------------------------+
| message: A `Node` message.                                            |
+-----------------------------------------------------------------------+
| .                                                                     |
| .                                                                     |
| .                                                                     |
+-----------------------------------------------------------------------+
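To make the layout concrete, here's a small sketch of the writer side. It encodes a length as a varint32 (the base-128 scheme the reader below decodes: 7 bits per byte, low bits first, high bit set when more bytes follow) and frames byte payloads the way the table shows. The helper names encode_varint32 and frame_messages are my own, and the payloads are plain bytes rather than real Metadata or Node messages.

```python
import gzip
import io
from typing import Iterable


def encode_varint32(value: int) -> bytes:
    """Encode a non-negative 32-bit integer as a base-128 varint."""
    if not 0 <= value < 2**32:
        raise ValueError("varint32 values must fit in 32 unsigned bits")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow: set the continuation (high) bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def frame_messages(payloads: Iterable[bytes]) -> bytes:
    """Prefix each payload with its varint32 length, concatenate, then gzip."""
    raw = io.BytesIO()
    for payload in payloads:
        raw.write(encode_varint32(len(payload)))
        raw.write(payload)
    return gzip.compress(raw.getvalue())


print(encode_varint32(1).hex())    # 01
print(encode_varint32(300).hex())  # ac02
```

A message of 300 bytes therefore gets a two-byte prefix: 0xAC carries the low 7 bits (44) with the continuation bit set, and 0x02 carries the remaining bits.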

Implementation

Now let's write a tool to deal with this format, and we'll do it in Python for simplicity. Let's scaffold the program:

import sys

def usage(file):
    print("fxsdump [snapshot]", file=file)

def main():
    if len(sys.argv) != 2:
        usage(sys.stderr)
        sys.exit(1)
    elif sys.argv[1] == '-h':
        usage(sys.stdout)
        return

    input_path = sys.argv[1]
    results = {'metadata': None, 'nodes': []}

    # ...

if __name__ == '__main__':
    main()

Great! Let's go ahead and read the file and ungzip it.

import gzip

# Inside main().

with gzip.open(input_path, 'rb') as f:
    ...  # The parsing code goes here.
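gzip.open won't complain about a file that isn't gzip-compressed until you first read from it, at which point it raises gzip.BadGzipFile in the middle of parsing. If you'd rather fail up front with a clearer message, one option is to check the two-byte gzip magic number first. This is a sketch; looks_like_gzip and open_snapshot are hypothetical helpers, not part of fxsdump.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # First two bytes of every gzip member (RFC 1952).


def looks_like_gzip(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_snapshot(path: str):
    """Open a .fxsnapshot for reading decompressed bytes, failing early if it isn't gzip."""
    if not looks_like_gzip(path):
        raise ValueError(f"{path} does not look like a gzip-compressed snapshot")
    return gzip.open(path, "rb")
```

With this, `with open_snapshot(input_path) as f:` would replace the plain gzip.open call.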

Before we do much else, we need a function that reads a single length-prefixed message from the decompressed stream: first the varint32 size, then that many bytes of protobuf data.

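import CoreDump_pb2

def read_varint32(file):
    # Decode a base-128 varint: each byte carries 7 bits of the value,
    # lowest bits first, and the high bit is set if more bytes follow.
    shift = 0
    result = 0

    while True:
        byte = file.read(1)
        if not byte:
            raise EOFError("Unexpected end of file while reading varint32.")

        byte = ord(byte)
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break

        shift += 7

    return result

def parse_message(file, message_class):
    # Read one length-prefixed message and decode it as message_class.
    message_size = read_varint32(file)
    message_data = file.read(message_size)
    if len(message_data) != message_size:
        # A short read means the snapshot was cut off mid-message.
        raise EOFError("Unexpected end of file while reading a message.")

    message = message_class()
    message.ParseFromString(message_data)

    return message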
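The parsing loop stops on the first EOFError, which treats a clean end of file and a snapshot cut off partway through a message the same way. If you want to tell those apart, one possible approach is a generator that only stops quietly when the stream ends exactly on a message boundary. This is a sketch; read_varint32_or_none and iter_frames are hypothetical helpers, and they yield raw bytes that you would still hand to ParseFromString.

```python
import io
from typing import BinaryIO, Iterator, Optional


def read_varint32_or_none(file: BinaryIO) -> Optional[int]:
    """Like read_varint32, but return None if the stream ends before the first byte."""
    shift = 0
    result = 0
    first = True
    while True:
        byte = file.read(1)
        if not byte:
            if first:
                return None  # Clean end of stream: no more messages.
            raise EOFError("Stream ended in the middle of a varint32.")
        first = False
        result |= (byte[0] & 0x7F) << shift
        if not (byte[0] & 0x80):
            return result
        shift += 7
        if shift > 28:
            raise ValueError("varint32 is longer than 5 bytes.")


def iter_frames(file: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed payload; raise EOFError if the last one is cut short."""
    while True:
        size = read_varint32_or_none(file)
        if size is None:
            return
        payload = file.read(size)
        if len(payload) != size:
            raise EOFError(f"Expected {size} bytes, got {len(payload)}.")
        yield payload


print(list(iter_frames(io.BytesIO(b"\x03abc\x02hi"))))  # [b'abc', b'hi']
```

In the tool, the first frame would go to a Metadata message and every later frame to a Node.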

With those helpers in place, we can parse the Metadata message, then loop over the Node messages that follow until we run out of data.

from google.protobuf.json_format import MessageToDict

# Inside the opened gzip file code.

try:
    metadata = parse_message(f, CoreDump_pb2.Metadata)
    results["metadata"] = MessageToDict(metadata)

    while True:
        node = parse_message(f, CoreDump_pb2.Node)
        results["nodes"].append(MessageToDict(node))

except EOFError:
    # The stream ran out, so we've read every message.
    pass

You may notice that we're calling MessageToDict. It converts each protobuf message into plain Python dictionaries and lists, which lets us serialize the results to a JSON string and dump it to a file. The conversion isn't free, though: if you were going to do any sort of processing on the snapshot data, you'd probably skip it and append the message objects directly.
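As a sketch of that "process the messages directly" route: parsed protobuf messages expose their fields as attributes, so you can work with them without converting anything. The summarize helper below is hypothetical, and it assumes the Node message has an integer size field (check CoreDump.proto for the exact field names). The SimpleNamespace objects only stand in for parsed CoreDump_pb2.Node messages.

```python
from types import SimpleNamespace


def summarize(nodes):
    """Return (node_count, total_bytes) for an iterable of Node-like objects.

    Assumes each node has an integer `size` attribute.
    """
    count = 0
    total = 0
    for node in nodes:
        count += 1
        total += node.size
    return count, total


# Stand-ins for parsed CoreDump_pb2.Node messages, just for illustration.
fake_nodes = [SimpleNamespace(size=64), SimpleNamespace(size=128)]
print(summarize(fake_nodes))  # (2, 192)
```

In the real loop you'd append `parse_message(f, CoreDump_pb2.Node)` itself to a list (or feed it straight into the summary) instead of calling MessageToDict.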

Either way, once the loop finishes we convert the results to a JSON string and print it, so we can redirect standard output to save it to a file.

import json

# Outside the opened gzip file code.

print(json.dumps(results, indent=4))
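json.dumps(results, indent=4) builds the entire output as one string while the full results dictionary is still in memory, which adds up for large snapshots. A possible alternative, sketched here and not what fxsdump does, is to write JSON Lines: one line for the metadata, then one compact JSON object per node. The write_json_lines helper is hypothetical.

```python
import io
import json
from typing import Iterable, TextIO


def write_json_lines(metadata: dict, nodes: Iterable[dict], out: TextIO) -> int:
    """Write metadata, then one node per line, as JSON Lines; return the node count.

    `nodes` can be a generator, so nodes never need to be collected into a list.
    """
    out.write(json.dumps({"metadata": metadata}) + "\n")
    count = 0
    for node in nodes:
        out.write(json.dumps(node) + "\n")
        count += 1
    return count
```

You could pass sys.stdout as `out` and a generator that yields MessageToDict(node) for each parsed node, keeping only one node in memory at a time.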

Before we can run anything, we need to compile CoreDump.proto with protoc, which generates the CoreDump_pb2 module our script imports. We'll make a simple Makefile to deal with this:

PROTOC ?= protoc

.PHONY: all
all: CoreDump_pb2.py

CoreDump_pb2.py: CoreDump.proto
	$(PROTOC) --python_out=. $<
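If you forget to run make, the script dies with a bare ModuleNotFoundError. A friendlier failure could look like the sketch below; load_generated_module is a hypothetical helper. It only exits with the hint when the generated module itself is missing, and re-raises if something else is absent (for example the protobuf package that the generated module imports).

```python
import importlib
import sys
from types import ModuleType


def load_generated_module(name: str = "CoreDump_pb2") -> ModuleType:
    """Import the protoc-generated module, or exit with a hint to run make."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as e:
        if e.name != name:
            raise  # A different module is missing, e.g. google.protobuf.
        sys.exit(f"{name} not found; run `make` to generate it from CoreDump.proto.")
```

At the top of main(), `CoreDump_pb2 = load_generated_module()` would then replace the plain import.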

And why not make a .gitignore while we're at it?

__pycache__/

*.fxsnapshot
*.json
CoreDump_pb2.py

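Great! Now you can run make to generate CoreDump_pb2.py, and then run the tool. You'll also need the protobuf package for Python installed, since the script imports google.protobuf.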

The Full Code

You can find the full code below, or on sourcehut.

import sys
import gzip
import json
import CoreDump_pb2
from google.protobuf.json_format import MessageToDict

def usage(file):
    print("fxsdump [snapshot]", file=file)

def read_varint32(file):
    # Decode a base-128 varint: 7 bits per byte, high bit set if more bytes follow.
    shift = 0
    result = 0

    while True:
        byte = file.read(1)
        if not byte:
            raise EOFError("Unexpected end of file while reading varint32.")

        byte = ord(byte)
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break

        shift += 7

    return result

def parse_message(file, message_class):
    # Read one length-prefixed message and decode it as message_class.
    message_size = read_varint32(file)
    message_data = file.read(message_size)
    if len(message_data) != message_size:
        # A short read means the snapshot was cut off mid-message.
        raise EOFError("Unexpected end of file while reading a message.")

    message = message_class()
    message.ParseFromString(message_data)

    return message

def main():
    if len(sys.argv) != 2:
        usage(sys.stderr)
        sys.exit(1)
    elif sys.argv[1] == '-h':
        usage(sys.stdout)
        return

    input_path = sys.argv[1]
    results = {'metadata': None, 'nodes': []}

    with gzip.open(input_path, 'rb') as f:
        try:
            metadata = parse_message(f, CoreDump_pb2.Metadata)
            results["metadata"] = MessageToDict(metadata)

            while True:
                node = parse_message(f, CoreDump_pb2.Node)
                results["nodes"].append(MessageToDict(node))

        except EOFError:
            # The stream ran out, so we've read every message.
            pass

    # Print to stdout so the output can be redirected to a file.
    print(json.dumps(results, indent=4))

if __name__ == '__main__':
    main()

Let's test it! I still have the original snapshot my friend sent me, so I'll test it on that:

python fxsdump.py 2450796.fxsnapshot > out.json

After a substantial amount of time (a couple of minutes, thanks Python), we get output. I've piped it into a file so it doesn't wreak havoc on my terminal.
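Once the dump is in out.json, a quick first look might be to pull out the largest nodes. Here's a sketch; largest_nodes is a hypothetical helper. Two details come from how MessageToDict works: 64-bit integer fields are rendered as JSON strings, so sizes need int(), and fields that weren't set are left out, so a missing size is treated as 0. Also note that json.load reads the whole file into memory, which may be slow for a dump this size.

```python
import heapq
import json


def largest_nodes(path: str, n: int = 10) -> list:
    """Return the n nodes with the largest `size` from a fxsdump JSON file.

    Sizes are strings (MessageToDict renders uint64 that way); absent sizes count as 0.
    """
    with open(path) as f:
        results = json.load(f)
    return heapq.nlargest(n, results["nodes"], key=lambda node: int(node.get("size", 0)))
```

For example, `for node in largest_nodes("out.json"): print(node.get("id"), node.get("size"))`. Keep in mind that any bytes fields in the nodes will show up base64-encoded.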

Milo Banks

Dumping Firefox Memory Core files to JSON
