NOTE: You can find the full code here.
For some reason, one of my friends has been having some issues with Firefox on Asahi Linux. The Discord tab keeps leaking memory, making the browser (and indeed the entire system, really) nearly unusable. As part of a troubleshooting effort, I asked them to send me the dump file produced by the “Memory” tab in Firefox. I didn't know what data would be contained inside of it, but I was expecting something a little more detailed than about:memory. Instead, what I got was a giant tree of opaque pointers. Still, I figured I'd write about it, as it's probably useful for someone.
Firefox includes a tool to parse and look into these files, but it can only be built from inside the Firefox source tree, and I've never been able to get it building. It doesn't help that the only existing third-party tool for this errored out for me, supposedly because the snapshot file was too large. So I built my own tool.
The .fxsnapshot format
The .fxsnapshot format is just a gzip-compressed file that contains some raw binary data. Each message is stored as its length, followed by the binary representation of the message itself. The binary data of each individual message is Google Protocol Buffer data, the schema for which can be found in the Firefox source tree. CoreDump.proto handily provides a specification of the file layout, which is found below because I'm nice:
+-----------------------------------------------------------------------+
| Varint32: The size of following `Metadata` message. |
+-----------------------------------------------------------------------+
| message: The core dump `Metadata` message. |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message. |
+-----------------------------------------------------------------------+
| message: The first `Node` message. This is the root node. |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message. |
+-----------------------------------------------------------------------+
| message: A `Node` message. |
+-----------------------------------------------------------------------+
| Varint32: The size of the following `Node` message. |
+-----------------------------------------------------------------------+
| message: A `Node` message. |
+-----------------------------------------------------------------------+
| . |
| . |
| . |
+-----------------------------------------------------------------------+
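To make the framing concrete, here is a minimal sketch of the write side: it encodes a length as a varint, prefixes a payload with it, and builds a small stream in memory. The helper names `encode_varint` and `frame` are made up for illustration, and the payloads are placeholder bytes, not real `Metadata` or `Node` messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative integer as a protobuf-style varint."""
    if value < 0:
        raise ValueError("varints in this sketch must be non-negative")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow, so set the continuation bit.
            out.append(low_bits | 0x80)
        else:
            out.append(low_bits)
            return bytes(out)


def frame(payload):
    """Prefix a payload with its length, as in the layout above."""
    return encode_varint(len(payload)) + payload


# Placeholder payloads standing in for serialized messages.
stream = io.BytesIO()
for payload in (b"metadata", b"root-node", b"x" * 300):
    stream.write(frame(payload))

framed = stream.getvalue()
```

Note that the 300-byte payload needs a two-byte length prefix (0xAC 0x02), which is why a reader has to handle lengths that span more than one byte.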
Implementation
Now let's write a tool to deal with this format, and we'll do it in Python for simplicity. Let's scaffold the program:
import sys


def usage(file):
    print("fxsdump [snapshot]", file=file)


def main():
    if len(sys.argv) != 2:
        usage(sys.stderr)
        sys.exit(1)
    elif sys.argv[1] == '-h':
        usage(sys.stdout)
        return

    input_path = sys.argv[1]
    results = {'metadata': None, 'nodes': []}
    # ...


if __name__ == '__main__':
    main()
Great! Let's go ahead and read the file and ungzip it.
import gzip

# Inside main().
with gzip.open(input_path, 'rb') as f:
    # ...
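Since the snapshot is gzip data, a quick sanity check before parsing is to look for the gzip magic bytes at the start of the file. The following is a small sketch; `looks_like_gzip` is a hypothetical helper name and not part of the tool, and the demonstration uses throwaway files rather than a real snapshot.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # First two bytes of every gzip stream (RFC 1952).


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demonstration on temporary files with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.fxsnapshot")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"placeholder")

    plain_path = os.path.join(tmp, "plain.txt")
    with open(plain_path, "wb") as f:
        f.write(b"not compressed")

    gz_result = looks_like_gzip(gz_path)
    plain_result = looks_like_gzip(plain_path)
```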
Before we do much else, we need a function that reads a single message in the length-prefixed format used by the snapshot file.
import CoreDump_pb2


def read_varint32(file):
    shift = 0
    result = 0
    while True:
        byte = file.read(1)
        if not byte:
            raise EOFError("Unexpected end of file while reading varint32.")
        byte = ord(byte)
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result


def parse_message(file, message_class):
    message_size = read_varint32(file)
    message_data = file.read(message_size)
    message = message_class()
    message.ParseFromString(message_data)
    return message
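The reader above assumes a well-formed file. As a hedged sketch of a more defensive variant, the version below rejects varints longer than five bytes (the most a 32-bit value can need) and treats a short read in the message body as corruption rather than a normal end of file. `read_varint32_strict` and `read_frame` are hypothetical names, and `read_frame` returns the raw bytes instead of a parsed message so that the example runs without the generated protobuf module.

```python
import io

MAX_VARINT32_BYTES = 5  # ceil(32 / 7): enough bytes for any 32-bit value.


def read_varint32_strict(file):
    """Read a varint, failing loudly if it is longer than 5 bytes."""
    result = 0
    for index in range(MAX_VARINT32_BYTES):
        chunk = file.read(1)
        if not chunk:
            raise EOFError("Unexpected end of file while reading varint32.")
        byte = chunk[0]
        result |= (byte & 0x7F) << (7 * index)
        if not (byte & 0x80):
            return result
    raise ValueError("varint32 is longer than 5 bytes; file may be corrupt.")


def read_frame(file):
    """Read one length-prefixed payload and return its raw bytes."""
    size = read_varint32_strict(file)
    data = file.read(size)
    if len(data) != size:
        # ValueError rather than EOFError, so a loop that stops on
        # EOFError does not silently accept a truncated message.
        raise ValueError("Message body is shorter than its length prefix.")
    return data


first_payload = read_frame(io.BytesIO(b"\x03abc"))
```

Raising a different exception type for a truncated body is a design choice: a caller that ends its loop on `EOFError` will still stop cleanly at the end of the file, but will not quietly ignore a snapshot that was cut off partway through a message.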
After this, we can go and parse the metadata, then loop through all the messages.
from google.protobuf.json_format import MessageToDict

# Inside the opened gzip file code.
try:
    metadata = parse_message(f, CoreDump_pb2.Metadata)
    results["metadata"] = MessageToDict(metadata)
    while True:
        node = parse_message(f, CoreDump_pb2.Node)
        results["nodes"].append(MessageToDict(node))
except EOFError:
    pass
You may notice that we're calling the MessageToDict function. This function doesn't have much overhead, and it converts each message into a plain dictionary, which lets us serialize our result dictionary to a JSON string so we can dump it to a file. If you were going to do any sort of processing on the snapshot data, you'd probably skip that function and just append the message directly to avoid the conversion overhead.
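If you only want to aggregate over the snapshot, one option is to stream the messages with a generator instead of collecting dictionaries in a list. The sketch below is deliberately generic: `iter_messages` and its `decode` parameter are hypothetical, and the demo feeds it placeholder bytes. With the generated module, passing `CoreDump_pb2.Node.FromString` as `decode` should yield parsed `Node` messages, but keep in mind that the first message in a real snapshot is the `Metadata` message, which would need to be read separately first.

```python
import io


def read_varint32(file):
    """Same varint reader as in the tool."""
    shift = 0
    result = 0
    while True:
        byte = file.read(1)
        if not byte:
            raise EOFError("Unexpected end of file while reading varint32.")
        byte = ord(byte)
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result


def iter_messages(file, decode):
    """Yield decode(payload) for each length-prefixed message until EOF."""
    while True:
        try:
            size = read_varint32(file)
        except EOFError:
            return  # No more length prefixes: end of the stream.
        yield decode(file.read(size))


# Demo with placeholder payloads; bytes.decode stands in for a protobuf parser.
demo = io.BytesIO(b"\x03foo\x06foobar")
lengths = [len(text) for text in iter_messages(demo, bytes.decode)]
```

Because the generator hands out one message at a time, only the current message needs to be held in memory while you count or sum whatever you are interested in.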
Regardless, after that we'll go ahead and convert the result to a JSON string and print it, so it can be redirected to a file.
import json
# Outside the opened gzip file code.
print(json.dumps(results, indent=4))
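`json.dumps` builds the entire output string in memory before it is printed. For a large snapshot, a variation worth considering is `json.dump`, which writes to a file object in pieces. A minimal sketch, using a small stand-in dictionary and an in-memory buffer rather than real snapshot data:

```python
import io
import json

# Stand-in for the `results` dictionary built by the tool.
results = {"metadata": None, "nodes": []}

# json.dump writes to any text file object; sys.stdout works the same way.
buffer = io.StringIO()
json.dump(results, buffer, indent=4)
buffer.write("\n")  # print() adds a trailing newline, json.dump does not.
output = buffer.getvalue()
```

In the tool itself, the equivalent would be `json.dump(results, sys.stdout, indent=4)`. This only avoids the one large output string; the `results` dictionary still has to fit in memory.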
Before we can run anything, we need to compile the CoreDump.proto file so the generated classes are available to our script. We'll make a simple Makefile to deal with this:
PROTOC ?= protoc

CoreDump_pb2.py: CoreDump.proto
	$(PROTOC) --python_out=. $<

.PHONY: all
all: CoreDump_pb2.py
And why not make a gitignore?
__pycache__/
*.fxsnapshot
*.json
CoreDump_pb2.py
Great! Now you can run the Makefile, and then the tool.
The Full Code
You can find the full code below, or on sourcehut.
import sys
import gzip
import json

import CoreDump_pb2
from google.protobuf.json_format import MessageToDict


def usage(file):
    print("fxsdump [snapshot]", file=file)


def read_varint32(file):
    shift = 0
    result = 0
    while True:
        byte = file.read(1)
        if not byte:
            raise EOFError("Unexpected end of file while reading varint32.")
        byte = ord(byte)
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    return result


def parse_message(file, message_class):
    message_size = read_varint32(file)
    message_data = file.read(message_size)
    message = message_class()
    message.ParseFromString(message_data)
    return message


def main():
    if len(sys.argv) != 2:
        usage(sys.stderr)
        sys.exit(1)
    elif sys.argv[1] == '-h':
        usage(sys.stdout)
        return

    input_path = sys.argv[1]
    results = {'metadata': None, 'nodes': []}

    with gzip.open(input_path, 'rb') as f:
        try:
            metadata = parse_message(f, CoreDump_pb2.Metadata)
            results["metadata"] = MessageToDict(metadata)
            while True:
                node = parse_message(f, CoreDump_pb2.Node)
                results["nodes"].append(MessageToDict(node))
        except EOFError:
            pass

    # Print to stdout so the caller can redirect the JSON to a file.
    print(json.dumps(results, indent=4))


if __name__ == '__main__':
    main()
Let's test it! I have the original data that my friend sent me saved, so I'll test it on that:
python fxsdump.py 2450796.fxsnapshot > out.json
After a substantial amount of time (a couple of minutes, thanks Python), we get output. I've redirected it into a file so it doesn't wreak havoc.