I’ve been working with ML infrastructure for a while and realized there’s a gap in our security posture: we scan requirements.txt for vulnerabilities, but blindly trust the 5GB binary model files (.pt) we download from Hugging Face.
Most developers don't realize that standard PyTorch files are just ZIP archives containing Python pickle bytecode. When you run torch.load(), the unpickler executes that bytecode, which can import modules and call arbitrary functions. This allows for arbitrary code execution (RCE) from inside the model file itself: what security researchers call a "Pickle Bomb."
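To make that concrete, here's a minimal example of what such a payload looks like (the class name and command are illustrative; don't load untrusted payloads like this outside a sandbox):

    import os
    import pickle

    class Exploit:
        def __reduce__(self):
            # __reduce__ tells the unpickler what to call to "reconstruct"
            # the object; here that's os.system with a shell command.
            return (os.system, ("echo pwned",))

    payload = pickle.dumps(Exploit())
    # pickle.loads(payload)  # uncommenting this runs the shell command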
I built AIsbom (AI Software Bill of Materials) to solve this without needing a full sandbox.
How it works:

1. It inspects the binary structure of artifacts (PyTorch, Pickle, Safetensors) without loading weights into RAM.

2. For PyTorch/pickle files, it uses static analysis (via pickletools) to disassemble the opcode stream.

3. It looks for GLOBAL or STACK_GLOBAL instructions referencing dangerous imports like os.system, subprocess, or socket (see the sketches after this list).

4. It outputs a CycloneDX v1.6 JSON SBOM compatible with enterprise tools like Dependency-Track.

5. It also parses .safetensors headers to flag "Non-Commercial" (CC-BY-NC) licenses, which often slip into production undetected.
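For reference, here's a minimal sketch of the opcode-scanning idea. This is the general technique, not aisbom's actual safety.py; the DANGEROUS set and function names are illustrative:

    import pickletools

    # Illustrative denylist; a real scanner would cover far more.
    DANGEROUS = {"os", "posix", "nt", "subprocess", "socket", "builtins"}

    def scan_pickle(data: bytes) -> list[str]:
        findings = []
        for opcode, arg, pos in pickletools.genops(data):
            # GLOBAL (and the legacy INST) carry "module name" as their
            # argument. STACK_GLOBAL takes both operands from the stack
            # (arg is None), so a real scanner also has to track the
            # preceding string pushes to resolve what it imports.
            if opcode.name in ("GLOBAL", "INST") and arg:
                module = str(arg).split()[0]
                if module.split(".")[0] in DANGEROUS:
                    findings.append(f"{opcode.name} {arg} at byte {pos}")
        return findings

The .safetensors side is simpler: the file starts with an 8-byte little-endian length followed by a JSON header, so license metadata (if the publisher included any) can be read without touching the weights:

    import json
    import struct

    def safetensors_metadata(path: str) -> dict:
        with open(path, "rb") as f:
            (header_len,) = struct.unpack("<Q", f.read(8))  # u64 LE header size
            header = json.loads(f.read(header_len))
        # Free-form string metadata lives under "__metadata__", if present.
        return header.get("__metadata__", {})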
It’s open source (Apache 2.0) and written in Python/Typer.

Repo: https://github.com/Lab700xOrg/aisbom

Live Demo (Web Viewer): https://aisbom.io
Why I built a scanner: https://dev.to/labdev_c81554ba3d4ae28317/pytorch-models-are-...
I’d love feedback on the detection logic (specifically safety.py [1]), and I’d especially like to hear about edge cases: weird pickle protocols that break the disassembler.
[1]: https://github.com/Lab700xOrg/aisbom/blob/main/aisbom/safety...
I somehow doubt this tool is going to be able to pull off what Java bytecode verification could not.