One question worth asking about this acquisition: evaluation and red-teaming are the pre-deployment half of AI security. They answer "what could go wrong?"
The complementary problem is runtime enforcement: at production time, when an agent makes a tool call, is there a boundary that structurally prevents the call if it violates policy? Not a log entry after the fact, but an enforcement gate before execution.
Testing tells you what your failure modes are. Enforcement ensures the boundary holds when one of those failure modes fires in production. Most orgs treat them as the same problem. They are not.
The gap today: you can use Promptfoo to discover that your agent will exfiltrate data when prompt-injected. Then what? You harden the prompt. You add guardrails. You test again. But in production, if the agent is injected in a way the test suite did not cover, there is no structural enforcement layer stopping the action.
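For concreteness, the kind of gate described above can be sketched as a wrapper that runs a policy check before dispatching any tool call, so a denied call is never executed at all. This is a minimal illustration only; every name here (`EnforcementGate`, `Policy`, `no_exfiltration`, the allowlist) is hypothetical and not a Promptfoo or OpenAI API.

```python
# Hypothetical sketch of a runtime enforcement gate for agent tool calls.
# The point is structural: the policy check runs BEFORE the tool executes,
# so a violating call is blocked, not merely logged after the fact.

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class Policy:
    # Each rule returns a reason string if the call is denied, else None.
    rules: list = field(default_factory=list)

    def check(self, call: ToolCall) -> Optional[str]:
        for rule in self.rules:
            reason = rule(call)
            if reason is not None:
                return reason
        return None


class PolicyViolation(Exception):
    pass


class EnforcementGate:
    """Wraps tool dispatch so the policy check always precedes execution."""

    def __init__(self, policy: Policy, tools: dict):
        self.policy = policy
        self.tools = tools

    def execute(self, call: ToolCall) -> Any:
        reason = self.policy.check(call)
        if reason is not None:
            # Structural denial: the tool function is never invoked.
            raise PolicyViolation(f"{call.tool} blocked: {reason}")
        return self.tools[call.tool](**call.args)


# Example rule: block outbound requests to non-allowlisted hosts,
# regardless of why the agent decided to make the call.
ALLOWED_HOSTS = {"api.internal.example"}


def no_exfiltration(call: ToolCall) -> Optional[str]:
    if call.tool == "http_request" and call.args.get("host") not in ALLOWED_HOSTS:
        return f"host {call.args.get('host')!r} not on allowlist"
    return None


gate = EnforcementGate(
    policy=Policy(rules=[no_exfiltration]),
    tools={"http_request": lambda host, path: f"GET {host}{path}"},
)
```

With this shape, a prompt-injected agent that decides to call `http_request` against an unapproved host hits `PolicyViolation` before any network I/O would occur, which is the distinction drawn above between testing for failure modes and enforcing a boundary when one fires.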
Curious whether the plan at OpenAI is to keep Promptfoo focused on the evaluation side, or to extend into runtime enforcement as well. The integration point between the two is where the real security leverage lives.
Happy to answer questions.
The one I'd ask if I were reading this: what happens to Promptfoo open source? We're going to keep maintaining it. The repo will stay public under the same license, we will continue to support multiple providers, and we'll keep reviewing PRs and cutting releases.
We started Promptfoo because there was no good way to test AI systems before shipping them. That turned into evals, then red teaming, then a broader security platform. We're joining OpenAI because this work has more impact closer to the model and infrastructure layers.
Ask me anything.