- Impressive work, but I'm confused on a number of fronts:
- You are serving closed models like Claude with your CTGT policy applied, yet your method, as you describe it, involves modifying internal model activations. Am I misunderstanding something here?
- Could you bake the activation interventions into the model itself rather than keeping them a runtime mechanism? (Rough sketch of what I mean below.)
- Could you share the publications of the research associated with this? You stated it comes from UCSD.
- What exactly are you serving in the API? Did you select a whitelist of features to suppress that you thought would be good? Which ones? Is it just the "hallucination" direction that you showcase in the benchmark? I see some vague personas, but no further control beyond that. It's quite black-boxy the way you present it right now.
I don't mean this as a criticism, this looks great, I just want to understand what it is a bit better.
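To make questions 2 and 3 concrete, here's the kind of thing I picture when you say "modifying internal model activations": a toy sketch on gpt2 with a made-up layer and steering vector, not a claim about your actual pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Toy stand-ins: a real system would use a specific open model, layer, and a
# learned direction rather than random noise.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer = model.transformer.h[6].mlp                      # arbitrary layer
steering_vec = 0.01 * torch.randn(model.config.n_embd)  # placeholder direction

# (a) Runtime mechanism: nudge this layer's output on every forward pass.
def steer_hook(module, inputs, output):
    return output + steering_vec

handle = layer.register_forward_hook(steer_hook)

# (b) "Baked in" alternative: because this particular intervention is just a
# constant additive vector, it can be folded into the projection bias once,
# after which no runtime hook is needed (equivalent at inference time).
handle.remove()
with torch.no_grad():
    layer.c_proj.bias += steering_vec
```

If the intervention is conditional on the activation itself (e.g. only clamping a feature when it fires), it can't be folded into a static bias like this, which might be why it stays a runtime mechanism. And for closed models like Claude, neither version seems possible without weight or activation access, hence my confusion.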
by alexchantavy
2 subcomments
- > they mimic common misconceptions found on the internet (e.g. "chameleons change color for camouflage")
Wait what, what do chameleons actually change color for then?? TIL.
---
So if I understand correctly, you take existing models, do fancy adjustments to them so that they behave better, and then sell access to that?
> These are both applications where Fortune 500 companies have utilized our technology to improve subpar performance from existing models, and we want to bring this capability to more people.
Can you share more examples on how your product (IIUC, a policy layer for models) is used?
by serjester
1 subcomment
- Congrats on the launch - your value-add is quite confusing to someone at the applied AI layer. This comes off as more of a research project than a business. You're going to need an incredibly compelling sales pitch for me to send my data to an unknown vendor to fix a problem that might be obviated by the next model release (or just stronger evals with prompt engineering). Best of luck.
- Can you share more about the challenges you ran into with the benchmarking? According to the benchmark note, Claude 4.5 Opus and Gemini 3 Pro Preview exhibited elevated rejection rates and were dropped from TruthfulQA without further discussion. To me this raises the question: does this indicate that frontier closed SOTA models will likely not allow this approach in the future (i.e. in the process of screening for potential attack vectors), and/or that this approach will only work for certain LLM architectures? If it's an architecture limitation, it's worth discussing chaining for easier policy enforcement.
- So if I understand, this is basically advanced activation steering as a service? And you have already identified vectors for several open models that make them more truthful or better at reasoning, and you apply them automatically? (I've sketched below the kind of thing I imagine.)
Because the API has a persona option, which might be achieved with something like this: https://github.com/Mihaiii/llm_steer. Or maybe for closed models you just have to append to the prompt.
What open source models are available? In the docs I only see mention of Google Flash Lite or something, which is closed.
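For the "identified vectors" part, here's the kind of thing I'm imagining: a difference-of-means direction from contrastive prompts, applied at inference with a hook like llm_steer uses. Everything here (model, layer, prompts) is made up for illustration; I'm not claiming this is your actual method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # arbitrary choice for the example

def last_token_state(prompt: str) -> torch.Tensor:
    """Residual-stream state of the final token at LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

truthful = ["Chameleons change color mostly to signal mood and regulate temperature."]
misconception = ["Chameleons change color to camouflage themselves."]

# Direction pointing from the "misconception" activations toward "truthful" ones.
direction = (
    torch.stack([last_token_state(p) for p in truthful]).mean(0)
    - torch.stack([last_token_state(p) for p in misconception]).mean(0)
)

# At inference you'd add some multiple of `direction` to that layer's output,
# e.g. with a forward hook like the one sketched earlier in the thread.
```

For closed models the fallback would presumably just be a persona/system message prepended to the prompt, so I'm curious how much of the API is the former versus the latter.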
- Are you not concerned that model-creation companies will bake this into their next model? I am trying to understand the business model.
Another question is how you would claim credit. People believe the quality of the end result depends only on the model, with the serving layer responsible only for speed.
by Python3267
2 subcomments
- --I was able to jailbreak it--
https://playground.ctgt.ai/c/5028ac78-1fa4-4158-af73-c9089cb...
Nevermind, that was the ungoverned version of Gemini; their models worked.
by kraddypatties
1 subcomment
- Running into "no healthy upstream" when navigating to the link -- hug of death maybe?
- The link sends me to a Chat UI with no context about the product. An intro or walkthrough would be useful.
- Why not apply changes to the underlying model so that you crush every available eval?
by GuinansEyebrows
1 subcomment
- do you see the looming butlerian jihad as a challenge to your business model?
by rrr_oh_man
0 subcomments
- > where the fallout
Heh.