> A 2024 GitHub survey found that nearly all enterprise developers (97%) are using Generative AI coding tools. These tools have rapidly evolved from experimental novelties to mission-critical development infrastructure, with teams across the globe relying on them daily to accelerate coding tasks.
That seemed high, so I looked up what the actual report says:
> More than 97% of respondents reported having used AI coding tools at work at some point, a finding consistent across all four countries. However, a smaller percentage said their companies actively encourage AI tool adoption or allow the use of AI tools, varying by region. The U.S. leads with 88% of respondents indicating at least some company support for AI use, while Germany is lowest at 59%. This highlights an opportunity for organizations to better support their developers’ interest in AI tools, considering local regulations.
Funny that the survey uses the stat to argue that companies should better support adoption, while the article uses it to suggest near-total usage already.
"Most trusted assistant" - that made me chuckle. The assistant that hallucinates packages, avoides null-pointer checks and forgets details that I've asked it.. yes, my most trusted assistant :D :D
However, I wouldn't put any fault here on the AIs themselves. It's the fact that you can hide data in a plain text file that is the root of the issue - the whole attack goes away once you fix that part.
But thinking on it a bit more, from the LLM's perspective there's no difference between the rule files and the source files. The hidden instructions might as well be in the source files… Using code signing on the rule files would be security theater.
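To make the hiding concrete, here's a minimal demonstration (a sketch assuming bash 4.2+ and a UTF-8 locale; demo.mdc is just a placeholder name):

```bash
# Write a line containing a zero-width space (U+200B, UTF-8 bytes E2 80 8B).
printf 'use the\xe2\x80\x8b official API\n' > demo.mdc

cat demo.mdc                  # renders as "use the official API" -- nothing visible
grep -c $'\u200b' demo.mdc    # ...but the hidden character is there: prints 1
```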
As mentioned by another commenter, the solution could be to find a way to separate the command and data channels. But the LLM only operates on a single channel: its input stream of tokens.
# Scan Cursor rule files for zero-width and bidirectional-control characters.
# -CSD makes perl decode the files as UTF-8, so the \x{...} codepoints below
# actually match (without it, perl compares raw bytes and finds nothing).
OUTPUT=$(find .cursor/rules/ -name '*.mdc' -print0 2>/dev/null | xargs -0 perl -CSD -wnE '
  BEGIN { $re = qr/[\x{200B}-\x{200D}\x{202A}-\x{202E}\x{2066}-\x{2069}]/ }
  print "$ARGV:$.:$_" if /$re/;
  close ARGV if eof;  # reset $. so reported line numbers are per-file
' 2>/dev/null)
FILES_FOUND=$(find .cursor/rules/ -name '*.mdc' -print 2>/dev/null)
if [[ -z "$FILES_FOUND" ]]; then
  echo "Error: No .mdc files found in the directory."
elif [[ -z "$OUTPUT" ]]; then
  echo "No suspicious Unicode characters found."
else
  echo "Found suspicious characters:"
  echo "$OUTPUT"
fi
Can this be improved? (One possible refinement is sketched below.)
And for enterprise, there are already many tools to scan for vulnerabilities and malicious code before anything goes to production.
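One possible refinement, offered as a sketch rather than a drop-in replacement: enumerate the files once instead of running find twice, and cast a wider net by flagging any Unicode format character (\p{Cf}). Note that \p{Cf} will also flag benign characters such as soft hyphens, so treat hits as leads, not verdicts.

```bash
#!/usr/bin/env bash
# Single pass over the rule files; flag any Unicode "format" character,
# which covers the zero-width and bidi controls above and more.
shopt -s nullglob globstar
FILES=(.cursor/rules/**/*.mdc)
if (( ${#FILES[@]} == 0 )); then
  echo "Error: No .mdc files found in the directory."
  exit 1
fi
OUTPUT=$(perl -CSD -wnE 'print "$ARGV:$.:$_" if /\p{Cf}/; close ARGV if eof' "${FILES[@]}")
if [[ -z "$OUTPUT" ]]; then
  echo "No suspicious Unicode characters found."
else
  echo "Found suspicious characters:"
  echo "$OUTPUT"
fi
```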
Galaxy brain: just put all the effort from developing those LLMs into writing better code
They start out talking about how scary and pernicious this is, and then it turns out to be… adding a script tag to an html file? Come on, as if you wouldn’t spot that immediately?
What I’m actually curious about now is - if I saw that, and I asked the LLM why it added the JavaScript file, what would it tell me? Would I be able to deduce the hidden instructions in the rules file?
Job security, you know?
Preprocess any input to agents by restricting it to a set of visible characters / filtering out suspicious ones.
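A sketch of what such a filter could look like (file names are placeholders; this keeps tab and newline but drops everything else in Unicode's "other" categories, including the zero-width and bidi characters used in the attack):

```bash
# Strip every Unicode "other" character (control, format, surrogate,
# private-use) except tab and newline before the text reaches the model.
perl -CSD -pwe 's/[^\P{C}\t\n]//g' suspect.txt > clean.txt
```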