- I applaud the effort. We need human-friendly CAPTCHAs, as much as they're generally disliked. They're the only solution to the growing spam and abuse problem on the web.
Proof-of-work CAPTCHAs work well for making bots expensive to run at scale, but they still rely on accurate bot detection. Avoiding both false positives and false negatives is crucial, yet no existing approach is reliable enough.
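For readers unfamiliar with how the difficulty knob works, here is a minimal hashcash-style sketch. The function names and the difficulty value are illustrative, not taken from any particular product: the client burns CPU finding a nonce, while the server verifies with a single hash.

```python
import hashlib
import secrets

def make_challenge() -> str:
    """Server issues a random challenge string."""
    return secrets.token_hex(16)

def solve(challenge: str, difficulty_bits: int) -> int:
    """Client brute-forces a nonce; expected work doubles per difficulty bit."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        # Accept when the first `difficulty_bits` bits of the digest are zero.
        if int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server-side check costs one hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0
```

The asymmetry (expensive to solve, trivial to verify) is what makes the cost scalable against bots, but note the scheme says nothing about *who* is solving, which is the detection problem the comment is pointing at.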
One comment re:
> While AI agents can theoretically simulate these patterns, the effort likely outweighs other alternatives.
For now. Behavioral and cognitive signals seem to work against the current generation of bots, but they will likely also be defeated as AI tools become cheaper and more accessible. It's only a matter of time before attackers can train a model on real human input and run inference cheaply enough, or before the benefit of using a bot on a specific target simply outweighs the cost.
So I think we will need a different detection mechanism. Maybe something from the real world, some type of ID, or even micropayments. I'm not sure, but it's clear that bot detection is on the opposite, and currently losing, side of the AI race.
by JimDabell
1 subcomment
- This is interesting stuff, but I’d be seriously concerned about this accidentally catching people who have accessibility needs. How is it going to handle somebody using the keyboard to tab through controls instead of the mouse? Is a typing cadence detector going to flag people who use voice interfaces?
- Previous CAPTCHAs were based on tasks humans could do but machines could not.
The machines caught up and passed humans on those tasks.
These new tasks are based on the concept that humans are dumber than AI agents, making more mistakes and showing more randomness.
It might work for a while, but that's a losing battle.
by NoMoreNicksLeft
2 subcomments
- The problem has never been that some bots could eventually seem like they were human. The problem is and will continue to be that many humans (millions upon millions) look like bots.
Have you never once looked at the captcha and couldn't decide whether the 3 pixels of the motorcycle sticking out into the grid square meant that you should select that grid square too? Not once? As the tests become ever more sophisticated, more and more of you all will be locked out.
- It is late and I am thinking out loud. How about a reputation system where users bring proof that other websites haven't found them abusive?
Visit a website that requires identification. Generate a random unique identifier in your user agent. Live your life on that site. Download from that site a certificate that proves you didn't abuse it. Repeat that a few times.
Visit the site that wants to know if you are an abusive user. Share your certificates. They get to choose whether to accept you.
If you abuse that site, it reports the abuse to the other sites that issued you a certificate. Those sites get to decide whether to revoke their certificates.
It is a self-policing system that requires some level of cooperation. Users make themselves vulnerable to the risk of having sites they like lose trust in them.
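In miniature, the certificate part of that scheme could look something like this. Everything here is hypothetical: the names and fields are stand-ins, and HMAC is used only because it is in the standard library. A real cross-site system would need public-key signatures (e.g. Ed25519) so any site could verify a certificate without sharing the issuer's secret.

```python
import hashlib
import hmac
import json
import time

def issue_certificate(issuer: str, issuer_key: bytes, user_id: str) -> dict:
    """Issuing site signs a 'good standing' record for a pseudonymous user ID."""
    payload = {"issuer": issuer, "user": user_id, "issued_at": int(time.time())}
    blob = json.dumps(payload, sort_keys=True).encode()  # canonical form
    payload["sig"] = hmac.new(issuer_key, blob, hashlib.sha256).hexdigest()
    return payload

def verify_certificate(cert: dict, issuer_key: bytes, revoked: set) -> bool:
    """Relying site checks the signature and the issuer's revocation list."""
    body = {k: v for k, v in cert.items() if k != "sig"}
    blob = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(issuer_key, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["sig"]) and cert["user"] not in revoked
```

The revocation set is where the "report abuse back to the issuers" step would plug in; distributing and trusting those revocations across sites is the hard, unsolved part of the idea.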
- In a few more years there will probably be virtually no human users of web sites and apps. Everything will be through an AI agent mediation layer. Building better CAPTCHAs is interesting technically, but it is doubling down on a failed solution that nobody actually wants. What is needed is an authentication layer that allows agents to act on behalf of registered users, with economic incentives to control usage. CAPTCHAs have always been an economic bar only, since they are easy to farm out to human solvers, and it is a very low bar. Having an agent API with usage charges is a much better solution because it compensates operators instead of wasting the cost of solving CAPTCHAs. Maybe this will finally be the era of micropayments?
- I feel like analyzing keystrokes or mouse movements is just going to punish people who use password managers that autofill for them. It does seem like I get more captchas on sites because of that.
- I totally assumed typing cadence and mouse behaviour had been incorporated into bot detection for years before this already. Interesting.
- Plenty of improvements to mouse movement algorithms have already been made and they’re still evolving. While the blog post and the product it introduces offer some interesting ideas, they don’t yet reach the robustness of modern anti-bot solutions and still trail current industry standards. I doubt it would take me - or any average reverse engineer - more than five seconds to bypass something like this. There are already numerous open source mouse movement libraries available; and even if they didn’t exist, writing one wouldn’t be difficult. Yes, mouse movement or keyboard data can be quite powerful in a modern anti-bot stack and an in depth analysis of it is genuinely valuable, but on its own it’s still insufficient. Relying on this data alone isn’t costly for the attacker and offers little real protection.
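For context on why this is cheap for an attacker: the core of the open-source "humanizer" libraries the comment mentions is often little more than a randomized Bezier curve with easing and jitter. A rough sketch of that general idea, with arbitrary constants:

```python
import random

def human_mouse_path(start, end, steps=60):
    """Generate a curved, non-uniform-speed mouse path from start to end.

    Cubic Bezier with randomized control points (curvature), ease-in-out
    timing (slow start and stop), and per-sample jitter. All constants are
    invented for illustration, not tuned against any real detector.
    """
    (x0, y0), (x3, y3) = start, end
    # Control points offset from the straight line create the curve.
    x1 = x0 + (x3 - x0) * random.uniform(0.2, 0.4) + random.uniform(-60, 60)
    y1 = y0 + (y3 - y0) * random.uniform(0.2, 0.4) + random.uniform(-60, 60)
    x2 = x0 + (x3 - x0) * random.uniform(0.6, 0.8) + random.uniform(-60, 60)
    y2 = y0 + (y3 - y0) * random.uniform(0.6, 0.8) + random.uniform(-60, 60)
    path = []
    for i in range(steps + 1):
        t = i / steps
        t = t * t * (3 - 2 * t)  # smoothstep easing: slow start and stop
        mt = 1 - t
        x = mt**3 * x0 + 3 * mt**2 * t * x1 + 3 * mt * t**2 * x2 + t**3 * x3
        y = mt**3 * y0 + 3 * mt**2 * t * y1 + 3 * mt * t**2 * y2 + t**3 * y3
        path.append((x + random.uniform(-1, 1), y + random.uniform(-1, 1)))
    return path
```

Twenty-odd lines of stdlib code is roughly the attacker's cost floor here, which supports the comment's point that mouse data alone offers little protection.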
by curtisblaine
0 subcomments
- AFAIK, reCaptcha is not based on user behaviour anymore since v3, but uses proprietary network host information from Google. See this issue opened in 2018: https://github.com/google/recaptcha/issues/235
- First off, I always thought the type of things described (tracking mouse movements, keypress jitter, etc.) were already used by reCAPTCHA to decide when to present the user with a captcha. I am surprised they are not already doing this.
Second, I am surprised AI agents are this naive. I thought they would emulate human behavior better.
In fact, just based on this article, very little effort has been put into this race on either side.
So I wonder if it has to do with the fact that if companies like Google reliably filtered out bot traffic, they would lose 90% of their ad revenue. This way they have plausible deniability.
by charcircuit
0 subcomments
- >How much can these behavioral patterns be spoofed? This remains an ongoing question, but the evidence to date is optimistic. Academic studies have found behavioral biometrics to be robust against attacks under adversarial conditions, and industry validation from top financial institutions demonstrates real-world resilience
I have the opposite view. This already played out in the Minecraft community, and it turns out ghost clients are effective at spoofing such behavioral signals and avoiding anticheat. Also, I doubt you can get any meaningful signal from the couple of seconds a user's AI agent is scrolling through a site.
- I'm not sure reCAPTCHA is really trying to detect automated vs human interaction with a browser. The primary use-case is to detect abusive use. The distinction: if I automate my own browser to do things for me on sites using my personal account, that may not be a problem for site owners, while a spam or reselling operation which generates thousands of fake accounts using automation is a big problem that they'd want to be able to block. I think reCAPTCHA is tailored towards the latter, and for it not to block the former might be more of a feature than a bug.
- We also need an inverse Turing test, i.e., one that detects humans pretending to be AI.
Like the recent case of builder.ai, which had humans pretending to be AI.
Turing was a visionary - but even he could not imagine a time when humans pretend to be bots.
- I’ve wanted to create a wiki for a hobby for a long time, but I don’t want to get stuck in spam and abuse reports, which just becomes more of a given with each passing year.
With a hobby wiki, eventual consistency is fine. I believe ghost bans, quarantine, and some sort of invisible captcha would go a long way toward my goal, but it's hard to find an invisible captcha.
There was a research project long ago that used high-resolution data from keyboards to determine who was typing. The idea was not to use the typing pattern as a password, but to flag suspicious activity: have someone walk past that desk to see if Sally hurt her arm playing tennis this weekend or if Dave is fucking around on her computer while she's in a meeting.
That’s about the level I’m looking for. Assume everyone is a bot during a probationary period and put accounts into buckets of likely human, likely bot, and unknown.
What I’d have to work out though is temporary storage for candidate edits in a way they cannot fill up my database. A way to throttle them and throw some away if they hit a limit. Otherwise it’s still a DOS attack.
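The flag-don't-block idea above can be sketched crudely: build a per-user baseline of inter-keystroke intervals, then flag sessions that drift far from it. Real keystroke-dynamics systems model per-digraph timings; this single-number version, with made-up thresholds, is only the shape of the idea:

```python
import statistics

def build_profile(interkey_times):
    """Baseline typing rhythm from a user's inter-keystroke intervals (seconds)."""
    return statistics.mean(interkey_times), statistics.stdev(interkey_times)

def looks_like_owner(profile, sample, z_threshold=3.0):
    """True if the session's mean inter-key interval is within z_threshold
    standard deviations of the baseline; otherwise route to the 'unknown'
    or 'likely bot' bucket rather than hard-blocking."""
    mean, stdev = profile
    sample_mean = statistics.mean(sample)
    return abs(sample_mean - mean) <= z_threshold * (stdev or 1e-9)
```

The output maps naturally onto the probationary buckets described above: pass feeds "likely human", fail feeds "unknown" for quarantine rather than an outright ban.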
- assume this is basically nosedive but for presence on the internet. except you don't rate anyone. your device, motion, latency, and scroll inertia get rated by some pipeline you’ll never see. and that’s what decides what version of the site you get.
> what if the turing test already runs silently across every site you open. just passive gating based on scroll cadence, mouse entropy, input lag without captcha or prompt
>what if you already failed one today. maybe your browser fingerprint was too rare, maybe your keyboard rhythm matched a bot cluster from six months ago. so the UI throttled by 200ms. or the request just 403'd.
> what if the system doesn't need to prove you're a bot. it just needs a small enough doubt to skip serving you the real content.
> what if human is no longer biological but statistical. a moving average of behavior trained on telemetry from five metro cities. everyone outside that gets misclassified.
>what if you'll never know. timeline loads emptier than someone else with explicit rejection to the content
by BobbyTables2
0 subcomments
- Ironic that we are so intent on creating bots that ask and check questions unsolvable by other bots.
- Anyone tried creating a new GitHub account lately?
I considered using AI to help get past the proof of work requirement.
The imagery test had me click-dragging a pointing finger to see if I could line it up with the animal. Probably 80% success.
The audio test, detecting two different people with heavy accents speaking vs. a single person speaking, was my lowest score, probably 60-70%.
The music test, spotting switching instruments, was around 70% as well.
Ultimately I passed after many attempts on the imagery detection.
by joshmarinacci
2 subcomments
- I feel like we are fighting the wrong battle here. Eventually AI bot behavior online will be indistinguishable from human, but so what?! We've had teams of underpaid humans paid to act as organic bots for years now.
Whether the person interacting with your website is human or not isn't relevant anymore. What matters is what they are doing; be they human, bot, or AI agent.
- And so what am I supposed to do if a false positive happens?
I use keyboard navigation on many pages. Using the firefox setting "search when you start typing", I don't have to hit ctrl+f to search on the page, I just type what I want to click on and press enter or ctrl+enter for a new browser tab, or press (shift+)tab to go to the nearest (previous/next) input field. When I open HN, it's muscle memory: ctrl+t (new tab) new enter (autocompletes to the domain) thr enter (go to threads page) anything new? type first few chars of username, shift+tab+tab enter to upvote. Done? Backspace to go back. View comments of a link? Type last char of a word in the link, space, and first char of next word, that's almost always unique on the page, then escape, type men, enter, to almost always activate the comment link. Or shift+tab enter instead to upvote. On the comments page, reading top-level comments is either searching for [ and then enter+f3 when I want to collapse the next one, space for page down... Don't have to take my hands off the home row
etc. on lots of website, also ones I've never visited before (it'll be slower and less habitual of course, but still: if there is text near to where I want to go, I'm typing it). I use the mouse as well, but I find it harder to use than the keys that are always in the same place, much easier to press
So will it tell me that my mouse movements don't look human enough or will I see a "Sorry, something went wrong" http 403 error and have no clue if it's tracking cookies, my IP address, that I don't use Google Chrome®, that I went through pages too fast, that I didn't come past the expected page (where a cookie gets set) but clicked on a search result directly, that I have a bank in country A but residence in country B, that I now did too many tries in figuring out which of these factors is blocking me.... I can give examples of websites where I got blocked in the last ~2 months for each of these. It's such a minefield. The only thing that always passes is proof-of-work CPU challenges, but I dread to think what poor/eco people with slow/old computers are facing. Will this "invisible" captcha (yeah, invisible until you get banned) at least tell me how I'm supposed to give my money to whatever service or webshop will use this?
- I think the real purpose of Google’s recaptcha is to punish people who have privacy settings turned on, and gather training data for AI research.
by renegat0x0
1 subcomment
- So recently two things have happened. I have been banned on Reddit's technology subreddit, and warned on another subreddit that I behave like a bot.
Maybe it was my fault to advertise my own solution in comments.
Such behavior, however, triggered bot detection. I might have behaved like an NPC. So currently a human can be identified as a bot and banned on that premise. Crazy times.
Currently I feel I must act like a human.
- It's pointless; it's just a matter of time before AI agents will be able to mimic human behavior exactly (they probably already do, it's just not public).
These tests here are easily bypassable: just add a random delay somewhere during the action phases to mimic humans, and there are already tools for mimicking human mouse movements.
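One caveat on the "just add a random delay" point: a uniformly distributed delay is itself a statistical tell, since human reaction and "think" times are roughly log-normal. A sketch of the slightly more careful version; the constants are guesses, not fitted to real human data:

```python
import math
import random

def human_delay(median_s: float = 0.35, sigma: float = 0.5) -> float:
    """Sample a pause (seconds) from a log-normal distribution.

    A log-normal is right-skewed like real human response times: mostly
    short pauses with an occasional long one. The median and spread here
    are illustrative placeholders.
    """
    return random.lognormvariate(math.log(median_s), sigma)
```

A bot would call this between actions (e.g. `time.sleep(human_delay())`) instead of a fixed or uniform pause, which is exactly why delay-based detection alone is weak.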
- All of the behavioral analysis stuff going on in the background makes me wonder if big accessibility problems are brewing. If we're looking at how naturally keystrokes are input, what does that mean for someone who uses dictation tools that generate text in chunks? Will this strategy make accessibility worse in unforeseen ways?
- The problem becomes simpler if you turn it around.
It is getting easier and easier to create questions/problems that humans can't answer at LLM speed.
Of course, that solves a complementary problem, not the original. But in terms of instances, by any definition, the demographics are quickly moving in one direction.
by loandbehold
0 subcomments
- Don't those distinctions only work because bots aren't specifically designed to circumvent them? If you have an arms race between bots and bot detectors, eventually bots will learn to overcome them to the point that you can't distinguish human from bot.
by codedokode
1 subcomment
- Recently I started getting a captcha when trying to use Google, probably because of VPN or Linux. I decided to switch to bing and duckduckgo. Dear Google, go solve your captchas yourself.
by throwaway48476
1 subcomment
- Not all automation is malicious. AI promised us agents that will browse the web for us. PoW is useful in that the difficulty can be scaled to prevent egregious abuse while keeping the cost low enough to allow non-malicious use.
by userbinator
0 subcomments
- Or you could just ask it to count how many letters are in certain words.
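That kind of challenge exploits subword tokenization: LLMs see tokens rather than letters, so they often miscount. A toy generator, with the word list and phrasing invented for illustration:

```python
import secrets

# Words with repeated letters make miscounting more likely for a model
# that reasons over subword tokens. This list is illustrative only.
WORDS = ["strawberry", "bookkeeper", "mississippi", "onomatopoeia"]

def letter_count_challenge():
    """Return a (question, answer) pair that is trivial for a human to
    solve and for a server to check, but awkward for token-based models."""
    word = secrets.choice(WORDS)
    letter = secrets.choice(sorted(set(word)))
    question = f"How many times does '{letter}' appear in '{word}'?"
    return question, word.count(letter)
```

The obvious weakness: the bot can delegate the counting to ordinary code, exactly as the server does, so this only filters naive agents.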
- If the general Internet were based on torrents, then required upload-ratio enforcement would have ensured bots contribute to the reliability of the infrastructure rather than destabilize it.
by randomtoast
0 subcomments
- Isn’t it possible to emulate mouse movements and keypress jitter using a neural network trained on human data in order to simulate human behavior?
by protocolture
0 subcomments
- I have noticed that a particular website will tell me I fail captcha half the time, until I resize my browser from a square to a rectangle.
Took me ages to figure out what its issue was.
- Personally I think CAPTCHAs are harmful. They defend the enshittification economy, preventing the development of tools that protect human users of the web right in a time when it is more practical than ever to develop those tools.
Even in 2009 I knew people who were using neural networks (in PHP, no less!) to decode CAPTCHAs with superhuman performance. I see the whole thing as performative: those things get in my way tens or hundreds of times a day when I browse the web as a human, but in years of webcrawling they didn't give me any trouble until the last two weeks.
- Solutions relying on JavaScript that runs in user-controlled browsers are vulnerable to attacks and manipulation.
- Please don't deploy this on the internet, it may block real users and lock them out.
- This is a super clean research post! Absolutely loved the demos too
- Anyone know how this compares to Cloudflare Turnstile?
- > Take for example the Stroop task. It's a classic psychology experiment where humans select the color a word is written in and not what the word says. Humans typically show slower responses when the meaning of a word conflicts with its color (e.g., the word "BLUE" written in green), reflecting an overriding of automatic behavior. Bots and AI agents, by contrast, are not subject to such interference and can respond with consistent speed regardless of stimuli.
So I completely disagree with this; you can train yourself to completely ignore the color and just read/act on the word very fast. In fact, this is a game that people play.
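And even without training, the interference signature itself is trivial to forge: a bot that knows the trick just adds extra latency on incongruent trials. A toy forgery, with all timing constants invented for illustration:

```python
import random

def fake_stroop_response(word: str, ink_color: str) -> float:
    """Simulate a 'human-looking' response time (seconds) for a Stroop trial.

    A baseline reaction time with Gaussian noise, plus an extra penalty
    whenever word meaning and ink color conflict, reproduces the slower
    incongruent-trial responses the detector is looking for.
    """
    base = random.gauss(0.55, 0.08)            # invented congruent baseline
    if word.lower() != ink_color.lower():
        base += abs(random.gauss(0.10, 0.04))  # invented conflict penalty
    return max(base, 0.2)                      # floor at a plausible minimum
```

Since the detector can only observe response-time distributions, a forger who matches the distribution (slower and noisier on incongruent trials) defeats the test without any cognition at all.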
- [dead]
by chromatin
2 subcomments
- "When a measure becomes a target, it ceases to be a good measure".
https://en.wikipedia.org/wiki/Goodhart%27s_law
BRB, changing the simulated latency in my bot.
by guddlickspls
0 subcomments
- [dead]
by TechDebtDevin
3 subcomments
- I personally work on this all day everyday, you're never going to find my crawlers, stop trying lmfao.