by shevy-java
16 subcomments
- > This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.
The problem is that Google's search engine (and, oddly enough, ALL search engines) had already gotten worse before that. I noticed search engines getting worse several years before 2022. So AI decreased the quality further, but quality was already on a downward trend. There are some attempts to analyse this on YouTube (also owned by Google - Google ruins our digital world); some explanations made sense to me, but even then I am not 100% certain why Google decided to ruin Google search.
One key observation I made was that YouTube's search behaviour was copied onto Google's regular search, which makes no sense there. If I casually search for a video on YouTube, I may be semi-interested in unrelated videos. But if I search Google for specific terms, I am not interested in crap such as "others also searched for xyz" - that just ruins the UI with irrelevant information. This is not the only example: Google made the search results worse and tries to confuse the user into clicking on things. Then there is the placement of ads. The quality really worsened.
- somebody once said we are mining "low-background tokens" like we are mining low-background (radiation) steel post WW2 and i couldn't shake the concept out of my head
(wrote up in https://www.latent.space/i/139368545/the-concept-of-low-back... - but ironically repeating something somebody else said online is kinda what i'm willingly participating in, and it's unclear why human-origin tokens should be that much higher signal than ai-origin ones)
- Somewhat related, the leaderboard of em-dash users on HN before ChatGPT:
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
by keiferski
6 subcomments
- Projects like this remind me of a plot point in the Cyberpunk 2077 game universe. The "first internet" got too infected with dangerous AIs, so much so that a massive firewall needed to be built, and a "new" internet was built that specifically kept out the harmful AIs.
(Or something like that: it's been a while since I played the game, and I don't remember the specific details of the story.)
It makes me wonder if a new human-only internet will need to be made at some point. It's mostly sci-fi speculation at this point, and you'd really need to hash out the details, but I am thinking of something like a meatspace-first network that continually verifies your humanity in order for you to retain access. That doesn't solve the copy-paste problem, or a thousand other ones, but I'm just thinking out loud here.
- besides training future models, is this really such a big deal? most of the AI-gened text content is just replacing content-farm SEO-spam anyway. the same stuff that any half-aware person wouldn't have read in the past is now slightly better written, using more em dashes and instances of the word "delve". if you're consistently being caught out by this stuff then you likely need to improve your search hygiene, nothing so drastic as this
the only place I've ever had any issue with AI content is r/chess, where people love to ask ChatGPT a question and then post the answer as if they wrote it, half the time seemingly innocently, which, call me racist, but I suspect is mostly due to the influence of the large and young Indian contingent. otherwise I really don't understand where the issue lies. follow the exact same rules you do for avoiding SEO spam and you will be fine
by potato-peeler
0 subcomment
- You don’t need an extension to do this. Simply add a “before:” search filter to your search query, e.g. https://www.google.com/search?q=Happiness+before%3A2022
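A minimal sketch of building that kind of URL programmatically, in case it helps anyone scripting this. The function name `dated_search_url` and its parameters are made up for illustration; the only thing taken from the comment above is the `before:` operator and the Google search URL shape.

```python
from urllib.parse import urlencode


def dated_search_url(terms: str, before: str,
                     base: str = "https://www.google.com/search") -> str:
    """Return a search URL whose query carries a before: date filter.

    `before` is passed through as-is (e.g. "2022" or "2022-11-30");
    how the search engine interprets it is up to the engine.
    """
    query = f"{terms} before:{before}"
    # urlencode percent-encodes the colon and turns spaces into "+"
    return f"{base}?{urlencode({'q': query})}"


print(dated_search_url("Happiness", "2022"))
# prints https://www.google.com/search?q=Happiness+before%3A2022
```

Passing a different `base` would let the same helper target another engine that accepts the operator, though that is an assumption to check per engine.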
by themanmaran
1 subcomments
- The low-background steel of the internet
https://en.wikipedia.org/wiki/Low-background_steel
by softwaredoug
2 subcomments
- The other day I was researching with ChatGPT.
* ChatGPT hallucinated an answer
* ChatGPT put it in my memory, so it persisted between conversations
* When asked for a citation, ChatGPT found 2 AI-created articles to back itself up
It took a while, but I eventually found human-written documentation from the organization that created the technical thingy I was investigating.
This happens A LOT for topics on the edge of the knowledge easily found on the Web, where you have to do true research, evaluate sources, and make good decisions about what you trust.
- For images, https://same.energy is a nice option that, having been abandoned but still functioning for a few years, seems to naturally not have crawled any AI images. And it’s all around a great product.
by GaryBluto
1 subcomments
- Why use this when you can use the before: syntax on most search engines?
- google results were already 90% SEO crap long before ChatGPT
just use Kagi and block all SEO sites...
- Most college courses and school books haven't changed in decades. Some reputed colleges keep courses on Pascal and Fortran instead of Python or Java, just because changing them might affect their reputation for being classical or pure, or to match the style of their campus buildings.
- FWIW Mojeek (an organic search engine in the classic sense) can do this with the before: operator.
https://www.mojeek.com/search?q=britney+spears+before%3A2010...
by Bad_Initialism
0 subcomment
- How about a search engine that only returns what you searched for, and not a million other unrelated things that it hopes you might like to buy?
This goes for you, too, website search.
by anticensor
1 subcomments
- You should call it Predecember, referring to the eternal December.
- If I want dead information I'll go find a newspaper. This is kind of silly. Even if AI rewrites the entire internet - we aren't going to live in a time capsule.
Plus, the AI already read everything made before 2023, so what does it matter?
Creatives need to think a bit bigger with this particular issue.
- Does this filter out traditional SEO blogfarms?
- I don't know how this works under the hood but it seems like no matter how it works, it could be gamed quite easily.
- Just the other evening, as my family argued about whether some fact was or was not fake, I detached from the conversation and began fantasizing about whether it was still possible to buy a paper encyclopedia.
by Barathkanna
1 subcomments
- I didn’t know “eccentric engineering” was even a term before reading this. It’s fascinating how much creativity went into solving problems before large models existed. There’s something refreshing about seeing humans brute force the weird edges of a system instead of outsourcing everything to an LLM.
It also makes me wonder how future kids will see this era. Maybe it will look the same way early mechanical computers look to us. A short period where people had to be unusually inquisitive just to make things work.
by DontForgetMe
0 subcomment
- This is an imperfect search extension.
It's a hell of a lot better than nothing, if one is using Chrome or Firefox (neither of which is my primary browser).
by throwawayk7h
0 subcomment
- I noticed AI-generated slop taking over Google search results well before ChatGPT. So I don't agree with the premise on this site that "you can be sure that it was written or produced by the human hand."
- It doesn't really work. I tried my website and it shows up, even though it was definitely built after 2023. There is a mistake in the metadata of the page that shows it as from 2011.
https://audiala.com/changelog
by defraudbah
2 subcomments
- ChatGPT also only returns content created before ChatGPT's release, which is why I still have to google, damn it!
- I hope there's an uncensored version of the Internet Archive somewhere, I wish I could look at my website ca. 2001, but I think it got removed because of some fraudulent DMCA claim somewhere in the early 2010s.
- > This is a search tool that will only return content created before ChatGPT's first public release on November 30, 2022.
How does it do that? At least Google seems to take website creation date metadata at face value.
- For a while I've been saying it's a pity we hadn't been regularly trusted-timestamping everything before that point as a matter of course.
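For anyone unfamiliar with the idea, here is a rough sketch of the local half of timestamping: hashing the content so that a digest, rather than the content itself, is what gets attested to. The names `make_timestamp_record` and `matches` are hypothetical, and this record is NOT trusted on its own, since the time comes from the local clock. A real scheme (for example a timestamping authority following RFC 3161) would have a third party sign the digest together with the time.

```python
import hashlib
from datetime import datetime, timezone


def make_timestamp_record(content: bytes) -> dict:
    """Build an unsigned, local record: content digest plus claimed time.

    Assumption for illustration only: a trusted service would sign this
    pair; here nothing is signed, so the time is merely a claim.
    """
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "claimed_at": datetime.now(timezone.utc).isoformat(),
    }


def matches(content: bytes, record: dict) -> bool:
    """Check that content is byte-for-byte what the record was made from."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


record = make_timestamp_record(b"my 2021 blog post")
print(matches(b"my 2021 blog post", record))          # True
print(matches(b"my 2021 blog post, edited", record))  # False
```

The point of hashing first is that any later edit to the content changes the digest, so a signed digest from before the cutoff would show the exact text existed then.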
- Not affiliated, but I've been using Kagi's date range filter to similar effect. The difference in results for car maintenance subjects is astounding (and slightly infuriating).
by stocksinsmocks
0 subcomment
- I really thought this was going to be the Dewey Decimal system. Exclude sources from this century. It’s the only way to be sure.
by phplovesong
1 subcomments
- The slop is getting worse: there is so much LLM-generated shit online that new models are now getting trained on the slop. Slop training slop, and more slop. We have come full circle in a matter of just a few years.
by RomanPushkin
0 subcomment
- For that reason I do not update my book on LeanPub about Ruby. I just know one day people are gonna read it more, because human-written content will be gold.
- Of course my first thought was: Let's use this as a tool for AI searches (when I don't need recent news).
by josephjrobison
0 subcomment
- The real gold is content created before the internet!
- In hindsight, that would've been a real utility use case for NFTs. A decentralized cryptographic proof that some content existed in a particular form at a particular moment.
- Something generated by humans does not mean high quality.
- so it's a filter by date and you chose ChatGPT's public release?
- This is such a great idea
by cryptozeus
0 subcomment
- technically you can ask chatgpt to return the same result by asking it to filter by year
- I'm grateful that I published a large body of content pre-ChatGPT so that I have proof that I'm not completely inarticulate without AI.
by erikpukinskis
0 subcomment
- Interesting concept. As a side benefit this would allow you to make steady progress fighting SEO slop as well, since there can be no arms race if you are ignoring new content.
You could even add options for later cutoffs… for example, you could use today’s AIs to detect yesterday’s AI slop.
by 1vuio0pswjnm7
0 subcomment
- "This browser extension uses the Google search API to only return content published before Nov 30th, 2022 so you can be sure that it was written or produced by the human hand."
by micromacrofoot
0 subcomment
- What kind of heuristics does it use to determine age? A lot of content on Google is actually backdated for some reason... presumably some sort of SEO scam?
- Can't we just append "before:2021-01-01" to Google?
I use this to find old news articles for instance.
- I mean I get it, but it seems a bit silly. What's next - an image search engine that only returns images created before Photoshop?
by diavarlyani
0 subcomment
- We now need an extension to hide 3 years of the internet because it was written by robots.
This timeline is undefeated.
by 2OEH8eoCRo0
0 subcomment
- low-background information
- This tool has no future. We have that in common with it, I fear.
What we really need to do is build an AI tool to filter out the AI automatically. Anybody want to help me found this company?