You want grounded RAG systems like Shopify's here to rely strongly on the underlying documents, but still sprinkle in a bit of the magic of latent LLM knowledge too. The only way to get that balance right is evals. Lots of them. It gets even harder when you're dealing with a GraphQL schema like Shopify's, since most models struggle with that syntax more than they do with REST APIs.
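To make that concrete, a grounded-answer eval can be as simple as the sketch below. All names, the judge prompt, and the label set are illustrative, not any vendor's internals; `assistant` and `complete` are stand-ins for whatever bot and completion client you run:

  from typing import Callable

  # Illustrative eval set; the value comes from volume and coverage.
  EVAL_SET = [
      {
          "question": "Is the 100-code limit on discountRedeemCodeBulkAdd per call or per discount?",
          "must_cite": "shopify.dev/docs/api/admin-graphql",
      },
  ]

  def judge(complete: Callable[[str], str], question: str, answer: str, sources: list[str]) -> str:
      # Have a second model label the answer; crude, but scales to thousands of cases.
      prompt = (
          f"Question:\n{question}\n\nAnswer:\n{answer}\n\nSources:\n"
          + "\n".join(sources)
          + "\n\nLabel the answer as exactly one of: grounded, ungrounded, refused."
      )
      return complete(prompt).strip().lower()

  def run_evals(assistant: Callable[[str], tuple[str, list[str]]], complete: Callable[[str], str]) -> dict[str, float]:
      labels = []
      for case in EVAL_SET:
          answer, sources = assistant(case["question"])
          label = judge(complete, case["question"], answer, sources)
          # If retrieval never surfaced the canonical page, the answer can't have been grounded in it.
          if not any(case["must_cite"] in s for s in sources):
              label = "ungrounded"
          labels.append(label)
      return {label: labels.count(label) / len(labels) for label in set(labels)}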
FYI I'm biased: founder of kapa.ai here (we build docs AI assistants for 200+ companies incl. Sentry, Grafana, Docker, the largest Apache projects, etc.).
I feel like we aren't properly using AI in products yet.
Every time I land on help.shopify.com I get the feeling it's one of those "doc pages for sales people". Like it's meant to show "we have great documentation and you can do all these things" but never actually explains how to do anything.
I tried that bot a couple of months ago and it was utterly useless:
question: When using discountRedeemCodeBulkAdd there's a limit to add 100 codes to a discount. Is this a limit on the API or on the discount? So can I add 100 codes to the same discount multiple times?
answer: I wasn't able to find any results for that. Can you tell me a little bit more about what you're looking for?
Telling it more did not help. To me that seemed like the bot didn't even have access to the technical documentation. Finding it hard to believe that any search engine can miss a word like discountRedeemCodeBulkAdd if it actually is in the dataset: https://shopify.dev/docs/api/admin-graphql/latest/mutations/...
So it's a bit like asking sales people technical questions.
edit: Okay, I should have tried that before commenting. They seem to have updated it. When I ask the same question now it answers correctly (weirdly, in German; translated below):
The limit of 100 codes when using discountRedeemCodeBulkAdd refers to the number of codes you can add in a single API call, not to the total number of codes that can be associated with a discount. A discount can contain up to 20,000,000 unique discount codes. You can therefore add 100 codes at a time to the same discount repeatedly until you reach the 20,000,000-code limit. Note that third-party apps or custom solutions cannot bypass or raise this limit.
~= It's a limit per API call; you can add up to 20M codes to a single discount.
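For anyone hitting the same limit: since it's per call, you can just batch. A rough sketch follows; the shop domain, API version, token, and the exact input shape are placeholders from memory of the docs, so verify against the mutation reference before relying on it:

  import requests

  ENDPOINT = "https://your-shop.myshopify.com/admin/api/2024-01/graphql.json"  # placeholder shop/version
  HEADERS = {"X-Shopify-Access-Token": "shpat_...", "Content-Type": "application/json"}

  MUTATION = """
  mutation addCodes($discountId: ID!, $codes: [DiscountRedeemCodeInput!]!) {
    discountRedeemCodeBulkAdd(discountId: $discountId, codes: $codes) {
      userErrors { field message }
    }
  }
  """

  def add_codes(discount_id: str, codes: list[str]) -> None:
      # 100 codes is the per-call limit; the discount itself holds up to 20M, so loop.
      for i in range(0, len(codes), 100):
          batch = [{"code": c} for c in codes[i:i + 100]]
          resp = requests.post(ENDPOINT, headers=HEADERS, json={
              "query": MUTATION,
              "variables": {"discountId": discount_id, "codes": batch},
          })
          resp.raise_for_status()
          errors = resp.json()["data"]["discountRedeemCodeBulkAdd"]["userErrors"]
          if errors:
              raise RuntimeError(errors)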
> What’s the syntax, in Liquid, to detect whether an order in an email notification contains items that will be fulfilled through Shopify Collective?
I suspect the best possible implementation of a documentation bot with respect to questions like this one would be an "agent" style bot that has the ability to spin up its own environment and actually test the code it's offering in the answer before confidently stating that it works.
That's really hard to do - Robin in this case could only test the result by placing and then refunding an order! - but the effort involved in providing a simulated environment for the bot to try things out in might make the difference in terms of producing more reliable results.
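The loop itself is easy enough to sketch; the hard part is the sandbox. Something like the following, where every callable is a hypothetical stand-in for your model client and execution environment, nothing here is a real API:

  from dataclasses import dataclass
  from typing import Callable, Optional

  @dataclass
  class RunResult:
      ok: bool
      stderr: str = ""

  def answer_with_verification(
      question: str,
      draft_answer: Callable[[str, str], str],       # LLM call: (question, feedback) -> draft
      extract_code: Callable[[str], Optional[str]],  # pulls the code block out of a draft
      sandbox_run: Callable[[str], RunResult],       # executes code in a throwaway environment
      max_attempts: int = 3,
  ) -> str:
      # Draft, test, retry: only state that the code works if it actually ran.
      feedback = ""
      draft = ""
      for _ in range(max_attempts):
          draft = draft_answer(question, feedback)
          snippet = extract_code(draft)
          if snippet is None:
              return draft  # pure prose, nothing to verify
          result = sandbox_run(snippet)
          if result.ok:
              return draft  # verified in the simulated environment
          feedback = f"Your previous code failed:\n{result.stderr}\nFix it and try again."
      return "I couldn't verify a working example. Best unverified attempt:\n" + draft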
We've done three trials since 2023 and each time we've found them not good enough to put in front of our customers.
Usually the distribution has been roughly 60% good answers, 20% neutral-to-bad, and 20% actively harmful answers that waste the user's time.
Really hoping we'll see better results this time, but so far nothing has beaten the recommendation to add our docs to your local LLM IDE of choice (Cursor etc.) and then ask it questions with your own codebase as context.
I don't know the first thing about Shopify, but perhaps you can create a free "test" item so you don't actually need to make a credit card transaction.
- "Oh yeah just write this," except the person is not an expert and it's either wrong or not idiomatic
- An answer that is correct often enough to be relied on
- An answer in the form "read this page" or quotes the docs
The last one is so much better because it directly solves the problem, which is fundamentally a search problem. And it places the responsibility for accuracy where it belongs (on the written docs).
The overblown claims? The system prompt "team" that ensures "I don't know" can never be uttered? Or user expectations?
I find it unfair to blame users. I tend to think a system prompt that accentuates answering "with an air of certainty" over straightforward honesty is telling, both of the org culture and of the wider "look like you know" trend...
But that's not true! Docs are sometimes wrong, and even more so if you count errors of omission. From a user's perspective, dense / poorly structured docs are wrong, because they lead users to think the docs don't have the answer. If they're confusing enough, they may even mislead users.
There's always an error rate. DocBots are almost certainly wrong more often, but they're also almost certainly much, much faster than reading the docs. Given that the standard recommendation is to test your code before jamming it into production anyway, that seems like a reasonable tradeoff.
YMMV!
(One level down: the feedback loop for getting docbots corrected is _far_ worse. You can complain to support that the docs are wrong, and most orgs will at least try to fix it. We, as an industry, are not fully confident in how to fix a wrong LLM response reliably in the same way.)
It's like docs with JPEG artifacts: the more you zoom in (the more specific your query), the worse the noise becomes.
I remember being taught that no docs is better (i.e. less frustrating to the user) than bad/incorrect docs.
This is exactly the same problem you see in coding assistants when they hallucinate functions or can't find the needed dependencies, etc.
There are better and more complex approaches that use multiple agents to summarize different smaller queries and then iteratively build up an answer, etc. Internally we, and a lot of companies, have them, but for external customer queries they're way too expensive. You can't spend 30 cents on every query.
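For reference, the shape of that decompose-and-build-up approach is roughly the sketch below (both callables are stand-ins for a model client and a doc search). The cost problem is visible immediately: each question turns into one decomposition call, one call per sub-query, and a synthesis call:

  from typing import Callable

  def answer_iteratively(
      question: str,
      complete: Callable[[str], str],        # LLM completion call
      retrieve: Callable[[str], list[str]],  # search over the doc corpus
  ) -> str:
      # 1. Decompose the question into smaller, independently answerable queries.
      subqueries = [
          line for line in complete(
              "Break this question into independent sub-questions, one per line:\n" + question
          ).splitlines() if line.strip()
      ]

      # 2. Answer each sub-query against only its own retrieved chunks.
      partials = []
      for sq in subqueries:
          docs = "\n---\n".join(retrieve(sq)[:3])  # top-3 chunks per sub-query
          partials.append(complete(f"Using only these docs:\n{docs}\n\nAnswer briefly: {sq}"))

      # 3. Build the final answer up from the partial answers.
      return complete(
          "Combine these partial answers into one response to the original question.\n"
          f"Question: {question}\nPartials:\n" + "\n\n".join(partials)
      )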
There is no built-in Liquid property to directly detect Shopify Collective fulfillment in email notifications.
You can use the Admin GraphQL API to programmatically detect the fulfillment source.
In Liquid, you must rely on tags, metafields, or custom properties that you set up yourself to mark Collective items.
If you want to automate this, consider tagging products or orders associated with Shopify Collective, or using an app to set a metafield, and then check for that in your Liquid templates.
What you can do in Liquid (email notifications):
If Shopify exposes a tag, property, or metafield on the order or line item that marks it as a Shopify Collective item, you could check for that in Liquid. For example, if you tag orders or products with "Collective", you could use:
  {% if order.tags contains "Collective" %}
    <!-- Show Collective-specific content -->
  {% endif %}

or, for line items:

  {% for line_item in line_items %}
    {% if line_item.product.tags contains "Collective" %}
      <!-- Show something for Collective items -->
    {% endif %}
  {% endfor %}
In the author's 'wrong' vs 'seems to work' answer, the only difference is the tag on the line items vs. the order. The flow (template? he refers to it as 'some other cryptic Shopify process') he uses in his tests does seem to add the 'Shopify Collective' tag to the line items, and potentially also to the order if the whole order is fulfilled through Shopify Collective, but without further info we can only guess at his setup. While using AI can always lead to imperfect results, I feel the evidence presented here does not support the conclusion.
P.S. Given the reference to 'cryptic Shopify processes', I wonder how far the author would get with 'just the docs'.