You want grounded RAG systems like Shopify's here to rely strongly on the underlying documents, but still sprinkle in a bit of the magic of latent LLM knowledge too. The only way to get that balance right is evals. Lots of them. It gets even harder when you're dealing with a GraphQL schema like Shopify's, since most models struggle with that syntax more than they do with REST APIs.
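To make that concrete, a grounded-answer eval can be as simple as the sketch below. All names, the judge prompt, and the label set are illustrative, not any vendor's internals; `assistant` and `complete` are stand-ins for whatever bot and completion client you run:

  from typing import Callable

  # Illustrative eval set; the value comes from volume and coverage.
  EVAL_SET = [
      {
          "question": "Is the 100-code limit on discountRedeemCodeBulkAdd per call or per discount?",
          "must_cite": "shopify.dev/docs/api/admin-graphql",
      },
  ]

  def judge(complete: Callable[[str], str], question: str, answer: str, sources: list[str]) -> str:
      # Have a second model label the answer; crude, but scales to thousands of cases.
      prompt = (
          f"Question:\n{question}\n\nAnswer:\n{answer}\n\nSources:\n"
          + "\n".join(sources)
          + "\n\nLabel the answer as exactly one of: grounded, ungrounded, refused."
      )
      return complete(prompt).strip().lower()

  def run_evals(assistant: Callable[[str], tuple[str, list[str]]], complete: Callable[[str], str]) -> dict[str, float]:
      labels = []
      for case in EVAL_SET:
          answer, sources = assistant(case["question"])
          label = judge(complete, case["question"], answer, sources)
          # If retrieval never surfaced the canonical page, the answer can't have been grounded in it.
          if not any(case["must_cite"] in s for s in sources):
              label = "ungrounded"
          labels.append(label)
      return {label: labels.count(label) / len(labels) for label in set(labels)}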
FYI I'm biased: founder of kapa.ai here (we build docs AI assistants for 200+ companies incl. Sentry, Grafana, Docker, the largest Apache projects, etc.).
I feel like we aren't properly using AI in products yet.
Every time I land on help.shopify.com I get the feeling it's one of those "doc pages for sales people". Like it's meant to show "we have great documentation and you can do all these things" but never actually explains how to do anything.
I tried that bot a couple of months ago and it was utterly useless:
question: When using discountRedeemCodeBulkAdd there's a limit to add 100 codes to a discount. Is this a limit on the API or on the discount? So can I add 100 codes to the same discount multiple times?
answer: I wasn't able to find any results for that. Can you tell me a little bit more about what you're looking for?
Telling it more did not help. To me that seemed like the bot didn't even have access to the technical documentation. Finding it hard to believe that any search engine can miss a word like discountRedeemCodeBulkAdd if it actually is in the dataset: https://shopify.dev/docs/api/admin-graphql/latest/mutations/...
So it's a bit like asking sales people technical questions.
edit: Okay, I should have tried that before commenting. They seem to have updated it. When I ask the same question now it answers correctly (weirdly, in German; translated below):
The limit of 100 codes when using discountRedeemCodeBulkAdd refers to the number of codes you can add in a single API call, not to the total number of codes that can be associated with a discount. A discount can contain up to 20,000,000 unique discount codes. You can therefore add 100 codes at a time to the same discount repeatedly until you reach the 20,000,000-code limit. Note that third-party apps or custom solutions cannot bypass or raise this limit.
~= It's a limit per API call; you can add up to 20M codes to a single discount.
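For anyone hitting the same limit: since it's per call, you can just batch. A rough sketch follows; the shop domain, API version, token, and the exact input shape are placeholders from memory of the docs, so verify against the mutation reference before relying on it:

  import requests

  ENDPOINT = "https://your-shop.myshopify.com/admin/api/2024-01/graphql.json"  # placeholder shop/version
  HEADERS = {"X-Shopify-Access-Token": "shpat_...", "Content-Type": "application/json"}

  MUTATION = """
  mutation addCodes($discountId: ID!, $codes: [DiscountRedeemCodeInput!]!) {
    discountRedeemCodeBulkAdd(discountId: $discountId, codes: $codes) {
      userErrors { field message }
    }
  }
  """

  def add_codes(discount_id: str, codes: list[str]) -> None:
      # 100 codes is the per-call limit; the discount itself holds up to 20M, so loop.
      for i in range(0, len(codes), 100):
          batch = [{"code": c} for c in codes[i:i + 100]]
          resp = requests.post(ENDPOINT, headers=HEADERS, json={
              "query": MUTATION,
              "variables": {"discountId": discount_id, "codes": batch},
          })
          resp.raise_for_status()
          errors = resp.json()["data"]["discountRedeemCodeBulkAdd"]["userErrors"]
          if errors:
              raise RuntimeError(errors)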
> What’s the syntax, in Liquid, to detect whether an order in an email notification contains items that will be fulfilled through Shopify Collective?
I suspect the best possible implementation of a documentation bot with respect to questions like this one would be an "agent" style bot that has the ability to spin up its own environment and actually test the code it's offering in the answer before confidently stating that it works.
That's really hard to do - Robin in this case could only test the result by placing and then refunding an order! - but the effort involved in providing a simulated environment for the bot to try things out in might make the difference in terms of producing more reliable results.
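The loop itself is easy enough to sketch; the hard part is the sandbox. Something like the following, where every callable is a hypothetical stand-in for your model client and execution environment, nothing here is a real API:

  from dataclasses import dataclass
  from typing import Callable, Optional

  @dataclass
  class RunResult:
      ok: bool
      stderr: str = ""

  def answer_with_verification(
      question: str,
      draft_answer: Callable[[str, str], str],       # LLM call: (question, feedback) -> draft
      extract_code: Callable[[str], Optional[str]],  # pulls the code block out of a draft
      sandbox_run: Callable[[str], RunResult],       # executes code in a throwaway environment
      max_attempts: int = 3,
  ) -> str:
      # Draft, test, retry: only state that the code works if it actually ran.
      feedback = ""
      draft = ""
      for _ in range(max_attempts):
          draft = draft_answer(question, feedback)
          snippet = extract_code(draft)
          if snippet is None:
              return draft  # pure prose, nothing to verify
          result = sandbox_run(snippet)
          if result.ok:
              return draft  # verified in the simulated environment
          feedback = f"Your previous code failed:\n{result.stderr}\nFix it and try again."
      return "I couldn't verify a working example. Best unverified attempt:\n" + draft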
We've done three trials since 2023 and each time we've found them not good enough to put in front of our customers.
Usually the distribution has been roughly 60% good answers, 20% neutral-to-bad, and 20% actively harmful answers that waste the user's time.
Really hoping we'll see better results this time, but so far nothing has beaten the recommendation to add our docs to your local LLM IDE of choice (Cursor etc.) and then ask it questions with your own codebase as context.
I don't know the first thing about Shopify, but perhaps you can create a free "test" item so you don't actually need to make a credit card transaction.
- "Oh yeah just write this," except the person is not an expert and it's either wrong or not idiomatic
- An answer that is correct often enough to be relied on
- An answer in the form "read this page" or quotes the docs
The last one is so much better because it directly solves the problem, which is fundamentally a search problem. And it places the responsibility for accuracy where it belongs (on the written docs).
The overblown claims? The system prompt "team" that ensures "I don't know" can never be uttered? Or user expectations?
I find it unfair to blame users. I tend to think a system prompt that accentuates answering "with an air of certainty" over straightforward honesty is telling, both of the org culture and of the wider "look like you know" trend...
But that's not true! Docs are sometimes wrong, and even more so if you count errors of omission. From a user's perspective, dense / poorly structured docs are wrong, because they lead users to think the docs don't have the answer. If they're confusing enough, they may even mislead users.
There's always an error rate. DocBots are almost certainly wrong more often, but they're also almost certainly much, much faster than reading the docs. Given that the standard recommendation is to test your code before jamming it into production anyway, that seems like a reasonable tradeoff.
YMMV!
(One level down: the feedback loop for getting docbots corrected is _far_ worse. You can complain to support that the docs are wrong, and most orgs will at least try to fix it. We, as an industry, are not fully confident in how to fix a wrong LLM response reliably in the same way.)
It's like docs with JPEG artifacts: the more you zoom in (the more specific your query), the worse the noise becomes.
I remember being taught that no docs is better (i.e. less frustrating to the user) than bad/incorrect docs.
This is exactly the same problem you see in coding assistants when they hallucinate functions or can't find the needed dependencies, etc.
There are better and more complex approaches that use multiple agents to summarize different smaller queries and then iteratively build up an answer, etc. Internally we, and a lot of companies, have them, but for external customer queries they're way too expensive. You can't spend 30 cents on every query.
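For reference, the shape of that decompose-and-build-up approach is roughly the sketch below (both callables are stand-ins for a model client and a doc search). The cost problem is visible immediately: each question turns into one decomposition call, one call per sub-query, and a synthesis call:

  from typing import Callable

  def answer_iteratively(
      question: str,
      complete: Callable[[str], str],        # LLM completion call
      retrieve: Callable[[str], list[str]],  # search over the doc corpus
  ) -> str:
      # 1. Decompose the question into smaller, independently answerable queries.
      subqueries = [
          line for line in complete(
              "Break this question into independent sub-questions, one per line:\n" + question
          ).splitlines() if line.strip()
      ]

      # 2. Answer each sub-query against only its own retrieved chunks.
      partials = []
      for sq in subqueries:
          docs = "\n---\n".join(retrieve(sq)[:3])  # top-3 chunks per sub-query
          partials.append(complete(f"Using only these docs:\n{docs}\n\nAnswer briefly: {sq}"))

      # 3. Build the final answer up from the partial answers.
      return complete(
          "Combine these partial answers into one response to the original question.\n"
          f"Question: {question}\nPartials:\n" + "\n\n".join(partials)
      )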
There is no built-in Liquid property to directly detect Shopify Collective fulfillment in email notifications.
You can use the Admin GraphQL API to programmatically detect the fulfillment source.
In Liquid, you must rely on tags, metafields, or custom properties that you set up yourself to mark Collective items.
If you want to automate this, consider tagging products or orders associated with Shopify Collective, or using an app to set a metafield, and then check for that in your Liquid templates.
What you can do in Liquid (email notifications):
If Shopify exposes a tag, property, or metafield on the order or line item that marks it as a Shopify Collective item, you could check for that in Liquid. For example, if you tag orders or products with "Collective", you could use:
  {% if order.tags contains "Collective" %}
    <!-- Show Collective-specific content -->
  {% endif %}

or, for line items:

  {% for line_item in line_items %}
    {% if line_item.product.tags contains "Collective" %}
      <!-- Show something for Collective items -->
    {% endif %}
  {% endfor %}
In the author's 'wrong' vs 'seems to work' answer, the only difference is the tag on the line items vs. the order. The flow (template? he refers to it as 'some other cryptic Shopify process') he uses in his tests does seem to add the 'Shopify Collective' tag to the line items, and potentially also to the order if the whole order is fulfilled through Shopify Collective, but without further info we can only guess at his setup. While using AI can always lead to imperfect results, I feel the evidence presented here does not support the conclusion.
P.S. Given the reference to 'cryptic Shopify processes', I wonder how far the author would get with 'just the docs'.