- You guys are talking about copyright, but I think a bigger takeaway is that there is a process breakdown at Microsoft. Nobody is reading or reviewing this documentation, so what hope is there that anybody is reading or reviewing their new code?
I guess the question to leadership is this: two of the three pillars, namely security and quality, are at odds with the third pillar, AI innovation. Which side do you pick?
(I know you mean well and I love you, Scott Hanselman, but please don't answer this yourself. Please pass this on to leadership.)
- The real cherry on top is that the Microsoft link in the blog post by the Microsoft senior product manager goes to a Kaggle dataset page claiming the dataset is CC0: Public Domain.
https://www.kaggle.com/datasets/shubhammaindola/harry-potter...
Beyond just using the data, linking to a copy that claims the dataset is public domain seems problematic copyright-wise.
Also interesting: this blog post has been up since November 2024. Very surprising to me that Microsoft hasn't taken it down yet.
- Update: Microsoft has taken the page down. But posterity being what it is...
https://archive.is/D9vEN
by beached_whale
- The AI-generated thumbnail, https://devblogs.microsoft.com/azure-sql/wp-content/uploads/..., shows young Harry and a friend with a prominent MS logo. Wow.
by protocolture
- It doesn't offer a guide to piracy; it offers a guide to loading specific data from a dataset into SQL so it can be referenced by an LLM.
If anything, Kaggle would be on the hook for listing the data as CC0, or perhaps Shubham Maindola for uploading it. In fact, the "provenance" listed would give me chills. Crazy how this got a 10.0 score: "I downloaded the ebooks of Harry Potter. Then converted them to txt files."
by andsoitis
- This article is from 2024 and points to Kaggle, which hosts the data set.
I'm surprised that JKR's people haven't come down like a tonne of bricks on Kaggle / Microsoft.
Does anyone know whether there is some special reason why this has lasted so long without being taken down?
by throwaway150
- Page is gone.
Archived copy: https://web.archive.org/web/20260105115129/https://devblogs....
It is very worrying that people with no ethics work for these trillion dollar companies who are supposed to be shaping the technology of tomorrow.
- In case the page disappears:
https://archive.is/7WLho
- Since IP law is apparently dead, does anyone want to invest in my AI-generated novel startup? It just spits out Harry Potter verbatim, but uses a bunch of power to do so.
by waffletower
- I have hated Microsoft for decades and am somewhat of an extremist when it comes to avoiding their products. That being said, this piracy-shaming headline for a Microsoft research project example, not a product integration, is entirely misleading and hysterical. The lengths that stooges will go to in order to protect copyright monopolies and eradicate fair use are also extreme and should be embarrassing.
- I feel like the title is a bit misleading, unless the person who put all HP books on Kaggle as a (supposedly) CC0-licensed data set did so as a Microsoft employee.
Nevertheless, it's a pretty egregious oversight (incompetence?) and something that shouldn't have been published.
by electronsoup
- I guess the end of copyright is near if this is fine to put on a corporate website.
by 8cvor6j844qw_d6
- It looks like the unwritten stance of large companies is that copyrighted works are free to use for training.
This does not seem to be reciprocal, though. Rules for thee, but not for me.
- Original title: "LangChain Integration for Vector Support for SQL-based AI applications"
- How Microsoft protects its own IP:
https://news.microsoft.com/source/2004/02/12/statement-from-...
In case the new anti-copyright Microslop memory-holes that link:
https://web.archive.org/web/20260215220230/https://news.micr...
The tutorial could have used that leaked source code for "educational purposes", as many here claim.
- I recall the source code for Windows XP was leaked some years ago: not just isolated parts of the codebase, as with the earlier Windows NT4/2000 source code leak, but a complete, buildable repository.
If I write an article on training an LLM on the leaked Windows XP source code, blithely mark the source code repo as 'public domain', but use Azure resources for the how-to steps, would that make it OK, Microsoft? You know, your Azure division might get some money...
Seriously, this is just so... blatant. It's like we've all collectively decided that copyright just doesn't matter anymore. Just reading this article, I feel like I'm taking crazy pills.
- How soon before someone makes an online library that generates the original books using LLMs? Surely popular titles like Harry Potter are so well represented in the training data that we'll get the full books out of an LLM with close to 100% accuracy?
- I thought it was exaggerated, but reading the archive, yeah, that's something that should not get past even a glance from a public communications person, or from any manager, like a senior product manager…
by til_something
- I can still get to the article on the site, perhaps it’s cached in the CDN somewhere.
Also, reviewing the repo, the entire article is there, promoting the same silly things.
https://github.com/Azure-Samples/azure-sql-db-vector-search/...
- Jupyter notebook version here for the curious: https://github.com/Azure-Samples/azure-sql-db-vector-search/...
- My guess is HP makes such an enormous amount of money already from movies, games, toys, and other tie-ins, that they can't be bothered to chase down the odd digital infringement of a plain text copy of the original books.
I'm sure the scripts of Star Wars would be similarly ignored if they were used.
- Well this is entirely believable for what Microsoft is up to nowadays. Copyright law and IP need not apply in 2026.
- I... There are parts of the world where certain developers don't understand how the West tends to approach copyright, or the norm of not blindly copying anything that is out there.
This, however, is a very, VERY poor situation: you end up placing your employer at risk because you think copyright doesn't matter and everything on the internet is fair game.
This is probably the most polite way I would describe this to most. Ugh. For the rest: just stop acting like cheating your way through a situation to get a step up is the norm; it's just dirty behaviour.
- Wonderful 404 page. Wonder if Kai Lentit optimized it.
- So if you read books to learn how to write books then you're "pirating"?
by starkeeper
- They tore the page down. Any copies?
- “Fair use” allows for educational usage of copyrighted material. Technically it probably is not fair use as Microsoft isn’t an educational institution or a nonprofit.
But come on… these guides really are for learning purposes. Doesn't seem like a big deal to me at all. They aren't even hosting it, just pointing to Kaggle, which is hosting it.
On principle copyright law should allow this kind of learning use case anyway.
by wewewedxfgdf
- Refreshingly honest.
- If copyrighted materials are used, surely copyright allows the maker to require disclosure that their content was used in training a model.
- "but it's fair use"
Rowling is known for actively protecting her rights as an author; they couldn't have picked a worse author to slop up.
by outside1234
- I mean, they are also offering up the code you write in your private repos to LLMs to regenerate in my repo, so let's just go nuts.
by thehamkercat
- It's been taken down lmao, within 1 hour.
by ErroneousBosh
- On the one hand, Piracy Bad Especially By LLMs, but on the other hand, JK Rowling Worse In Every Possible Way.
by ThrowawayTestr
- Absolutely shameless
by conartist6
- What in the absolute fuck
- Someone forgot the national no snitching rules, and in service of Jo, no less.
Everyone should torrent and rip off those books, anyway.
- I guess legal was part of the layoffs these past few years. Too bad we can't get a bounty from the RIAA of books, whatever that is.
by charcircuit
- This is fair use as it is for educational purposes and not for reading.