So your question is: why does the industry focus on reusable solutions to hard problems rather than piecemeal recreating them every project? When phrased that way, the answer is self-evident: productivity, cost, and ease.
It is an object store called Didgets (i.e. Data Widgets). Each Didget has a specific type. One type is used to hold unstructured data like a file does. These Didgets are unsurprisingly called File Didgets. Other types of Didgets can hold lists (used to create hierarchical folders, music play lists, photo albums, etc.).
Others hold sets of Key/Value pairs which are used to create a tagging system for other Didgets or columns in a relational table.
Using a variety of Didgets, I have been able to create hierarchical file systems where a simple query can find one or thousands of files instantly out of 200 million+ based on the values of any tags attached.
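The general idea behind that kind of tag-based lookup can be sketched generically (this is not the actual Didget API, just an illustration of an inverted index from tag pairs to object ids):

```python
from collections import defaultdict

# Inverted index: (tag_key, tag_value) -> set of object ids.
# Looking up "all files where author=alice" is then a dict lookup,
# and multi-tag queries are set intersections.
index = defaultdict(set)

def tag(obj_id, **tags):
    """Attach key/value tags to an object."""
    for k, v in tags.items():
        index[(k, v)].add(obj_id)

def query(**tags):
    """Return ids of objects matching ALL given tags."""
    sets = [index[(k, v)] for k, v in tags.items()]
    return set.intersection(*sets) if sets else set()

tag("file1", author="alice", type="photo")
tag("file2", author="alice", type="doc")
tag("file3", author="bob", type="photo")

assert query(author="alice") == {"file1", "file2"}
assert query(author="alice", type="photo") == {"file1"}
```

With the index kept up to date at tag time, query cost depends on the matching sets, not on how many million objects exist overall.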
In the same container (called a pod), it can store tens of thousands of relational tables; each one capable of having 100,000+ columns and billions of rows.
The system is 'multi-model', so it can manage hierarchical data, relational data, graph data, or anything managed by a NoSQL system.
It is not only versatile but also incredibly fast.
If you're using a relational DB as a relational database, then it gives you a lot the FS doesn't give you. If you're using a relational database as a key-value store, SQLite is about 35% faster than the filesystem [1].
Perhaps one of the biggest users of the filesystem as a KV store is git -- (not an llm, I just wanted to use --) .git/objects/xx/xxxxx maps the sha1 object hash to the compressed data, splayed by the first two hex characters of the hash. However, git also uses a database of sorts (.git/objects/pack/....). To sum up the git pack-objects man page: it's more efficient.
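The loose-object layout is simple enough to reproduce in a few lines; a minimal sketch of how git derives a blob's on-disk path (git hashes a `blob <len>\0` header plus the contents, then splits the hex digest after two characters):

```python
import hashlib

def loose_object_path(data: bytes) -> str:
    """Compute where git would store `data` as a loose blob object.

    Git prepends a 'blob <length>\\0' header, SHA-1 hashes the result,
    and uses the first two hex characters as a directory name and the
    remaining 38 as the file name (the stored file is zlib-compressed).
    """
    header = f"blob {len(data)}\0".encode()
    sha1 = hashlib.sha1(header + data).hexdigest()
    return f".git/objects/{sha1[:2]}/{sha1[2:]}"

# Matches `echo 'hello' | git hash-object --stdin`
print(loose_object_path(b"hello\n"))
# → .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

The two-character split keeps any single directory from accumulating every object, which matters on filesystems where huge directories degrade lookup.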
https://en.wikipedia.org/wiki/ISAM
https://en.wikipedia.org/wiki/Record_Management_Services
They were more like BerkeleyDB and lacked a query planner.
I think Oracle internally uses something similar, i.e. a native filesystem optimized for an RDBMS.
https://archive.fosdem.org/2021/schedule/event/new_type_of_c...
I adapted most of it into an article for The Register:
https://www.theregister.com/2024/02/26/starting_over_rebooti...
A database is a data structure with (generally) many small items that need to be precisely updated, read, and manipulated.
A lot of files don't necessarily have this access pattern (for instance, rendering a large video file). A filesystem has a generic access pattern and is a lower-level primitive than a database.
For this same reason you even have different kinds of database for different types of access patterns and data types (e.g. Elasticsearch for full-text search, MongoDB for JSON, Postgres for SQL).
The filesystem is generic and low-level; a database is a higher-order abstraction.
It turns out that having a defined abstraction like a database makes things faster than relying on a rawer interface like the filesystem, because you can then reduce the number of system calls and context switches necessary. If you wanted to optimize that in your own code rather than relying on a database, you'd end up reinventing a database system of sorts, when (probably) better solutions already exist.
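A rough sketch of the two shapes of access (timings deliberately omitted; the point is the syscall count, not a benchmark). Reading N records as N files costs N open/read/close triples, while one SQLite query touches a single file:

```python
import os
import sqlite3
import tempfile

N = 1000
root = tempfile.mkdtemp()

# Filesystem as a KV store: one file per key,
# so every lookup is its own open/read/close.
for i in range(N):
    with open(os.path.join(root, f"key{i}"), "wb") as f:
        f.write(str(i).encode())

def read_all_files():
    out = {}
    for i in range(N):
        with open(os.path.join(root, f"key{i}"), "rb") as f:
            out[f"key{i}"] = f.read()
    return out

# The same data in SQLite: one file on disk, one query to fetch it all.
db = sqlite3.connect(os.path.join(root, "kv.db"))
db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v BLOB)")
db.executemany("INSERT INTO kv VALUES (?, ?)",
               [(f"key{i}", str(i).encode()) for i in range(N)])
db.commit()

def read_all_rows():
    return dict(db.execute("SELECT k, v FROM kv"))

assert read_all_files() == read_all_rows()
```

Both paths return the same data; the database path just batches the work behind one interface instead of N round trips through the kernel.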
The advantages of a database:
* Locking - Tables apply locking conventions to prevent race conditions when multiple operations write competing changes to the same data. If it's only a single application that has access to the data, this can be solved with queues, but locking is necessary when multiple applications write to the same data at the same time.
* API - SQL provides a grammar that many people are familiar with. In practice this all goes to shit when you have to write a bunch of stored procedures, SQL functions, or table triggers. I really don't like SQL.
* References - In RDBMSs, records in one table can reference records in another table using foreign keys that point to unique identifiers in the other table. This is solved auto-magically in languages where objects are passed by reference, but that isn't the file system.
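The References point above can be sketched with SQLite's foreign-key support (note that SQLite ships with enforcement off by default, so the PRAGMA is required):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id))""")

db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO orders VALUES (10, 1)")  # fine: customer 1 exists

# A dangling reference is rejected by the database itself --
# no application code has to check it.
rejected = False
try:
    db.execute("INSERT INTO orders VALUES (11, 99)")  # no customer 99
except sqlite3.IntegrityError:
    rejected = True
```

On a bare filesystem the equivalent guarantee (no file may "point at" a path that doesn't exist) has to be enforced by every application that touches the data, which is exactly the kind of convention that silently breaks.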
---
If your database, data system, whatever, is in memory only, there are very few real advantages to using something like SQL. If the data is on disk, the file system is the lower level and is designed to optimize access by something like SQL, such as with a Logical Volume Manager that can create data volumes spanning different hardware.
Some of my horizontally scaled services have like 500 MB of disk.