Which database system is the most in the spirit of FreeBSD?

Are there any websites or services running in "production" on the web that use the approach mentioned here?
I would be surprised if this was used for anything particularly serious in production. It's a "hack" that demonstrates the power of the Unix command-line tools.
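
To give a flavour of what I mean, here's a toy sketch with a made-up, whitespace-separated users.txt standing in for a table (file name and columns are invented for the example):

    $ cat users.txt
    1 alice alice@example.org
    2 bob   bob@example.org
    # roughly: SELECT * FROM users WHERE name = 'bob'
    $ awk '$2 == "bob"' users.txt
    2 bob   bob@example.org
    # roughly: SELECT COUNT(*) FROM users
    $ wc -l < users.txt
    2

It works beautifully for small, read-mostly data, but there are no transactions, no indexes and no concurrent writers, which is why I wouldn't expect it anywhere serious.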

Given the hesitance I'm detecting in this thread to embrace more fully-featured SQL DB systems (despite honourable mentions of DB systems such as Postgres), is there a sense that such DB systems are contrary to the design principles/philosophy behind FreeBSD / Unix? Yes, I understand the attitude that you want to use the right tool for the job, and not necessarily choose a tool which is overkill and would introduce unnecessary bloat. However, I'm thinking about how e.g. MySQL/MariaDB can be configured to circumvent the OS and access storage devices directly, and more generally how more fully-featured DB systems tend to replace multiple features already present in the OS with their own, purportedly more performant features. Am I on to something here?
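
(On the raw-device point: older MySQL/InnoDB manuals described pointing the system tablespace at a raw partition, roughly as sketched below. The device path and size are placeholders, and whether a given MySQL/MariaDB version still supports this needs checking against its documentation.)

    [mysqld]
    # "newraw" on first start lets InnoDB initialise the raw partition,
    # bypassing the filesystem entirely (older-manual syntax; verify for your version)
    innodb_data_home_dir =
    innodb_data_file_path = /dev/sdb1:10Gnewraw
    # after initialisation, change "newraw" to "raw" and restart:
    # innodb_data_file_path = /dev/sdb1:10Graw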
 
dbdemon: The point is that your original question right from the beginning doesn't make a lot of sense, because what exactly does "created, developed and/or designed more "in the spirit" of FreeBSD?" mean? That's an absolutely wishy-washy phrase; it sounds good but is left totally undefined by you, so everybody has a different idea in mind of what it should or shouldn't entail.

So you are getting a broad variety of different answers. And that is why you are not on to something here: your whole question is really muddy.

If it's embedded SQL storage, then SQLite. For everything else - Postgres. End of story.
 
dbdemon: The point is that your original question right from the beginning doesn't make a lot of sense, because what exactly does "created, developed and/or designed more "in the spirit" of FreeBSD?" mean? That's an absolutely wishy-washy phrase; it sounds good but is left totally undefined by you, so everybody has a different idea in mind of what it should or shouldn't entail.

So you are getting a broad variety of different answers. If you want better answers, then you should ask a better question from the start!
The question is deliberately a little vague exactly because I would like to see a variety of answers and reasons.
 
I would be surprised if this was used for anything particularly serious in production. It's a "hack" that demonstrates the power of the Unix command-line tools.

Given the hesitance I'm detecting in this thread to embrace more fully-featured SQL DB systems (despite honourable mentions of DB systems such as Postgres), is there a sense that such DB systems are contrary to the design principles/philosophy behind FreeBSD / Unix? Yes, I understand the attitude that you want to use the right tool for the job, and not necessarily choose a tool which is overkill and would introduce unnecessary bloat. However, I'm thinking about how e.g. MySQL/MariaDB can be configured to circumvent the OS and access storage devices directly, and more generally how more fully-featured DB systems tend to replace multiple features already present in the OS with their own, purportedly more performant features. Am I on to something here?
No, you're not on to anything. A proper RDBMS (like MySQL/MariaDB/PostgreSQL) is just a very separate technology from an OS. You can have an Oracle install running on a UNIX machine or on a Windows server, and the design of the Oracle RDBMS will be exactly the same in either case.
--
There's a BIG difference between text-processing tools (which at best should not try to process more than 10MB of relatively free-form text) and managing terabytes of well-organized data (which is what RDBMS systems are for).
 
I never thought badly of Kirk, but this is weird. (If it's true.)
Don't worry, it's nothing weird or indecent. In the mid- or late 90s, Kirk was one of the first people to have a personal web site, and it included some home automation, long before such things were fashionable or easy. And his web site had "real time" tracking of the temperature of his wine cellar (which is very important when wanting to drink wine) and of the hot tub. Those web pages are long gone. Today, we're more concerned about privacy, so for example at our house I can monitor the pressure of our water supply and the fill level of our water tanks, but that information is not on the public web:

[screenshot of the water pressure and tank level monitoring display]

However, I'm thinking about how e.g. MySQL/MariaDB can be configured to circumvent the OS and access storage devices directly, and more generally how more fully-featured DB systems tend to replace multiple features already present in the OS with their own, purportedly more performant features. Am I on to something here?
That is a long-running debate. Since time immemorial, databases have used their own storage management, often bypassing the OS's facilities (file systems in particular). This goes back to the mainframes of the 1960s. To some extent, this is good, because real database experts can tune things incredibly well. To some extent, it is bad, because modern storage systems are very fast, and often running a database on top of storage software (file systems, volume managers, RAID, object storage, ...) is actually faster than going directly to hardware. And when using the full-featured storage stack, you get things like snapshots, backups, and automated management (space allocation). I think the consensus today is that databases that bypass the storage software stack are making a short-sighted mistake, but that is sometimes debated, and not true for all situations.
 
That is a long-running debate. Since time immemorial, databases have used their own storage management, often bypassing the OS's facilities (file systems in particular). This goes back to the mainframes of the 1960s. To some extent, this is good, because real database experts can tune things incredibly well. To some extent, it is bad, because modern storage systems are very fast, and often running a database on top of storage software (file systems, volume managers, RAID, object storage, ...) is actually faster than going directly to hardware. And when using the full-featured storage stack, you get things like snapshots, backups, and automated management (space allocation). I think the consensus today is that databases that bypass the storage software stack are making a short-sighted mistake, but that is sometimes debated, and not true for all situations.
My first serious professional mistake was to newfs the raw partitions the Oracle database was using on a Sun server. The Anderson consultants were very annoyed with me.

My checkered past includes some Oracle experience. Oracle went back and forth between "use our storage management system" and "use the filesystem" at least twice.
 
My biggest mistakes were not professional. Of course I cannot elaborate. But hey, it was a lesson!
And on PostgreSQL: pipe vs socket vs IPC.
 
This thread/topic has been on my mind for some time even if I haven't posted again since last year. I have some more thoughts, but it'll have to wait.

In the meantime, I thought I should share this video demonstrating databases/recutils. Beware, this is a GNU product! But I think it does check a lot of boxes for a "Unix database system".
https://youtu.be/qnlkr3mCqW8

(Disclaimer: I'm obviously not saying this sort of software can compete with a RDBMS in terms of performance, scale, user management, table joins and so on.)

See also Wikipedia: https://en.wikipedia.org/wiki/Recfiles
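
For anyone who hasn't looked at it, a recfile is just labelled plain text that the recutils commands can query. A minimal sketch (the file, record type and fields are made up for the example; check the exact options against the recutils manual):

    # contacts.rec
    %rec: Contact
    %key: Name

    Name: Alice
    Email: alice@example.org

    Name: Bob
    Email: bob@example.org

    # roughly: SELECT Email FROM Contact WHERE Name = 'Bob'
    $ recsel -t Contact -e "Name = 'Bob'" -p Email contacts.rec
    Email: bob@example.org

And because it's plain text, the same file still works with grep, awk and friends, which is what makes it feel Unix-y to me.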
 
I think the ideal database system "in the spirit" of FreeBSD/Unix would be an RDBMS with both traditional SQL data access and also "text file" interface to the same data.
 
I think the ideal database system "in the spirit" of FreeBSD/Unix would be an RDBMS that in addition to SQL access to its data somehow also supported accessing its data tables as if they were text files.
SQL92 offers the API to do a text dump of select * from database.table. UNIX also offers plenty of ways to do a text dump. Unfortunately, just the very design of on-disk files that are accessible via SQL makes those files incompatible with being processed with stuff like grep. It's like trying to open a.out with a text editor - you do need a hex editor to open a.out, or GDB. It's that kind of difference.
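
For example, something like the following gets you from SQL to grep-able text (psql here, with a hypothetical mydb database and users table; mysql --batch -e "..." does much the same):

    $ psql -t -A -F ' ' -c "SELECT id, name FROM users" mydb | grep bob
    2 bob

But that is a dump you pipe through the text tools after the fact, not grep running against the database's own on-disk files.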
 
SQL92 offers the API to do a text dump of select * from database.table. UNIX also offers plenty of ways to do a text dump. Unfortunately, just the very design of on-disk files that are accessible via SQL makes those files incompatible with being processed with stuff like grep. It's like trying to open a.out with a text editor - you do need a hex editor to open a.out, or GDB. It's that kind of difference.
(I edited my post to clarify my point, but I think you understood what I meant regardless.)

Yes. But wouldn't it be possible for DBMS vendors to offer a "text file interface" to their tables? At least for reading? Similar to /proc/ or /dev/ filesystems on Unix-like OSes.
 
(I edited my post to clarify my point, but I think you understood what I meant regardless.)

Yes. But wouldn't it be possible for DBMS vendors to offer a "text file interface" to their tables? At least for reading? Similar to /proc/ or /dev/ filesystems on Unix-like OSes.
Umm... due to the very design of databases, this is putting the cart before the horse. The output of the command is in plaintext. Once the command completes, and you have the output, that's when you can put the usual text-processing commands like grep to work. The select SQL command is for reading, and insert / delete SQL commands are for writing.

BTW, /proc and /dev are not exactly for editing or even reading text. I would strongly suggest that you read a good book to understand what those are for. Andrew Tanenbaum is a good author of a few of those books.
 
Umm... due to the very design of databases, this is putting the cart before the horse. The output of the command is in plaintext. Once the command completes, and you have the output, that's when you can put the usual text-processing commands like grep to work. The select SQL command is for reading, and insert / delete SQL commands are for writing.
I know very well how SQL works, thank you :) (After all, my job title is 'senior database engineer'!) I must not be making myself clear. I believe building a different interface than SQL for an SQL database is entirely within the realm of non-fiction. In fact, this was done for MySQL with their document store using their "X protocol". And MariaDB ColumnStore has a Write API which also circumvents the SQL layer to enable more direct write-access to the data. I'm sure there are other examples. But it should also be possible to build a "text file" interface that utilises SQL under the hood.

In fact, it turns out someone did try to do what I'm suggesting, using FUSE - see the README file here:
 
But wouldn't it be possible for DBMS vendors to offer a "text file interface" to their tables?
That kind of interface prevents one from effectively using: 1. transactions — your typical file system access API has no commit/rollback functionality; 2. indexes — everything is a full table scan; 3. constraints — POSIX error codes are not going to cut it.

Thus, technical possibility doesn't mean it's a good idea.
 
That kind of interface prevents one from effectively using: 1. transactions — your typical file system access API has no commit/rollback functionality; 2. indexes — everything is a full table scan; 3. constraints — POSIX error codes are not going to cut it.
In the mid/late 1980s there was a unix startup that built transaction semantics into file system operations. I did some contract work for them but can't remember their name now — it was some generic name like Relational Systems or something, so hard to search for. Its performance was not great — much worse than comparable unix boxes. I think they made the classic mistake of not understanding the difference between a research project and a deliverable product!
 
I know very well how SQL works, thank you :) (After all, my job title is 'senior database engineer'!) I must not be making myself clear. I believe building a different interface than SQL for an SQL database is entirely within the realm of non-fiction. In fact, this was done for MySQL with their document store using their "X protocol". And MariaDB ColumnStore has a Write API which also circumvents the SQL layer to enable more direct write-access to the data. I'm sure there are other examples. But it should also be possible to build a "text file" interface that utilises SQL under the hood.

In fact, it turns out someone did try to do what I'm suggesting, using FUSE - see the README file here:
All you need is a web-based front-end that talks SQL to the database... like databases/phpmyadmin... they are a dime a dozen out there. A 'text-based interface' is a UNIX shell... you start by typing, say, 'mysql' into bash, and you get dropped into a shell that allows you to type in straight SQL commands and talk to the server that way. Oracle works that way, too, and even MS SQL Server has a utility that works similarly. No need to circumvent anything, just slap on a front-end that reflects your customer's 'Best Practices', and connect it to the proper databases. Yeah, the database portion does need to be well-designed, that's the job for a 'senior database engineer'... and I'd expect a 'senior database engineer' to actually know to avoid filesystem shims - those can mess up the file itself to the point that the database is not recoverable. I don't like being this blunt, but I do think it's important to be able to connect the dots properly in this industry.
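
E.g. (connection details made up):

    $ mysql -u dbdemon -p inventory
    mysql> SELECT name, qty FROM parts WHERE qty < 10;

Oracle's sqlplus and MS SQL Server's sqlcmd are the equivalents I had in mind.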
 
I know very well how SQL works, thank you :) (After all, my job title is 'senior database engineer'!) I must not be making myself clear. I believe building a different interface than SQL for an SQL database is entirely within the realm of non-fiction. In fact, this was done for MySQL with their document store using their "X protocol"...
PostgreSQL has hstore.

A SQL interface to files is essentially what MySQL with the MyISAM backend was. It did not support transactions. It was a pretty sad and losing excuse for an RDBMS but gained huge adoption because it was simple and free.
 
In the mid/late 1980s there was a unix startup that built transaction semantics into file system operations. I did some contract work for them but can't remember their name now — it was some generic name like Relational Systems or something, so hard to search for. Its performance was not great — much worse than comparable unix boxes. I think they made the classic mistake of not understanding the difference between a research project and a deliverable product!
There are several things in this area. First, there are file systems that can be "connected" to a database: if you do that, then any file that is mentioned in the database gets transactional semantics, where updates to the file are done under database transactions, so you get ACID properties. The one shipping example I know comes from IBM, which used DB2 and AIX's JFS file system.

Second, there are file systems that allow users to create transactions, and pack multiple read and write operations into a transaction, which is then atomically executed or not. Quite a few research examples exist; the only shipping product ready for use that I remember was a transactional version of Microsoft's NTFS. In the early 2000s, quite a few startups were playing in that field; for some reason, I remember DataBrix (not sure I spelled that right). Internally, many file systems use database techniques such as transactions and logging to achieve consistency.
 
Internally, many file systems use database techniques such as transactions and logging to achieve consistency.
This company allowed *user processes* to implement atomic transaction semantics over *multiple* file operations. This is quite different from using transaction semantics behind the scenes as it were for *individual* file operations. I have a very vague memory of them providing two additional syscalls: begin transaction and commit/abort the current transaction. File ops bracketed by a pair of these calls would be done atomically. But I could be completely wrong! Now I think that this space is still not explored, probably because if you want transaction semantics you’d just use a proper RDBMS.

edit: I imagine a transactional version of NTFS might do something similar.
 
Exactly. Transactional file systems do just that: call beginXact, then do a lot of operations under that transaction, then call commitXact (I'm glossing over complicated things like cancel, rollback, collision and all that). There are OODLES of implementation problems here. What happens if the user overwrites the same sector a billion times, alternating between 0x55 and 0xAA? Now in the transaction log we have a billion writes. Where do we store a terabyte of transaction log, so we can undo or redo? What if one program starts a transaction on file a, then adds file b, while the other program starts on b and then adds file a? Sure, you have a deadlock detector, but one of the programs now has to deal with a full cancel. The whole thing is an enormous mess, and requires lots of work. What is the use? Do customers really pay enough to make all the development and testing worthwhile?

And I think in practice your answer is correct: people who really care end up using a database anyway. Problem solved.
 
All you need is a web-based front-end that talks SQL to the database... like databases/phpmyadmin... they are a dime a dozen out there. A 'text-based interface' is a UNIX shell... you start by typing, say, 'mysql' into bash, and you get dropped into a shell that allows you to type in straight SQL commands and talk to the server that way. Oracle works that way, too, and even MS SQL Server has a utility that works similarly. No need to circumvent anything, just slap on a front-end that reflects your customer's 'Best Practices', and connect it to the proper databases. Yeah, the database portion does need to be well-designed, that's the job for a 'senior database engineer'... and I'd expect a 'senior database engineer' to actually know to avoid filesystem shims - those can mess up the file itself to the point that the database is not recoverable. I don't like being this blunt, but I do think it's important to be able to connect the dots properly in this industry.
I do not think a web-based front-end would generally be considered a tool that follows the Unix philosophy.

I didn't say "text based interface". I said "text file interface". In other words, providing access to the database tables as if they were files in a filesystem.

I also never suggested I, as a user, wanted to directly access the actual data files of the database. I suggested a DBMS vendor could possibly build a non-SQL interface to their own database system. And then expose the database tables through that interface as (virtual) files in a filesystem.
 
I do not think a web-based front-end would generally be considered a tool that follows the Unix philosophy.

I didn't say "text based interface". I said "text file interface". In other words, providing access to the database tables as if they were files in a filesystem.

I also never suggested I, as a user, wanted to directly access the actual data files of the database. I suggested a DBMS vendor could possibly build a non-SQL interface to their own database system. And then expose the database tables through that interface as (virtual) files in a filesystem.
UNIX is not a religion, buddy.
I've said so in many threads, BTW.

Oh, and I'd like to point out that most RDBMS vendors are not gonna offer a "non-SQL interface to their own database system". They will offer more front-ends like what I was talking about. Sometimes, you gotta wake up to the current industry standards and best practices, instead of recalling some obscure research project from 15 years ago and imagining that it somehow fits the business scenario that was reviewed, like, yesterday.
 
UNIX is not a religion, buddy.
Absolutely. But I was talking about what would be an ideal database system with respect to the Unix philosophy.

That kind of interface prevents one from effectively using: 1. transactions — your typical file system access API has no commit/rollback functionality; 2. indexes — everything is a full table scan; 3. constraints — POSIX error codes are not going to cut it.

Thus, technical possibility doesn't mean it's a good idea.
I suppose making use of transactions and indexes would require adding special commands: for "start transaction" and "commit", and for searching particular columns.

Although, if instead a table was represented as a directory, and then within it the individual columns were represented as files (each filename prefixed with the primary key value of the row), then users would not need a special command to specify the column to search on. Also, transactions might not be as important for how I imagine it would be used. (SQL would still be the main interface, whereas the virtual filesystem could be used for more ad-hoc operations.)
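
To make that layout concrete, a purely hypothetical session against such a virtual filesystem (every path and name below is invented; as far as I know nothing like this ships today):

    $ ls /dbfs/mydb/customers/
    1.name   1.email   2.name   2.email
    $ cat /dbfs/mydb/customers/2.email
    bob@example.org
    # ad-hoc "query": which rows contain this email?
    $ grep -l 'bob@example.org' /dbfs/mydb/customers/*.email
    /dbfs/mydb/customers/2.email

Under the hood, each read or write against those virtual files would be translated by the DBMS into the corresponding SELECT or UPDATE.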

Constraints: Yeah. Some generic error code(s) would have to be used, I suppose. Maybe there could be an error log exposed in a special file within the virtual filesystem. Not perfect.
 