Which database system is the most in the spirit of FreeBSD?

bakul · Mar 28, 2022

The real question is what do you gain by representing a database as a filesystem?

ralphbsz · Mar 28, 2022

A long time ago, there was a research paper that explained how you can use the standard Unix commands sort, uniq, join and awk to implement a makeshift relational database. I can't find it online, and I'm not even sure I've ever seen that paper. Anyone know? It would probably be 30-40 years old by now.
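For illustration only, here is a minimal Python sketch of the kind of flat-file join that sort and join(1) give you. The `users` and `orders` "tables", the `:` separator, and the `merge_join` helper are all hypothetical and not taken from any paper; keys are compared as strings, as join does by default.

```python
# Hypothetical flat-file "tables": one record per line, fields separated by ':'.
users = ["1:alice", "2:bob", "3:carol"]
orders = ["1:disk", "1:cable", "3:keyboard"]


def merge_join(left, right, sep=":"):
    """Join two lists of lines on their first field (a sort-merge join).

    Roughly what `sort` followed by `join` does on two text files.
    """
    left_rows = sorted(line.split(sep) for line in left)
    right_rows = sorted(line.split(sep) for line in right)
    out = []
    i = j = 0
    while i < len(left_rows) and j < len(right_rows):
        left_key, right_key = left_rows[i][0], right_rows[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Emit one output line per matching right-hand row.
            k = j
            while k < len(right_rows) and right_rows[k][0] == left_key:
                out.append(sep.join(left_rows[i] + right_rows[k][1:]))
                k += 1
            i += 1
    return out


joined = merge_join(users, orders)
for line in joined:
    print(line)
```

Both inputs are scanned once after sorting, which is why the sorted-text approach was workable at all on small data sets.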

mark_j · Mar 28, 2022

bakul said:
The real question is what do you gain by representing a database as a filesystem?

I can give you some examples, but time is not my friend at present. Suffice to say, have a read of this, but only pp 29 onwards.

This file system/database allows you to use languages like Basic, Fortran, C, Macro etc on the file system as if it was a database.

It's just a different way of thinking about a file system. It is a somewhat old/dead concept nowadays, but it is/was used in other OSs, e.g. Pick, Prime etc.

bakul · Mar 28, 2022

ralphbsz said:
A long time ago, there was a research paper that explained how you can use the standard Unix commands sort, uniq, join and awk to implement a makeshift relational database. I can't find it online, and I'm not even sure I've ever seen that paper. Anyone know? It would probably be 30-40 years old by now.

There is this 1991 paper: The UNIX Shell As a Fourth Generation Language

astyle · Mar 28, 2022

Ahh... sort, uniq, and awk work best on the output of SQL commands. Did everyone forget that the strength of UNIX is actually pipes?

A sysadmin is nothing more than the high-tech equivalent of a plumber who works the toilets and kitchen sinks, and knows better than you why pipes need maintenance, and to be separated based on where the water is coming from. Sorry to be gross, but any self-respecting plumber would know to never mix the downstream outputs from toilets and kitchen sinks.

Relating all that back to UNIX design: UNIX pipes are also a one-way data flow mechanism; they connect outputs from one place/process to inputs elsewhere. But try playing with the order of the commands whose outputs get piped - you'll get a HUGE mess. At best, it's not even what you're looking for.
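To make the ordering point concrete, here is a small Python sketch that mimics two pipeline stages. `sort_stage` and `uniq_stage` are hypothetical stand-ins for sort(1) and uniq(1), not real bindings; the only behaviour assumed is that uniq collapses adjacent duplicate lines.

```python
from itertools import groupby


def sort_stage(lines):
    """Stand-in for sort(1): return the lines in sorted order."""
    return sorted(lines)


def uniq_stage(lines):
    """Stand-in for uniq(1): collapse runs of adjacent identical lines."""
    return [key for key, _group in groupby(lines)]


data = ["b", "a", "b", "a"]

# Equivalent of: sort | uniq
sort_then_uniq = uniq_stage(sort_stage(data))

# Equivalent of: uniq | sort
uniq_then_sort = sort_stage(uniq_stage(data))

print(sort_then_uniq)
print(uniq_then_sort)
```

Swapping the two stages leaves the duplicates in place, because nothing is adjacent yet when uniq runs first.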

As for UNIX design vs database design - there's a reason those are separate. Yes, it's not impossible to pretend that database tables are regular files, and to try and use text-processing tools on them - but the bigger the database, the worse the performance hit, and that was noticed back in the 80s.

Vull · Mar 28, 2022

eternal_noob said:
PostgreSQL is nice once you get used to it. But at first, you have to memorize the weird replacements for MySQL commands, e.g.

SQL:

SHOW TABLES;

becomes

Code:

\dt

and so on.

Code:

    case sqlmy: $q = "SHOW TABLES"; break;
    case sqlpg: $q = "SELECT table_name FROM information_schema.tables
        WHERE table_schema='public' AND table_type='BASE TABLE'
        ORDER BY table_name"; break;

ralphbsz · Mar 28, 2022

astyle said:
Ahh... sort, uniq, and awk work best on the output of SQL commands.

But there is a tension here: SQL can do everything that sort, uniq and (simple) awk can do. So you have the option to do everything within the SQL framework, with a coherent set of commands (SQL statements). Or you can split the work between SQL and the traditional Unix tools; now there are two different things to program and maintain. I would think in most cases having a single coherent implementation is preferable.
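As a rough sketch of that overlap, the following uses Python's built-in sqlite3 module with a made-up `hits` table (the table, column, and sample values are assumptions for illustration). The two queries do the work of `sort | uniq` and `sort | uniq -c | sort -rn` respectively.

```python
import sqlite3

# In-memory database with a hypothetical single-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (host TEXT)")
conn.executemany(
    "INSERT INTO hits VALUES (?)",
    [("b",), ("a",), ("b",), ("c",), ("a",), ("b",)],
)

# Roughly: sort | uniq
distinct_hosts = [
    row[0]
    for row in conn.execute("SELECT DISTINCT host FROM hits ORDER BY host")
]

# Roughly: sort | uniq -c | sort -rn
counts = conn.execute(
    "SELECT host, COUNT(*) AS n FROM hits "
    "GROUP BY host ORDER BY n DESC, host"
).fetchall()

print(distinct_hosts)
print(counts)
conn.close()
```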

As for UNIX design vs database design - there's a reason those are separate. Yes, it's not impossible to pretend that database tables are regular files, and to try and use text-processing tools on them - but the bigger the database, the worse the performance hit, and that was noticed back in the 80s.

Modern databases have highly complex encoding and compression of the data that's stored in tables. That's particularly true for "big" databases, the ones used in data mining, which are usually stored in columnar format. That allows interesting sorting, delta compression and bit encoding, making it incredibly efficient (I've seen examples where additional rows in a database need less than 1 byte per row, meaning the compression has reached the level of encoding into individual bits).

But that also means that the columnar format files on disk are virtually impossible to understand, unless you use the database's reading code.
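A toy Python sketch of the idea, not any particular database's on-disk format: a sorted integer column is delta encoded, and the deltas are then packed at a fixed bit width. The sample `column` and all function names are assumptions made up for the example.

```python
def delta_encode(sorted_values):
    """Return the first value plus the differences between neighbours."""
    base = sorted_values[0]
    deltas = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return base, deltas


def delta_decode(base, deltas):
    """Rebuild the original column from the base value and the deltas."""
    values = [base]
    for d in deltas:
        values.append(values[-1] + d)
    return values


def bit_pack(values, width):
    """Pack non-negative ints into one big int, `width` bits per value."""
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return packed


def bit_unpack(packed, width, count):
    """Inverse of bit_pack: extract `count` values of `width` bits each."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical sorted column, e.g. timestamps or sequence numbers.
column = [1000, 1001, 1001, 1003, 1004, 1004, 1005, 1007]

base, deltas = delta_encode(column)
# Smallest fixed width that holds every delta (at least 1 bit).
width = max(1, max(d.bit_length() for d in deltas))
packed = bit_pack(deltas, width)

restored = delta_decode(base, bit_unpack(packed, width, len(deltas)))
print(deltas, width, restored == column)
```

In this toy column every row after the first costs 2 bits, which shows how a per-row cost well under one byte is possible. It also shows why the packed bytes mean nothing without the decoder.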

bakul · Mar 28, 2022

ralphbsz said:
Modern databases have highly complex encoding and compression of the data that's stored in tables. That's particularly true for "big" databases, the ones used in data mining, which are usually stored in columnar format. That allows interesting sorting, delta compression and bit encoding, making it incredibly efficient (I've seen examples where additional rows in a database need less than 1 byte per row, meaning the compression has reached the level of encoding into individual bits).

One of the reasons I like array languages such as APL & particularly K is for their use in “columnar databases”. It’s a slightly different way of processing data than SQL and more powerful (but less user friendly). I have wanted to write an “array shell” that makes data wrangling as easy as writing shell scripts….