Just got done reading C.J. Date's book Database in Depth: Relational Theory for Practitioners about the relational model and relational algebra.
We want to provide an ASCII format for relations. We want also to provide some algebraic operations on them. The term "relation" is used rather than table. A relation is like a simple table with a fixed set of columns. Each column has a name and a type. Column order should not matter as the columns are named.
Here we give a simple format for a file containing a relation. Tuples (rows) are each on their own line. Lines are separated by a newline character as in normal ASCII text. Values in each tuble are separated by tab characters.
The first line of input must be a header containing meta data.
The "column headers" are separated the same way all rows are.
Each column header is of the form "name: type" where ": type" is optional.
"type" should be one of "string" (the default), "name",
"integer", "decimal", "date", or "/regular expression/".
Maybe "counting", "natrual", "integer", and "rational" for numbers.
Maybe (start, end) or [start .. end) for intervals?
Different types could use their natural orders, dictionary for names and/or strings,
chronological for dates and times.
Maybe we should support only one format for dates and times: yyyy-mm-dd hh:mm:ss.
name number: integer
Eric 17
Fred 10
Such a file could be edited in any text editor. To ensure consistency (integrity, validity, etc.) you would have to run a program like rcheck below. Some spreadsheet programs (like gnumeric) can also be used to edit such files. It might be nice to have a specialized relation editing program. This might be something like a simple spreadsheet that enforces type adherence in each cell.
Maybe each header could be complicated as name[[: domain]: type].
This would allow a finer check for "union compatiblity" of relations
for set operations.
Then we can have specialized tools like rprojection, rselection, rjoin, etc. Composite commands can be composed with Posix pipes. Also could use rsort, rmerge, others? Maybe rsuss (or rguess) which would take a headless table, and deduce column types. Perhaps rcheck could check that values conform to type. Would need rel2html, rel2dif, etc.
rprojection.rb -columns=colname[,colname2[,...]
rproj.awk columns=colname[,colname2[,...]is another take.
rselection criteria
- A criterion might look like colname{=val|~regex}.
Criteria might be conjoined with boolean operators & or | and parentheses.
rsel.awk column=name pattern=re is a start.
rjoin rel1 [rel2] - natural joinrunion rel1 rel2 - set union
runion.sh rel1 rel2 is a start.
runion.awk rel1 rel2 is better.
runion.py rel1 rel2 is slower.
rdiff rel1 rel2 - set difference
rdiff.awk rel1 rel2 is one way.
rdiff.py rel1 rel2 is another.
It uses ra.py as does runion.py.
rrename colname=newname[,...] - rename attributes
rrename.awk column=name newname=name works.rcheckrsort.rb -column=namerel2html - This may not be needed as tsv2html would do. Maybe to remove (or use) types.
Eric Blossom
http://www.BlossomAssociates.net/tables/relations.html
2005-12-02, 2007-05-15