Reg is a small program to manipulate small registers. A register is a sequence of records stored in a file, i.e., the same as a table, or a relation, in a relational database.
Registers are represented as Haskell values of type [[String]] in textual form. The first element in this list determines the names of the fields of the records in the file. All records should have the same number of fields. For example:

[["Name","Extension"], ["Thomas Hallgren","5422"], ["Magnus Carlsson","1058"], ["Ana Bove","1020"] ]
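The representation above can be illustrated with a minimal Haskell sketch. It is not taken from Reg's source: the names Register, parseRegister, fieldNames, records and wellFormed are hypothetical, and the sketch only assumes that the textual form can be read with the standard Read instance for [[String]].

```haskell
-- A register: the first row holds the field names, the rest are records.
type Register = [[String]]

-- Parse the textual form using the standard Read instance for [[String]].
-- Returns Nothing if the text is not a well-formed Haskell list of lists.
parseRegister :: String -> Maybe Register
parseRegister s = case reads s of
  [(r, rest)] | all (`elem` " \t\n") rest -> Just r
  _                                       -> Nothing

-- The field names are the first row (empty for an empty register).
fieldNames :: Register -> [String]
fieldNames []      = []
fieldNames (h : _) = h

-- The records are everything after the header row.
records :: Register -> [[String]]
records = drop 1

-- Check that every record has as many fields as the header row.
wellFormed :: Register -> Bool
wellFormed []       = True
wellFormed (h : rs) = all ((== length h) . length) rs

-- The register from the text above, in textual form.
example :: String
example =
  "[[\"Name\",\"Extension\"], [\"Thomas Hallgren\",\"5422\"], "
    ++ "[\"Magnus Carlsson\",\"1058\"], [\"Ana Bove\",\"1020\"] ]"
```

Loaded in GHCi, fmap wellFormed (parseRegister example) checks the requirement that all records have the same number of fields.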
Operations are performed from right to left, just like function application and function composition in Haskell. The input_format is thus the first operation, determining the format of the input, and output_format is the last operation, determining the format of the output. Since all operations have a known number of parameters, no parentheses or other delimiters are needed to separate one operation from the next.
Reg location output_format op1 ... opn input_format
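The right-to-left evaluation and the delimiter-free argument list can be sketched as follows. This is an illustration under stated assumptions, not Reg's actual parser: the operation names rev and first, and the helpers lookupOp, parseOps and runOps, are hypothetical, and the location and format arguments are left out.

```haskell
type Register = [[String]]
type Op = Register -> Register

-- Apply a function to the records only, keeping the header row in place.
onRecords :: ([[String]] -> [[String]]) -> Op
onRecords _ []       = []
onRecords f (h : rs) = h : f rs

-- Hypothetical "first n" operation: keep the first n records.
firstOp :: [String] -> Op
firstOp (n : _) = onRecords (take (read n))
firstOp []      = id

-- Hypothetical operation table: for each name, the number of parameters
-- it takes and a function that builds the operation from them.
lookupOp :: String -> Maybe (Int, [String] -> Op)
lookupOp "rev"   = Just (0, \_ -> onRecords reverse)
lookupOp "first" = Just (1, firstOp)
lookupOp _       = Nothing

-- Split the argument list into operations. Because every operation has a
-- known number of parameters, no delimiters are needed between them.
parseOps :: [String] -> Either String [Op]
parseOps [] = Right []
parseOps (name : args) = case lookupOp name of
  Nothing -> Left ("unknown operation: " ++ name)
  Just (n, mk)
    | length params < n -> Left ("too few parameters for " ++ name)
    | otherwise         -> (mk params :) <$> parseOps rest
    where (params, rest) = splitAt n args

-- Compose op1 . op2 . ... . opn, so the rightmost operation runs first.
runOps :: [String] -> Register -> Either String Register
runOps args reg = do
  ops <- parseOps args
  return (foldr (.) id ops reg)
```

With these definitions, runOps ["rev", "first", "2"] keeps the first two records and then reverses them, whereas runOps ["first", "2", "rev"] reverses first and then keeps two.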
Without any arguments,
Reg could naturally pass the input to
the output unchanged, but since this would be rather pointless,
Reg outputs a usage message instead.
|If no location is given, input is taken from standard input and output is written to standard output.|
Reg supports a number of bidirectional format conversions:
|If no input or output format is given, the register file format is used.|
|A human readable textual format. See Examples below.|
|Use the CSV format (comma-separated values, RFC 4180), where fields are separated by comma. Fields can be enclosed in double quotes, in which case they can contain commas. (No other features of the CSV format are supported at the moment.) The first line is assumed to contain the names of the fields.|
|A variant of |
|Use the UNIX password file format, that is,
with one line per record and |
|Use the tab-separated values format, i.e. one record per line with field values separated by tabs. For input, if the fields on the first line are all enclosed in square brackets, they are assumed to be the names of the fields. For output, the first line is always the field names in square brackets.|
|A variant of |
|The format is one url-encoded-query per line. url-encoded-query is the format used by web browsers when submitting the contents of a form to a web server.|
|The format is a JSON array containing a number of records.|
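As an illustration of the CSV conversion described in the table above, here is a minimal Haskell sketch of reading that restricted CSV subset into a [[String]] register. It is an assumption about how such a reader could look, not Reg's implementation; splitCsvLine and parseCsv are hypothetical names. Like the description, it supports double-quoted fields containing commas and nothing else (no doubled quotes, no line breaks inside fields).

```haskell
-- Split one CSV line into fields. A field enclosed in double quotes may
-- contain commas. Doubled quotes and embedded newlines are not handled,
-- and any text between a closing quote and the next comma is dropped.
splitCsvLine :: String -> [String]
splitCsvLine = go
  where
    go s = case field s of
      (f, ',' : rest) -> f : go rest
      (f, _)          -> [f]
    -- Quoted field: take everything up to the closing quote.
    field ('"' : s) = let (f, rest) = break (== '"') s
                      in (f, drop 1 rest)
    -- Unquoted field: take everything up to the next comma.
    field s         = break (== ',') s

-- Since the first line is assumed to hold the field names, the parsed
-- lines already form a register: header row first, then the records.
parseCsv :: String -> [[String]]
parseCsv = map splitCsvLine . lines
```

For instance, parseCsv applied to a file whose second line is "Hallgren, Thomas",5422 keeps the comma inside the first field of that record.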
In addition to the input/output formats above, Reg can also read input in the following formats:
|The input is assumed to be in the
UNIX mailbox format,
which is a
sequence of mail messages where the beginning of each message is
identified by a line starting with "|
|The input is assumed to be in the Common Log Format, or Combined Log Format, used by some web servers.|
In addition to the input/output formats above, Reg can also produce output in the following formats:
|Generate an HTML table. The fields are assumed to contain plain text, so characters that have special meaning in HTML are escaped.|
|Generate an HTML table. The fields are assumed to contain HTML, and are output as is.|
|Format the output according to a formatting string (see below).|
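The difference between the two HTML table outputs can be sketched in Haskell as below. The exact markup that Reg emits is not shown in this document, so the tags used here are an assumption, and escapeHtml and htmlTable are hypothetical names.

```haskell
-- Escape the characters that have special meaning in HTML.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Render a register as an HTML table. The first argument is applied to
-- every cell: pass escapeHtml for plain-text fields, or id for fields
-- that already contain HTML and should be output as is.
htmlTable :: (String -> String) -> [[String]] -> String
htmlTable _ [] = "<table>\n</table>\n"
htmlTable cell (h : rs) =
  unlines (["<table>", row "th" h] ++ map (row "td") rs ++ ["</table>"])
  where
    row tag fs = "<tr>" ++ concatMap (wrap tag) fs ++ "</tr>"
    wrap tag f = "<" ++ tag ++ ">" ++ cell f ++ "</" ++ tag ++ ">"
```

Passing the cell function as a parameter keeps the two variants identical except for the escaping step.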
Formatting strings
Formatting strings used with the fmt command work in much the same way as the formatting strings used in C functions such as strftime. Most characters stand for themselves, except the % character, which starts a formatting command.
|The percent character.|
|The newline character.|
|The contents of the named record field as is.|
|The contents of the named record field wrapped in double
quotes. Double quotes and newline characters are escaped as
|The contents of the named record field wrapped in single
quotes. Single quotes and newline characters are escaped as
|The line count of the contents of the named record field.|
|The contents of the named record field.
Characters that have special meaning in HTML are escaped, that is,
|This creates an HTML link by using the named field as the URL
and the formatting string |
|As above, but the empty string is substituted if the link field is empty.|
Operations
In the table below, fields denotes a comma-separated list of field names, for example
|Add a new record to the register.|
|Update records matching the url-encoded-query where with the values given in url-encoded-query what.|
|Projection. Select the named fields from the records. The order of the fields is not changed.|
|Projection. Remove the named fields from the records.|
|Rearrange the fields of the records. This can change the order of the fields, drop some fields, duplicate fields and introduce new fields.|
|Selection. Select the records that have string as a substring of some field. The comparison is case insensitive.|
|Selection. Select the records that have string as a substring of some of the mentioned fields. The comparison is case insensitive.|
|Selection. Select records with fields that contain substrings of the strings given in url-encoded-query. Fields not mentioned in the query can contain anything. The comparison is case insensitive.|
|Selection. Select records with fields that are not exactly matched by the url-encoded-query. Fields not mentioned in the query can contain anything.|
|Selection. Select records with fields that match the
url-encoded-query, which can contain |
|Remove duplicate records.|
|Remove duplicate records. Use only the given fields to determine if two records are equal.|
|Sort the records. Fields are compared lexicographically from left to right.|
|Sort the records. The given fields are compared lexicographically in the order given.|
|Sort the records like |
|Reverse the order of the records.|
|If the contents of field is more than one line long, split the record into several records with one line per record.|
|The opposite of |
|Similar to |
|Apply aggregation function fn to the given fields.
This is useful as a postprocessing step after |
|Combine the given fields into one field by concatenating the contents of the given fields in each record. The name of the new field is the concatenation of fields.|
|Split field into several fields. The contents of the field are split on line breaks. The number of fields is thus determined by the maximum number of lines in the field. The names of the new fields are obtained by appending a number to field.|
|string||If string is not one of the operations recognized by
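A few of the operations in the table above (projection, substring selection, sorting on given fields, and duplicate removal on given fields) can be sketched on the [[String]] representation with standard Haskell list functions. This is a minimal sketch and not Reg's source: the names project, selectSubstring, sortByFields and nubByFields are hypothetical, and the handling of unknown field names (silently ignored) is an assumption.

```haskell
import Data.Char (toLower)
import Data.List (elemIndex, isInfixOf, nubBy, sortBy)
import Data.Maybe (mapMaybe)
import Data.Ord (comparing)

type Register = [[String]]

-- Column indices of the named fields, in the order the names are given.
-- Unknown field names are silently ignored (an assumption of this sketch).
indices :: [String] -> [String] -> [Int]
indices header = mapMaybe (`elemIndex` header)

-- The values of the given columns of one record.
at :: [Int] -> [String] -> [String]
at is r = [r !! i | i <- is, i < length r]

-- Projection: keep the named fields; the field order is not changed.
project :: [String] -> Register -> Register
project _ []               = []
project fields reg@(h : _) = map (at keep) reg
  where keep = [i | (i, name) <- zip [0 ..] h, name `elem` fields]

-- Selection: keep records where some field contains the string,
-- compared case-insensitively.
selectSubstring :: String -> Register -> Register
selectSubstring _ []       = []
selectSubstring s (h : rs) = h : filter (any matches) rs
  where matches f = lower s `isInfixOf` lower f
        lower     = map toLower

-- Sort: compare the given fields lexicographically in the order given.
sortByFields :: [String] -> Register -> Register
sortByFields _ []            = []
sortByFields fields (h : rs) = h : sortBy (comparing key) rs
  where key = at (indices h fields)

-- Remove duplicates, using only the given fields to decide equality.
nubByFields :: [String] -> Register -> Register
nubByFields _ []            = []
nubByFields fields (h : rs) = h : nubBy same rs
  where key      = at (indices h fields)
        same a b = key a == key b
```

Every function leaves the header row in place and transforms only the records, so the results are again registers and the functions compose with (.) in the right-to-left style described earlier.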
Examples
For the following examples, we assume that the file people contains the register displayed in the introduction.
Name····· Thomas Hallgren
Extension 5422
Name····· Magnus Carlsson
Extension 1058
Name····· Ana Bove
Extension 1020

Thomas Hallgren 5422
Magnus Carlsson 1058
Ana Bove 1020

Name····· Magnus Carlsson
Extension 1058
Reg is implemented in Haskell. The source is 555 lines long (2001-05-20), of which 383 lines were written specifically for Reg and 172 lines were reused from other programs and libraries. It also uses functions defined in the Haskell prelude and standard libraries, which are not counted here.
Past, present and future
Reg has evolved over time and could still be improved in various ways.
aggr were added in September 2021.
- An operation for adding new records to a register was added on 2007-02-04.
- There should probably be an operation for adding new fields to a
register (in an easier way than with
- Except that pick is slightly more efficient than arrange, there is no good reason to have two such similar operations.
- There should probably be operations that combine two or more registers
in various ways, for example union, intersection and join.
Currently, there are two separate programs,
RegCat, for joining and concatenating registers, respectively.
- Conversion from alternate input formats was initially added on
2001-05-20, and the list of supported formats has been extended over
Reg could support more input formats.
- Relational databases
- Relational algebra
- Unix shell commands: grep, sort, cut.
- Haskell standard functions: nub, nubBy, sort, sortBy, groupBy, lines, filter.