Reg

Reg is a small program for manipulating small registers. A register is a sequence of records stored in a file, i.e., the same as a table, or a relation, in a relational database.

Registers are represented as Haskell values of type [[String]] in textual form. The first element in this list determines the names of the fields of the records in the file. Example:

[["Name","Extension"],
 ["Thomas Hallgren","5422"],
 ["Magnus Carlsson","1058"],
 ["Ana Bove","1020"]
]

All records should have the same number of fields.
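Since the register file format is the textual form of a Haskell value, it can be read back with the standard `reads` function. The following is a minimal sketch, not Reg's actual code: the names `parseRegister` and `wellFormed` are hypothetical, and treating an empty list as ill-formed is an assumption.

```haskell
-- A register: a header row of field names followed by one row per record.
type Register = [[String]]

-- Parse the textual form with the standard Read instance for [[String]].
-- Returns Nothing if the text is not a single well-formed Haskell value.
parseRegister :: String -> Maybe Register
parseRegister s = case reads s of
  [(reg, rest)] | all (`elem` " \t\r\n") rest -> Just reg
  _                                           -> Nothing

-- Check that every record has as many fields as the header row.
-- Assumption: a register without a header row is rejected.
wellFormed :: Register -> Bool
wellFormed []              = False
wellFormed (header : recs) = all ((== length header) . length) recs

-- The example register from the text above.
example :: String
example = unlines
  [ "[[\"Name\",\"Extension\"],"
  , " [\"Thomas Hallgren\",\"5422\"],"
  , " [\"Magnus Carlsson\",\"1058\"],"
  , " [\"Ana Bove\",\"1020\"]"
  , "]"
  ]
```

In GHCi, `fmap wellFormed (parseRegister example)` evaluates to `Just True`.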

Synopsis

Reg location output_format op₁ ... opₙ input_format

Operations are performed from right to left, just like function application and function composition in Haskell. The input_format is thus the first operation, determining the format of the input, and output_format is the last operation, determining the format of the output. Since all operations have a known number of parameters, no parentheses or other delimiters are needed to separate one operation from the next.
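The right-to-left reading can be illustrated by turning the argument list into a function composition. This is only a sketch under stated assumptions: it covers three operations, ignores locations and input/output formats, and the names `parseOp`, `parseOps` and `onRecords` are hypothetical rather than taken from Reg's source.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, sort)

type Register = [[String]]
type Op = Register -> Register

-- Apply a function to the records only, keeping the header row first.
onRecords :: ([[String]] -> [[String]]) -> Op
onRecords f (header : recs) = header : f recs
onRecords _ []              = []

-- Case-insensitive substring search over all fields of each record.
grep :: String -> Op
grep s = onRecords (filter (any ((lower s `isInfixOf`) . lower)))
  where lower = map toLower

-- Consume one operation and its parameters from the front of the argument
-- list. Each operation has a fixed number of parameters, so no delimiters
-- are needed to find where the next operation starts.
parseOp :: [String] -> Maybe (Op, [String])
parseOp []                  = Nothing
parseOp ("reverse" : rest)  = Just (onRecords reverse, rest)
parseOp ("sort" : rest)     = Just (onRecords sort, rest)
parseOp ("grep" : s : rest) = Just (grep s, rest)
parseOp ["grep"]            = Nothing              -- missing parameter
parseOp (s : rest)          = Just (grep s, rest)  -- unknown word: grep for it

-- Parse the whole argument list. The leftmost operation ends up outermost
-- in the composition, so it is applied last.
parseOps :: [String] -> Maybe Op
parseOps []   = Just id
parseOps args = do
  (op, rest) <- parseOp args
  others     <- parseOps rest
  pure (op . others)
```

With this sketch, the arguments `reverse grep s` denote `reverse . grep "s"`: the records are filtered first and the result is then reversed.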

Without any arguments, Reg could naturally pass the input to the output unchanged, but since this would be rather pointless, Reg outputs a usage message instead.

Register locations

Location	Meaning
	If no location is given, input is taken from standard input and output is written to standard output.
`file` `path`	`Reg` operates on the contents of the file at the given `path`. To prevent data loss, the result is written to a temporary file that is renamed to replace the file at `path`. Write permission is thus required in the directory where the register is located; permissions and ownership of the register file might change, and links might be broken. It is probably also wise to limit the use of input and output format conversions, to keep the output in the same format as the input.
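The write-then-rename technique described for `file` can be sketched as follows. This is an illustration only: `updateRegisterFile` and the `.tmp` suffix are hypothetical, and Reg's actual choice of temporary file name is not documented here.

```haskell
import System.Directory (renameFile)

-- Replace the file at `path` with the result of applying `f` to its
-- contents. The result goes to a temporary file in the same directory,
-- which is then renamed over the original, so a failure while writing
-- leaves the original file untouched.
updateRegisterFile :: FilePath -> (String -> String) -> IO ()
updateRegisterFile path f = do
  contents <- readFile path
  let tmp = path ++ ".tmp"  -- hypothetical temporary file name
  -- Force the whole input first, so that the lazily read source file has
  -- been consumed completely before it is replaced.
  length contents `seq` writeFile tmp (f contents)
  renameFile tmp path
```

Because the new file is created afresh, it gets the ownership and default permissions of the user running the program, which is why those attributes of the register file might change.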

Input/output formats

Reg supports a number of bidirectional format conversions:

Input	Output	Meaning
		If no input or output format is given, the register file format is used.
`from-show`	`show`	A human readable textual format. See Examples below.
`from-csv`	`csv`	Use the CSV format (comma-separated values, RFC 4180), where fields are separated by commas. Fields can be enclosed in double quotes, in which case they can contain commas. (No other features of the CSV format are supported at the moment.) The first line is assumed to contain the names of the fields.
`from-ssv`	`ssv`	A variant of `from-csv`/`csv` where the values are separated by semicolons instead of commas.
`from-passwd`	`passwd`	Use the UNIX password file format, that is, with one line per record and `:` separating the fields. For input, if the first line starts with `#` it is assumed to contain the names of the fields, separated by `:`. For output, a line starting with `#` followed by the names of the fields is always included.
`from-tabbed`	`tabbed`	Use the tab-separated values format, i.e. one record per line with field values separated by tabs. For input, if the fields on the first line are all enclosed in square brackets, they are assumed to be the names of the fields. For output, the first line is always the field names in square brackets.
`from-tabbed0`	`tabbed0`	A variant of `from-tabbed`/`tabbed` which doesn't require the field names to be enclosed in square brackets. The first line is always the field names, without square brackets.
`from-url`	`url`	The format is one url-encoded-query per line. url-encoded-query is the format used by web browsers when submitting the contents of a form to a web server.
`from-json`	`json`	The format is a JSON array containing a number of records.
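As an illustration of the CSV subset described in the table, here is a minimal sketch of a bidirectional conversion. The function names are hypothetical, and the handling of malformed input (for example, text following a closing quote) is an assumption, not documented behaviour.

```haskell
import Data.List (intercalate)

-- Split one CSV line into fields. A field enclosed in double quotes may
-- contain commas. Escaped quotes and embedded newlines are not handled,
-- and anything between a closing quote and the next comma ends the line.
splitCsvLine :: String -> [String]
splitCsvLine = go
  where
    go s = case field s of
      (f, ',' : rest) -> f : go rest
      (f, _)          -> [f]
    -- Read one field, returning it together with the unread input.
    field ('"' : s) = let (f, rest) = break (== '"') s
                      in (f, drop 1 rest)
    field s         = break (== ',') s

-- Input direction: the first line holds the field names, so the result
-- is already in register form.
fromCsv :: String -> [[String]]
fromCsv = map splitCsvLine . lines

-- Output direction: quote only the fields that contain a comma.
toCsvLine :: [String] -> String
toCsvLine = intercalate "," . map quote
  where
    quote f | ',' `elem` f = "\"" ++ f ++ "\""
            | otherwise    = f

toCsv :: [[String]] -> String
toCsv = unlines . map toCsvLine
```

For registers whose fields contain no double quotes or line breaks, `fromCsv (toCsv reg)` gives back `reg`.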

Input-only formats

In addition to the input/output formats above, Reg can also read input in the following formats:

Format	Meaning
`from-mbox`	The input is assumed to be in the UNIX mailbox format, which is a sequence of mail messages where the beginning of each message is identified by a line starting with "`From` ". The resulting register will have the following fields: From, To, Date, Subject, Message-Id, Headers, FilePos and Body. The first five fields contain the values of the corresponding mail headers, the Headers field contains the values of the remaining headers, the FilePos field contains the position of the message in the input file and the Body field contains the body of the mail message.
`from-clf`	The input is assumed to be in the Common Log Format, or Combined Log Format, used by some web servers.

Output-only formats

In addition to the input/output formats above, Reg can also produce output in the following formats:

Format	Meaning
`html`	Generate an HTML table. The fields are assumed to contain plain text, so characters that have special meaning in HTML are escaped.
`html0`	Generate an HTML table. The fields are assumed to contain HTML, and are output as is.
`fmt` `format`	Format the output according to a formatting string (see below).

Formatting strings

Formatting strings used with the fmt command work in much the same way as the formatting strings used in the C functions printf and strftime. Most characters stand for themselves, except the % character, which starts a formatting command.

Format	Meaning
`%%`	The percent character.
`%/`	The newline character.
`%0field;`	The contents of the named record `field` as is.
`%"field;`	The contents of the named record `field` wrapped in double quotes. Double quotes and newline characters are escaped as `\"` and `\n`, respectively.
`%'field;`	The contents of the named record `field` wrapped in single quotes. Single quotes and newline characters are escaped as `\'` and `\n`, respectively.
`%#field;`	The line count of the contents of the named record `field`.
`%field;`	The contents of the named record `field`. Characters that have special meaning in HTML are escaped, that is, `&` is replaced by `&amp;` and `<` is replaced by `&lt;`.
`%{field-fmt}`	This creates an HTML link by using the named `field` as the URL and the formatting string `fmt` as the link text. If the URL `field` is empty (or contains only blank space), the link text is output without turning it into a link. Links can not be nested and `fmt` can not contain `}`.
`%{field=fmt}`	As above, but the empty string is substituted if the link field is empty.
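A small interpreter makes the formatting commands more concrete. The sketch below handles only `%%`, `%/`, `%0field;` and `%field;`. The quoting, line count and link commands are left out, the names `formatRecord` and `escapeHtml` are hypothetical, and substituting the empty string for a missing field is an assumption.

```haskell
import Data.Maybe (fromMaybe)

-- A record as an association list from field name to contents.
type Record = [(String, String)]

-- Escape the characters with special meaning in HTML, as %field; does.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc c   = [c]

-- Expand a formatting string for one record. Only a subset of the
-- formatting commands is recognized (see the lead-in above).
formatRecord :: String -> Record -> String
formatRecord fmt r = go fmt
  where
    go ('%' : '%' : rest) = '%' : go rest
    go ('%' : '/' : rest) = '\n' : go rest
    go ('%' : '0' : rest) = field id rest          -- contents as is
    go ('%' : rest)       = field escapeHtml rest  -- HTML-escaped contents
    go (c : rest)         = c : go rest
    go []                 = []
    -- Read a field name up to the terminating ';' and emit its contents.
    -- Assumption: a field missing from the record yields the empty string.
    field f s = let (name, rest) = break (== ';') s
                in f (fromMaybe "" (lookup name r)) ++ go (drop 1 rest)
```

Applying `formatRecord "%Name; %Extension;%/"` to each record in turn and concatenating the results corresponds to what `fmt` does with that formatting string.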

Operations

In the table below, fields denotes a comma-separated list of field names, for example Name,Phone.

Operation	Meaning
`add` `url-encoded-query`	Add a new record to the register
`update` `where` `what`	Update records matching the url-encoded-query `where` with the values given in url-encoded-query `what`.
`pick` `fields`	Projection. Select the named fields from the records. The order of the fields is not changed.
`drop` `fields`	Projection. Remove the named fields from the records.
`arrange` `fields`	Rearrange the fields of the records. This can change the order of the fields, drop some fields, duplicate fields and introduce new fields.
`grep` `string`	Selection. Select the records that have `string` as a substring of some field. The comparison is case insensitive.
`grep-in` `fields` `string`	Selection. Select the records that have `string` as a substring of some of the mentioned `fields`. The comparison is case insensitive.
`urlgrep` `url-encoded-query`	Selection. Select records with fields that contain substrings of the strings given in `url-encoded-query`. Fields not mentioned in the query can contain anything. The comparison is case insensitive.
`urlgrep-v` `url-encoded-query`	Selection. Select records with fields that are not exactly matched by the `url-encoded-query`. Fields not mentioned in the query can contain anything.
`urlmatch` `url-encoded-query` `urlmatch-v` `url-encoded-query`	Selection. Select records with fields that match the `url-encoded-query`, which can contain `?` and `*` wildcards. Fields not mentioned in the query can contain anything. `urlmatch-v` selects records that don't match.
`nub`	Remove duplicate records.
`nubBy` `fields`	Remove duplicate records. Use only the given `fields` to determine if two records are equal.
`sort`	Sort the records. Fields are compared lexicographically from left to right.
`sortBy` `fields`	Sort the records. The given `fields` are compared lexicographically in the order given.
`sortBy-n` `fields`	Sort the records like `sortBy`, except the first `field` is compared numerically.
`reverse`	Reverse the order of the records.
`lines` `field`	If the contents of `field` is more than one line long, split the record into several records with one line per record.
`unlines` `field`	The opposite of `lines`, that is, consecutive records which are equal except for the contents of `field`, are combined.
`groupBy` `fields`	Similar to `unlines`: consecutive records where the corresponding `fields` agree are combined.
`aggr` `fn` `fields`	Apply aggregation function `fn` to the given `fields`. This is useful as a postprocessing step after `groupBy`. Supported aggregation functions: `count`, `max`, `min`, `nub`, `product` and `sum`. The numeric aggregation functions understand numbers that have a unit suffix, e.g. 53m². The aggregation is only applied if all numbers have the same unit.
`concat` `fields`	Combine the given `fields` into one field by concatenating the contents of the given fields in each record. The name of the new field is the concatenation of `fields`.
`split` `field`	Split `field` into several fields. The contents of the field are split on line breaks. The number of fields is thus determined by the maximum number of lines in the field. The names of the new fields are obtained by appending a number to `field`.
`string`	If `string` is not one of the operations recognized by `Reg`, it is interpreted as `grep` `string`, that is, when you search, you can omit the word `grep` in most cases.
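A few of the operations above can be modelled as functions on the [[String]] representation. These definitions are a sketch of the documented behaviour, not Reg's source: the function names are hypothetical (`sortByFields` avoids a clash with the Prelude-adjacent `sortBy`), and giving new fields introduced by `arrange` empty contents is an assumption.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, sortOn)
import Data.Maybe (fromMaybe)

type Register = [[String]]

-- Apply a function to the records only, keeping the header row first.
onRecords :: ([[String]] -> [[String]]) -> Register -> Register
onRecords f (header : recs) = header : f recs
onRecords _ []              = []

-- Look up the values of the named fields in one record.
-- Assumption: a field that does not exist yields the empty string.
values :: [String] -> [String] -> [String] -> [String]
values header names row = [fromMaybe "" (lookup n (zip header row)) | n <- names]

-- pick: keep the named fields; the order of the fields is not changed.
pick :: [String] -> Register -> Register
pick _ [] = []
pick names reg@(header : _) = map keep reg
  where keep row = [v | (n, v) <- zip header row, n `elem` names]

-- arrange: output exactly the named fields, in the order given.
arrange :: [String] -> Register -> Register
arrange _ [] = []
arrange names (header : recs) = names : map (values header names) recs

-- grep: case-insensitive substring match against any field.
grep :: String -> Register -> Register
grep s = onRecords (filter (any ((lower s `isInfixOf`) . lower)))
  where lower = map toLower

-- sortBy: compare the given fields lexicographically, in the order given.
sortByFields :: [String] -> Register -> Register
sortByFields _ [] = []
sortByFields names reg@(header : _) = onRecords (sortOn (values header names)) reg

-- The example register from the introduction.
people :: Register
people =
  [ ["Name", "Extension"]
  , ["Thomas Hallgren", "5422"]
  , ["Magnus Carlsson", "1058"]
  , ["Ana Bove", "1020"]
  ]
```

In this model, `pick ["Extension"] (grep "magnus" people)` corresponds to the command-line operations `pick Extension grep magnus`, again read from right to left.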

Examples

For the following examples, we assume that the file people contains the register displayed in the introduction.

Command	Output
`Reg show <people`	Name····· Thomas Hallgren Extension 5422 Name····· Magnus Carlsson Extension 1058 Name····· Ana Bove Extension 1020
`Reg fmt '%Name; %Extension;%/' <people`	Thomas Hallgren 5422 Magnus Carlsson 1058 Ana Bove 1020
`Reg show grep magnus <people`	Name····· Magnus Carlsson Extension 1058
`Reg file people update 'Name=Thomas*' Extension=5555` `Reg show thomas <people`	`Name..... Thomas Hallgren Extension 5555`

Implementation

Reg is implemented in Haskell. The source is 555 lines long (2001-05-20), of which 383 lines were written specifically for Reg and 172 lines were reused from other programs and libraries. It also uses functions defined in the Haskell prelude and standard libraries, which are not counted here.

Past, present and future

Reg has evolved over time and could still be improved in various ways.

groupBy and aggr were added in September 2021.
An operation for adding new records to a register was added on 2007-02-04.
There should probably be an operation for adding new fields to a register (in an easier way than with arrange).
Apart from pick being slightly more efficient than arrange, there is no good reason to have two such similar operations.
There should probably be operations that combine two or more registers in various ways, for example union, intersection and join. Currently, there are two separate programs, RegJoin and RegCat, for joining and concatenating registers, respectively.
Conversion from alternate input formats was initially added on 2001-05-20, and the list of supported formats has been extended over time, but Reg could support more input formats.

Author

Thomas Hallgren