Reg

Reg is a small program to manipulate small registers. A register is a sequence of records stored in a file, i.e., the same as a table, or a relation, in a relational database.

Registers are represented as Haskell values of type [[String]] in textual form. The first element in this list determines the names of the fields of the records in the file. Example:

[["Name","Extension"],
 ["Thomas Hallgren","5422"],
 ["Magnus Carlsson","1058"],
 ["Ana Bove","1020"]
]
All records should have the same number of fields.
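
Since a register file is the textual form of a Haskell value, it can in principle be read back with the standard read functions. The following is a minimal sketch of that idea, not code taken from Reg itself; the names Register, parseRegister, wellFormed and example are hypothetical.

```haskell
import Text.Read (readMaybe)

-- A register: the first row holds the field names, the rest are records.
-- (Hypothetical type synonym, used only in this sketch.)
type Register = [[String]]

-- Parse the textual form; Nothing if the text is not a valid [[String]].
parseRegister :: String -> Maybe Register
parseRegister = readMaybe

-- True if every record has as many fields as the row of field names.
wellFormed :: Register -> Bool
wellFormed [] = True
wellFormed (names : recs) = all ((== length names) . length) recs

-- The register from the example above, as it would appear in a file.
example :: String
example =
  unlines
    [ "[[\"Name\",\"Extension\"],"
    , " [\"Thomas Hallgren\",\"5422\"],"
    , " [\"Magnus Carlsson\",\"1058\"],"
    , " [\"Ana Bove\",\"1020\"]"
    , "]"
    ]
```

In GHCi, fmap wellFormed (parseRegister example) checks both that the text parses and that all records have the same number of fields.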

Synopsis

Reg location output_format op1 ... opn input_format
Operations are performed from right to left, just like function applications and function compositions in Haskell. The input_format is thus the first operation, determining the format of the input, and output_format is the last operation, which determines the format of the output. Since all operations have a known number of parameters, no parentheses or other delimiters are needed to separate one operation from the next.

Without any arguments, Reg could naturally pass the input to the output unchanged, but since this would be rather pointless, Reg outputs a usage message instead.
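
The way a fixed number of parameters per operation removes the need for delimiters can be sketched as follows. This is an illustration only, not Reg's actual argument parser: the names Op, arities, parseOps and executionOrder are hypothetical, the arity table covers just a few of the operations documented on this page, and the location argument is ignored.

```haskell
-- An operation name with its arguments, e.g. ("grep", ["magnus"]).
type Op = (String, [String])

-- Number of parameters of some of the documented operations.
arities :: [(String, Int)]
arities =
  [ ("show", 0), ("csv", 0), ("from-csv", 0), ("sort", 0), ("nub", 0)
  , ("reverse", 0), ("pick", 1), ("drop", 1), ("grep", 1), ("sortBy", 1)
  , ("fmt", 1), ("grep-in", 2), ("update", 2), ("aggr", 2)
  ]

-- Split the argument list into operations, in the order written.
-- A word that is not a known operation is read as "grep word".
parseOps :: [String] -> Either String [Op]
parseOps [] = Right []
parseOps (w : ws) =
  case lookup w arities of
    Nothing -> (("grep", [w]) :) <$> parseOps ws
    Just n
      | length args < n -> Left ("too few arguments for " ++ w)
      | otherwise -> ((w, args) :) <$> parseOps rest
      where
        (args, rest) = splitAt n ws

-- Operations are performed from right to left, so the order of
-- execution is the reverse of the order on the command line.
executionOrder :: [String] -> Either String [Op]
executionOrder = fmap reverse . parseOps
```

For example, parseOps ["show", "grep", "magnus"] yields the two operations show and grep magnus, and executionOrder puts grep magnus first.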

Register locations

Location     Meaning
(none)       If no location is given, input is taken from standard input and output is written to standard output.
file path    Reg operates on the contents of the file at the given path. To prevent data loss, the result is written to a temporary file that is renamed to replace the file at path. Write permission is thus required in the directory where the register is located; permissions and ownership of the register file might change, and links might be broken. It is probably also wise to limit the use of input and output format conversions, to keep the output in the same format as the input.
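
The write-to-temporary-file-then-rename technique described for file locations can be sketched like this. It is a minimal illustration under stated assumptions, not Reg's own code: the name rewriteFile is hypothetical, and the directory and filepath packages (both shipped with GHC) are assumed to be available.

```haskell
import System.Directory (renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, hPutStr, openTempFile)

-- Rewrite the file at path by applying f to its contents.
-- The result is first written to a temporary file in the same
-- directory, which is then renamed to replace the original, so the
-- original is left untouched if something fails before the rename.
rewriteFile :: (String -> String) -> FilePath -> IO ()
rewriteFile f path = do
  contents <- readFile path
  -- Force the whole input so the lazily read file is closed
  -- before it gets replaced.
  length contents `seq` pure ()
  (tmpPath, h) <- openTempFile (takeDirectory path) (takeFileName path ++ ".tmp")
  hPutStr h (f contents)
  hClose h
  renameFile tmpPath path
```

Because the temporary file is a new file, it gets fresh permissions and ownership, and a rename replaces the directory entry rather than the contents, which is why links to the old file no longer point at the updated data.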

Input/output formats

Reg supports a number of bidirectional format conversions:

Input          Output     Meaning
(none)         (none)     If no input or output format is given, the register file format is used.
from-show      show       A human-readable textual format. See Examples below.
from-csv       csv        Use the CSV format (comma-separated values, RFC 4180), where fields are separated by commas. Fields can be enclosed in double quotes, in which case they can contain commas. (No other features of the CSV format are supported at the moment.) The first line is assumed to contain the names of the fields.
from-ssv       ssv        A variant of from-csv/csv where the values are separated by semicolons instead of commas.
from-passwd    passwd     Use the UNIX password file format, that is, with one line per record and : separating the fields. For input, if the first line starts with # it is assumed to contain the names of the fields, separated by :. For output, a line starting with # followed by the names of the fields is always included.
from-tabbed    tabbed     Use the tab-separated values format, i.e. one record per line with field values separated by tabs. For input, if the fields on the first line are all enclosed in square brackets, they are assumed to be the names of the fields. For output, the first line is always the field names in square brackets.
from-tabbed0   tabbed0    A variant of from-tabbed/tabbed which doesn't require the field names to be enclosed in square brackets. The first line is always the field names, without square brackets.
from-url       url        The format is one url-encoded-query per line. url-encoded-query is the format used by web browsers when submitting the contents of a form to a web server.
from-json      json       The format is a JSON array containing a number of records.
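
To make the CSV subset described above concrete, here is a minimal sketch of a reader that splits fields on a separator and lets double-quoted fields contain the separator. It is not Reg's implementation; the names splitFields, fromCsv and fromSsv are hypothetical, and the sketch assumes there are no escaped quotes or line breaks inside fields.

```haskell
-- Split one line into fields on the separator character sep.
-- A field enclosed in double quotes may contain the separator.
-- Assumption: anything between a closing quote and the next
-- separator is ignored.
splitFields :: Char -> String -> [String]
splitFields sep = go
  where
    go ('"' : s) =
      let (field, rest) = break (== '"') s
       in field : next (drop 1 rest)
    go s =
      let (field, rest) = break (== sep) s
       in field : next rest
    -- After a field: a separator followed by another field, or the end.
    next (c : s) | c == sep = go s
    next _ = []

-- Read a whole text; the first line gives the names of the fields.
fromCsv :: String -> [[String]]
fromCsv = map (splitFields ',') . lines

-- The semicolon-separated variant differs only in the separator.
fromSsv :: String -> [[String]]
fromSsv = map (splitFields ';') . lines
```

With this sketch, the line "Hallgren, Thomas",5422 is split into the two fields Hallgren, Thomas and 5422.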

Input-only formats

In addition to the input/output formats above, Reg can also read input in the following formats:

Format       Meaning
from-mbox    The input is assumed to be in the UNIX mailbox format, which is a sequence of mail messages where the beginning of each message is identified by a line starting with "From ". The resulting register will have the following fields: From, To, Date, Subject, Message-Id, Headers, FilePos and Body. The first five fields contain the values of the corresponding mail headers. The Headers field contains the values of the remaining headers, the FilePos field contains the position of the message in the input file and the Body field contains the body of the mail message.
from-clf     The input is assumed to be in the Common Log Format, or Combined Log Format, used by some web servers.

Output-only formats

In addition to the input/output formats above, Reg can also produce output in the following formats:

Format        Meaning
html          Generate an HTML table. The fields are assumed to contain plain text, so characters that have special meaning in HTML are escaped.
html0         Generate an HTML table. The fields are assumed to contain HTML, and are output as is.
fmt format    Format the output according to a formatting string (see below).

Formatting strings

Formatting strings used with the fmt command work in much the same way as the formatting strings used in the C functions printf and strftime. Most characters stand for themselves, except the % character, which starts a formatting command.

Format          Meaning
%%              The percent character.
%/              The newline character.
%0field;        The contents of the named record field as is.
%"field;        The contents of the named record field wrapped in double quotes. Double quotes and newline characters are escaped as \" and \n, respectively.
%'field;        The contents of the named record field wrapped in single quotes. Single quotes and newline characters are escaped as \' and \n, respectively.
%#field;        The line count of the contents of the named record field.
%field;         The contents of the named record field. Characters that have special meaning in HTML are escaped, that is, & is replaced by &amp; and < is replaced by &lt;.
%{field-fmt}    Creates an HTML link by using the named field as the URL and the formatting string fmt as the link text. If the URL field is empty (or contains only blank space), the link text is output without turning it into a link. Links cannot be nested and fmt cannot contain }.
%{field=fmt}    As above, but the empty string is substituted if the link field is empty.
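
How such a formatting string is expanded can be sketched with a small interpreter for four of the commands above: %%, %/, %0field; and %field;. This is an illustration rather than Reg's implementation; the names Record, escapeHtml, format and formatRegister are hypothetical, the other commands are not handled, and an unknown field is assumed to expand to the empty string.

```haskell
-- A record as (field name, value) pairs. (Hypothetical type.)
type Record = [(String, String)]

-- Escape the two characters mentioned in the table: & and <.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc c = [c]

-- Expand a formatting string for one record.
format :: String -> Record -> String
format fmt record = go fmt
  where
    go ('%' : '%' : s) = '%' : go s
    go ('%' : '/' : s) = '\n' : go s
    go ('%' : '0' : s) = field id s
    go ('%' : s) = field escapeHtml s
    go (c : s) = c : go s
    go [] = []
    -- Read a field name up to ';' and insert its transformed value.
    field f s =
      let (name, rest) = break (== ';') s
          value = maybe "" id (lookup name record)
       in f value ++ go (drop 1 rest)

-- Apply the formatting string to every record of a register,
-- whose first row holds the field names.
formatRegister :: String -> [[String]] -> String
formatRegister _ [] = ""
formatRegister fmt (names : recs) = concatMap (format fmt . zip names) recs
```

Applied to the register from the introduction, formatRegister "%Name; %Extension;%/" produces one line per record with the name followed by the extension.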

Operations

In the table below, fields denotes a comma-separated list of field names, for example Name,Phone.

Operation                       Meaning
add url-encoded-query           Add a new record to the register.
update where what               Update records matching the url-encoded-query where with the values given in url-encoded-query what.
pick fields                     Projection. Select the named fields from the records. The order of the fields is not changed.
drop fields                     Projection. Remove the named fields from the records.
arrange fields                  Rearrange the fields of the records. This can change the order of the fields, drop some fields, duplicate fields and introduce new fields.
grep string                     Selection. Select the records that have string as a substring of some field. The comparison is case insensitive.
grep-in fields string           Selection. Select the records that have string as a substring of some of the mentioned fields. The comparison is case insensitive.
urlgrep url-encoded-query       Selection. Select records with fields that contain substrings of the strings given in url-encoded-query. Fields not mentioned in the query can contain anything. The comparison is case insensitive.
urlgrep-v url-encoded-query     Selection. Select records with fields that are not exactly matched by the url-encoded-query. Fields not mentioned in the query can contain anything.
urlmatch url-encoded-query
urlmatch-v url-encoded-query    Selection. Select records with fields that match the url-encoded-query, which can contain ? and * wildcards. Fields not mentioned in the query can contain anything. urlmatch-v selects records that don't match.
nub                             Remove duplicate records.
nubBy fields                    Remove duplicate records. Use only the given fields to determine if two records are equal.
sort                            Sort the records. Fields are compared lexicographically from left to right.
sortBy fields                   Sort the records. The given fields are compared lexicographically in the order given.
sortBy-n fields                 Sort the records like sortBy, except the first field is compared numerically.
reverse                         Reverse the order of the records.
lines field                     If the contents of field is more than one line long, split the record into several records with one line per record.
unlines field                   The opposite of lines, that is, consecutive records which are equal except for the contents of field are combined.
groupBy fields                  Similar to unlines: consecutive records where the corresponding fields agree are combined.
aggr fn fields                  Apply aggregation function fn to the given fields. This is useful as a postprocessing step after groupBy. Supported aggregation functions: count, max, min, nub, product and sum. The numeric aggregation functions understand numbers that have a unit suffix, e.g. 53m². The aggregation is only applied if all numbers have the same unit.
concat fields                   Combine the given fields into one field by concatenating the contents of the given fields in each record. The name of the new field is the concatenation of fields.
split field                     Split field into several fields. The contents of the field are split on line breaks. The number of fields is thus determined by the maximum number of lines in the field. The names of the new fields are obtained by appending a number to field.
string                          If string is not one of the operations recognized by Reg, it is interpreted as grep string, that is, when you search, you can omit the word grep in most cases.
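
A few of these operations are easy to express as functions on [[String]]. The sketch below shows one possible reading of pick, grep, sortBy and the ? and * wildcards; it is not taken from Reg's source. The names pick, grep, sortByFields, wildcard and people are hypothetical, records are assumed to have as many fields as the header, and the wildcard match is assumed to be case sensitive, which this page does not specify.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, sortBy)
import Data.Ord (comparing)

type Register = [[String]]

-- pick: keep only the named fields, in their original order.
-- The row of field names is filtered in the same way as the records.
pick :: [String] -> Register -> Register
pick _ [] = []
pick fields reg@(names : _) = map select reg
  where
    select row = [v | (n, v) <- zip names row, n `elem` fields]

-- grep: keep the records that have s as a substring of some field,
-- ignoring case. The row of field names is always kept.
grep :: String -> Register -> Register
grep _ [] = []
grep s (names : recs) = names : filter matches recs
  where
    lower = map toLower
    matches row = any ((lower s `isInfixOf`) . lower) row

-- sortBy: sort the records on the given fields, in the order given.
sortByFields :: [String] -> Register -> Register
sortByFields _ [] = []
sortByFields fields (names : recs) = names : sortBy (comparing key) recs
  where
    key row = [v | f <- fields, (n, v) <- zip names row, n == f]

-- Wildcard match: ? stands for one character, * for any sequence.
wildcard :: String -> String -> Bool
wildcard [] s = null s
wildcard ('*' : p) s =
  wildcard p s || (not (null s) && wildcard ('*' : p) (drop 1 s))
wildcard ('?' : p) (_ : s) = wildcard p s
wildcard (c : p) (d : s) = c == d && wildcard p s
wildcard _ [] = False

-- The register from the introduction.
people :: Register
people =
  [ ["Name", "Extension"]
  , ["Thomas Hallgren", "5422"]
  , ["Magnus Carlsson", "1058"]
  , ["Ana Bove", "1020"]
  ]
```

Since operations are applied from right to left, a command line such as pick Name grep magnus corresponds to the composition pick ["Name"] . grep "magnus" in this sketch.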

Examples

For the following examples, we assume that the file people contains the register displayed in the introduction.

Command:
    Reg show <people
Output:
    Name····· Thomas Hallgren
    Extension 5422

    Name····· Magnus Carlsson
    Extension 1058

    Name····· Ana Bove
    Extension 1020

Command:
    Reg fmt '%Name; %Extension;%/' <people
Output:
    Thomas Hallgren 5422
    Magnus Carlsson 1058
    Ana Bove 1020

Command:
    Reg show grep magnus <people
Output:
    Name····· Magnus Carlsson
    Extension 1058

Commands:
    Reg file people update 'Name=Thomas*' Extension=5555
    Reg show thomas <people
Output:
    Name····· Thomas Hallgren
    Extension 5555

Implementation

Reg is implemented in Haskell. The source is 555 lines long (2001-05-20), of which 383 lines were written specifically for Reg and 172 lines were reused from other programs and libraries. It also uses functions defined in the Haskell prelude and standard libraries, which are not counted here.

Past, present and future

Reg has evolved over time and could still be improved in various ways.
  • groupBy and aggr were added in September 2021.
  • An operation for adding new records to a register was added on 2007-02-04.
  • There should probably be an operation for adding new fields to a register (in an easier way than with arrange).
  • Apart from pick being slightly more efficient than arrange, there is no good reason to have two such similar operations.
  • There should probably be operations that combine two or more registers in various ways, for example union, intersection and join. Currently, there are two separate programs, RegJoin and RegCat, for joining and concatenating registers, respectively.
  • Conversion from alternate input formats was initially added on 2001-05-20, and the list of supported formats has been extended over time, but Reg could support more input formats.

Author

Thomas Hallgren