Filtering functions (Mailfromd Manual)

5.8 Filtering functions

This section describes functions that transform data using Mailutils filter pipes. A filter pipe is a string defining the data flow between several filters. Each filter takes input, transforms it according to certain rules, and produces the transformed data on its output. As in the shell, multiple filters are connected using pipe characters (‘|’). For example, the crlf filter inserts a carriage return character before each newline character. A filter pipe performing just that transformation is defined as:

"crlf"

Another filter, base64, converts its input to a BASE64-encoded string. To transform each newline into a carriage return + newline pair and encode the resulting stream in BASE64, one would write:

"crlf | base64"

Some filters take one or more arguments. These are specified as a comma-delimited list in parentheses after the filter name. For example, the linelen filter limits the length of each output line to the given number of octets. The following filter pipe will limit the length of base64 lines in the filter above to 62 octets:

"crlf | base64 | linelen(62)"

Many filters operate in two modes: encode and decode. By default, all MFL functions apply filters in encode mode. The desired mode can be stated explicitly in the filter string using the encode() and decode() functions. They take a filter pipe as their argument. For example, the following will decode the stream produced by the example filter above:

"decode(base64 | crlf)"

See Filters, for a discussion of available filters and their arguments.

Built-in Function: string filter_string (string input, string filter_pipe)

Transforms the string input using filters in filter_pipe and returns the result. Example:

set input "test\ninput\n"
filter_string(input, "crlf|base64") ⇒ "dGVzdA0KaW5wdXQNCg=="
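
The result above can be cross-checked outside MFL. The following Python sketch is not how Mailutils implements its filters, and the helper name crlf_base64 is hypothetical; it only mimics the "crlf|base64" pipe on the sample input, assuming the input contains no carriage returns of its own:

```python
import base64

def crlf_base64(text: str) -> str:
    """Mimic the "crlf|base64" pipe: LF -> CRLF, then base64-encode."""
    with_crlf = text.replace("\n", "\r\n").encode("ascii")
    return base64.b64encode(with_crlf).decode("ascii")

print(crlf_base64("test\ninput\n"))  # dGVzdA0KaW5wdXQNCg==
```

Decoding the printed string with any base64 decoder yields "test\r\ninput\r\n", which shows that crlf ran before base64.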

Built-in Function: void filter_fd (number src_fd, number dst_fd, string filter_pipe)

Given two I/O descriptors, reads data from src_fd, transforms it using filter_pipe and writes the result to descriptor dst_fd.

Both descriptors must be obtained using functions described in I/O functions.

5.8.1 Filters and Filter Pipes

A filter pipe is a string consisting of filter invocations delimited by pipe characters (‘|’). Each invocation is a filter name optionally followed by a comma-separated list of parameters in parentheses. Most filters can operate in two modes: encode and decode. Unless specified otherwise, filters are invoked in encode mode. To change the mode, the encode and decode meta-filters are provided. Arguments to these filters are filter pipes that will be executed in the corresponding mode.

The following Mailutils filters are available:

Filter: 7bit

In encode mode, converts its input into 7-bit ASCII, by clearing the 8th bit on each processed byte.

In decode mode, it operates exactly as the 8bit filter, i.e. copies its input to the output verbatim.

The filter takes no arguments.
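
As an illustration of what clearing the 8th bit means, here is a minimal Python sketch. The function name seven_bit is hypothetical and the code only models the encode-mode behaviour described above:

```python
def seven_bit(data: bytes) -> bytes:
    """Clear the high (8th) bit of every byte."""
    return bytes(b & 0x7F for b in data)

print(seven_bit(b"caf\xe9"))  # b'cafi'
```

Note that the transformation is lossy: 0xE9 becomes 0x69 (‘i’), and the original octet cannot be recovered, which is consistent with decode mode being a plain copy.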

Filter: 8bit

Filter: binary

Copies its input to output verbatim.

Filter: base64

Filter: B

Encodes or decodes the input using the base64 encoding.

The only difference between base64 and B is that, in encode mode, the former limits each output line length to 76 octets, whereas the latter produces a contiguous stream of base64 data.

In decode mode, both filters operate exactly the same way.

Filter: charset (cset)

Filter: charset (cset, fallback)

A convenience interface to the iconv filter, available for use only in the message_body_to_stream function. It decodes the part of a MIME message from its original character set, which is determined from the value of the Content-Type header, to the destination character set cset. The optional fallback parameter specifies the representation fallback to be used for octets that cannot be converted between the character sets. See iconv, for a description of its use.

This filter normally takes its input from the mimedecode filter, as in:

message_body_to_stream(fd, msg, 'mimedecode|charset(utf-8)')

See mimedecode, for a detailed discussion.

Filter: crlf

Filter: rfc822

Converts line separators from LF (ASCII 10) to CRLF (ASCII 13 10) and vice-versa.

In decode mode, translates each CRLF to LF. Takes no arguments.

In encode mode, translates each LF to CRLF. If the optional argument ‘-n’ is given, produces normalized output by leaving each input CRLF sequence untouched (otherwise such sequences are translated to CR CR LF).
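
The effect of ‘-n’ is easiest to see on input that already contains a CRLF pair. The Python sketch below models the behaviour described above; the function names and the normalize parameter are hypothetical stand-ins for the filter and its ‘-n’ argument:

```python
import re

def crlf_encode(data: bytes, normalize: bool = False) -> bytes:
    """LF -> CRLF; with normalize=True, existing CRLF pairs are kept as is."""
    if normalize:
        # Add a CR only before an LF that is not already preceded by one.
        return re.sub(rb"(?<!\r)\n", b"\r\n", data)
    return data.replace(b"\n", b"\r\n")

def crlf_decode(data: bytes) -> bytes:
    """CRLF -> LF."""
    return data.replace(b"\r\n", b"\n")

mixed = b"a\r\nb\n"
print(crlf_encode(mixed))                  # b'a\r\r\nb\r\n'
print(crlf_encode(mixed, normalize=True))  # b'a\r\nb\r\n'
```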

Filter: crlfdot

In encode mode, replaces each LF (‘\n’ or ASCII 10) character with CRLF (‘\r\n’, ASCII 13 10), and byte-stuffs the output by producing an additional ‘.’ in front of any ‘.’ appearing at the beginning of a line in the input. Upon end of input, it outputs an additional ‘.\r\n’ if the last output character was ‘\n’, or ‘\r\n.\r\n’ otherwise.

If supplied the ‘-n’ argument, it preserves each CRLF input sequence untranslated (see the crlf filter above).

In decode mode, the reverse is performed: each CRLF is replaced with a single LF byte, and additional dots are removed from the beginning of lines. A single dot on a line by itself marks the end of the stream and causes the filter to return EOF.
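
The encode-mode rules can be summarized in a few lines of Python. This is a sketch of the described behaviour without the ‘-n’ argument, not the Mailutils implementation; the function name crlfdot_encode is hypothetical:

```python
def crlfdot_encode(data: bytes) -> bytes:
    """LF -> CRLF, dot-stuff line-initial dots, then append the terminator."""
    out = bytearray()
    at_line_start = True
    for byte in data:
        ch = bytes([byte])
        if at_line_start and ch == b".":
            out += b"."              # byte-stuffing: double a leading dot
        if ch == b"\n":
            out += b"\r\n"
            at_line_start = True
        else:
            out += ch
            at_line_start = False
    # Terminator depends on whether the output already ends a line.
    out += b".\r\n" if out.endswith(b"\n") else b"\r\n.\r\n"
    return bytes(out)

print(crlfdot_encode(b"hello\n.world\n"))  # b'hello\r\n..world\r\n.\r\n'
```

The output is a body in the form used by the SMTP DATA command, which is the typical use of this filter.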

Filter: dot

In encode mode, byte-stuffs the input by outputting an additional dot (‘.’) in front of any dot appearing at the beginning of a line. Upon encountering end of input, it outputs an additional ‘.\n’.

In decode mode, the reverse is performed: additional dots are removed from the beginning of lines. A single dot on a line by itself (i.e. the sequence ‘\n.\n’) marks the end of the stream and causes the filter to return EOF.

This filter doesn’t take arguments.

Filter: from

Performs traditional UNIX processing of lines starting with ‘From’ followed by a space character.

In encode mode, each ‘From ’ at the beginning of a line is replaced by ‘>From ’.

In decode mode, the reverse operation is performed: the initial greater-than sign (‘>’) is removed from any line starting with ‘>From ’.

The filter takes no arguments.

Filter: fromrd

MBOXRD-compatible processing of envelope lines.

In encode mode, each ‘From ’ optionally preceded by any number of contiguous ‘>’ characters and appearing at the beginning of a line is prefixed by another ‘>’ character on output.

In decode mode, the reverse operation is performed: the initial greater-than sign (‘>’) is removed from any line starting with one or more ‘>’ characters followed by ‘From ’.
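
The practical difference between from and fromrd shows up when the input already contains a quoted ‘>From ’ line. The Python sketch below models both filters with regular expressions; the four function names are hypothetical and the code follows only the descriptions given above:

```python
import re

def from_encode(text: str) -> str:
    return re.sub(r"^From ", ">From ", text, flags=re.M)

def from_decode(text: str) -> str:
    return re.sub(r"^>From ", "From ", text, flags=re.M)

def fromrd_encode(text: str) -> str:
    # Quote 'From ' even if it is already preceded by '>' characters.
    return re.sub(r"^(>*From )", r">\1", text, flags=re.M)

def fromrd_decode(text: str) -> str:
    # Strip exactly one level of quoting.
    return re.sub(r"^>(>*From )", r"\1", text, flags=re.M)

body = "From here\n>From there\n"
print(from_decode(from_encode(body)) == body)      # False
print(fromrd_decode(fromrd_encode(body)) == body)  # True
```

With from, the second line loses its original ‘>’ on the round trip; fromrd preserves it because every level of quoting is reversible.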

Filter: header

This filter treats its input as an RFC 2822 email message. It extracts the header part (i.e. everything up to the first empty line) and copies it to the output. The body of the message is ignored.

The filter operates only in decode mode and takes no arguments.

Filter: iconv (src, dst [, fallback])

Converts input from character set src to dst. The filter works the same way in both decode and encode modes.

It takes two mandatory arguments: the names of the input (src) and output (dst) charset. Optional third argument specifies what to do when an illegal character sequence is encountered in the input stream. Its possible values are:

none: Raise an e_ilseq exception.
copy-pass: Copy the offending octet to the output verbatim and continue conversion from the next octet.
copy-octal: Print the offending octet to the output using the C octal conversion and continue conversion from the next octet.

The default is copy-octal.

The following example creates an iconv filter for converting from iso-8859-2 to utf-8, raising the e_ilseq exception on the first conversion error:

iconv(iso-8859-2, utf-8, none)
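
The three fallback modes can be approximated in Python, which may help in predicting what a given setting does to malformed input. This is an analogy only: the function convert and the error-handler registration are hypothetical, Python's UnicodeDecodeError stands in for e_ilseq, the octal form is assumed to be a backslash followed by three octal digits, and only errors on the input side are handled:

```python
import codecs

def _copy_octal(err: UnicodeError):
    """Decode-error handler: emit the bad octet as a C-style octal escape."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad_octet = err.object[err.start]
    return "\\%03o" % bad_octet, err.start + 1  # resume at the next octet

codecs.register_error("copy-octal", _copy_octal)

def convert(data: bytes, src: str, dst: str, fallback: str = "copy-octal") -> bytes:
    if fallback == "none":
        # Strict decoding raises UnicodeDecodeError (stand-in for e_ilseq).
        return data.decode(src).encode(dst)
    if fallback == "copy-pass":
        # surrogateescape carries undecodable octets through unchanged.
        return data.decode(src, "surrogateescape").encode(dst, "surrogateescape")
    if fallback == "copy-octal":
        return data.decode(src, "copy-octal").encode(dst)
    raise ValueError(f"unknown fallback: {fallback}")

print(convert(b"ok\xff", "utf-8", "iso-8859-2"))  # b'ok\\377'
```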

Filter: inline-comment

Filter: inline-comment (str, [options])

In decode mode, the filter removes from the input all lines beginning with a given inline comment sequence str. The default comment sequence is ‘;’ (a semicolon).

The following options modify the default behavior:

-i, str: Emit line number information after each contiguous sequence of removed lines. The argument str supplies an information starter – a sequence of characters which is output before the actual line number.
-r: Remove empty lines, i.e. the lines that contain only whitespace characters.
-s: Squeeze whitespace. Each sequence of two or more whitespace characters encountered on input is replaced by a single space character on output.
-S: A whitespace-must-follow mode. A comment sequence is recognized only if followed by a whitespace character. The character itself is retained on output.

In encode mode the inline-comment filter adds a comment-starter sequence at the beginning of each line. The default comment-starter is ‘;’ and can be changed by specifying the desired comment starter as the first argument.

The only option supported in this mode is -S, which enables the whitespace-must-follow mode, in which a single space character (ASCII 32) is output after each comment sequence.
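
The basic behaviour of both modes, together with the -r option, can be modelled as follows. This Python sketch covers only those parts of the description; the function names and the remove_empty parameter are hypothetical, and the -i, -s and -S options are left out:

```python
def strip_comments(text: str, starter: str = ";", remove_empty: bool = False) -> str:
    """Decode mode: drop lines that begin with the comment starter."""
    kept = []
    for line in text.splitlines(keepends=True):
        if line.startswith(starter):
            continue
        if remove_empty and not line.strip():
            continue  # models -r: drop whitespace-only lines
        kept.append(line)
    return "".join(kept)

def comment_out(text: str, starter: str = ";") -> str:
    """Encode mode: put the comment starter in front of every line."""
    return "".join(starter + line for line in text.splitlines(keepends=True))

sample = "; a comment\nkeep\n\n;x\n"
print(repr(strip_comments(sample)))                     # 'keep\n\n'
print(repr(strip_comments(sample, remove_empty=True)))  # 'keep\n'
```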

Filter: linecon

Filter: linecon (-i, str)

Implements a familiar UNIX line-continuation facility. The filter removes from its input stream any newline character immediately preceded by a backslash. This filter operates only in decode mode.

If given the arguments (‘-i’, str), enables the line number information facility. This facility emits the current input line number (prefixed with str) after each contiguous sequence of one or more removed newline characters. It is useful for implementing parsers, which normally need to identify erroneous lines by their input line numbers.

Filter: linelen (n)

Limits the length of each output line to a certain number of octets. It operates in encode mode only and requires a single parameter: the desired output length in octets. This filter makes no attempt to analyze the lexical structure of the input: newline characters are inserted when the length of the output line reaches the given maximum. Any newline characters present in the input are taken into account when computing the input line length.
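
The following Python sketch shows one way to read this description. The function name linelen is borrowed from the filter, but the code is an assumption-laden model rather than the Mailutils algorithm; in particular, it does not emit a newline after a final line that is exactly n octets long, which the real filter may or may not do:

```python
def linelen(data: bytes, n: int) -> bytes:
    """Insert LF so that no output line is longer than n octets."""
    out = bytearray()
    current = 0                  # length of the output line so far
    for byte in data:
        if byte == 0x0A:         # an input newline resets the counter
            out.append(byte)
            current = 0
            continue
        if current == n:         # line is full: break before the next octet
            out.append(0x0A)
            current = 0
        out.append(byte)
        current += 1
    return bytes(out)

print(linelen(b"abcdefgh", 3))   # b'abc\ndef\ngh'
print(linelen(b"ab\ncdefg", 3))  # b'ab\ncde\nfg'
```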

Filter: mimedecode

This is a domain-specific filter available for use only with the message_body_to_stream function. It decodes the part of a MIME message from whatever encoding was used to store it in the message to a stream of bytes. See mimedecode.

Filter: quoted-printable

Filter: Q

Encodes or decodes the input using the quoted-printable encoding.

Filter: XML

In encode mode, the xml filter converts its input stream (which must contain valid UTF-8 characters) into a form suitable for inclusion in an XML or HTML document, i.e. it replaces ‘<’, ‘>’, and ‘&’ with ‘&lt;’, ‘&gt;’, and ‘&amp;’, respectively, and replaces invalid characters with their numeric character reference representation.

In decode mode, a reverse operation is performed.

The filter does not take arguments.
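
The replacement of the three special characters corresponds to what Python's standard html module does, which gives a quick way to preview the result. The sketch below covers only that part of the description; the handling of invalid characters is not modelled, and the wrapper names are hypothetical:

```python
import html

def xml_encode(text: str) -> str:
    """Escape '<', '>' and '&' (quotes are left alone)."""
    return html.escape(text, quote=False)

def xml_decode(text: str) -> str:
    """Reverse operation: resolve character references."""
    return html.unescape(text)

print(xml_encode("a < b & c > d"))  # a &lt; b &amp; c &gt; d
```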