Yelt Training

Introduction

Yelt is a stream text editor, like its inspiration, SED. A stream text editor is a program that reads input files and writes output to its standard output stream based on the contents of those files. This workflow is generally referred to as "filter processing". In fact, many scripts sent to yelt or SED operate like filters: they discard parts of the input data and print only the interesting material. However, both programs can also create new data and merge disparate parts of the input files to produce their output.

Yelt versus SED

These pages are about yelt, not SED. For information on SED, see the SED home page. Many of the examples defined there can be used in yelt as well, though not all of them are necessary in yelt.

Yelt's syntax depends heavily on SED's, but it is not a true proper superset. Here are some of the major differences:

Why use Yelt?

Yelt exists because SED doesn't do quite enough, while Perl, Tcl, and other languages require more setup than is necessary for the small number of things that go beyond what SED does.

Why use filter programs in general?

Filter programs are nice to work with from a command line interpreter because they let you execute a command, see the results printed to the console window, then quickly change the command line and retry. When you have determined, by experiment, the complete command line, you can then cut-and-paste the command line into a script file -- if needed -- and often it isn't.

What is a filter program?

Filter programs can be strung together like this on a command line:
command1 option11 option12 ... | command2 option21 option22 option23 ... | command3 ...
That is, the output from command1 is passed directly to command2, whose output is then passed directly to command3 without any intervening files being created. For example, consider this command line:
head -10 file1 | cut -c1-20 | grep XYZ | yelt -e 's/Y/(Y)/g'
This sequence of commands produces no intermediate files and prints its final results to standard out -- i.e., to the console window in which it is invoked. It does the following:
  1. head reads file1 and prints only the first 10 lines to stdout
  2. which is then fed to the cut program, which selects only the first 20 characters of each line and prints them to stdout
  3. which is then fed to the grep program, which discards all lines except those containing XYZ and sends the remainder to standard out
  4. which is then fed to the yelt program, which substitutes "(Y)" for every "Y" and prints each resulting line to standard out.
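The four stages above can be mimicked in a few lines of Python, which may help when checking what each filter contributes. This is only an illustrative sketch: the function names are invented for this example, Python's re module stands in for the regular expressions used by grep and yelt, and nothing here reflects how yelt itself is implemented.

```python
import re
from itertools import islice


def head(lines, count):
    """Stage 1: pass along only the first `count` lines."""
    return islice(lines, count)


def cut_chars(lines, first, last):
    """Stage 2: keep columns first..last (1-based, inclusive) of each line."""
    for line in lines:
        yield line[first - 1:last]


def grep(lines, pattern):
    """Stage 3: discard every line that does not contain the pattern."""
    matcher = re.compile(pattern)
    for line in lines:
        if matcher.search(line):
            yield line


def substitute(lines, pattern, replacement):
    """Stage 4: replace every match of the pattern on each line."""
    matcher = re.compile(pattern)
    for line in lines:
        yield matcher.sub(replacement, line)


def pipeline(lines):
    """Chain the stages like: head -10 | cut -c1-20 | grep XYZ | s/Y/(Y)/g"""
    stage = head(iter(lines), 10)
    stage = cut_chars(stage, 1, 20)
    stage = grep(stage, "XYZ")
    return list(substitute(stage, "Y", "(Y)"))


sample = ["abc XYZ def", "no match here", "XYZ" + "q" * 30]
for line in pipeline(sample):
    print(line)
```

Each stage is a generator, so, as with the shell pipeline, no intermediate list or file is built between stages.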

So?

The ability of command line interpreters to recall previous commands for subsequent editing greatly simplifies the development process for filter programs because you can try, correct, and retry so quickly.

Why use streaming text editors?

The most common use of streaming text editors is the transformation of log files from one form to another. The output of most programs is designed by someone else. Often, the output is not formatted in the manner that is most useful to you. In most cases, it is not possible to rewrite the program producing the output or even to create a new one that produces desirable output. A streaming text edit session lets you discard the uninteresting material and reformat the remainders to your liking.

Stream text editors are useful for writing scripts but they are also useful in saving labor. Suppose you are trying to make a non-trivial change to a file. You could edit it by hand, or you could try a sequence of yelt commands to do the work for you. Then when you need to make the same changes to another file, you can re-use the existing commands.

For example:

First you execute yelt with only part of the final command sequence, like this:
yelt -e 's/string/other/g' file
Then, if you like the results that get printed out, you press the up arrow on your keyboard to recall the previous command and add more yelt commands:
yelt -e 't' -e 's/string/other/g' file
In this case, tab expansion was added before the string conversion. If this output works better then you go on to the next change:
yelt -e 't' -e 's/string/other/g' -e '/badlines/d' file
As a side note, the "head" program can be used to cut excessively long listings down to size while you are debugging scripts using the above techniques. If your 'file' contains 10,000 lines, you may need only the first 10 in order to understand whether your script is working. To shorten long listings, rewrite the above script invocation like this:
head -10 file | yelt -e 't' -e 's/string/other/g' -e '/badlines/d'
As you can see from this example, it is not necessary to fully plan out a yelt script before you start calling yelt (or SED). Instead, you build up the yelt command lines by experiment. Obviously planning your implementation speeds up development, but sometimes you really don't know what you are going to do until you can see what is possible.

Yelt scripts can also be read from files or can be stored in a single command line parameter string. See Invoking yelt below.

Why use yelt?

The primary use of yelt or any other filter program is to perform text editing activities on ASCII files in an automated fashion. In many Unix shell scripts, the following sort of command line is regularly employed:
program ... | grep ... | cut ... | sed ... | tr ... | sed ... | grep ....
That is, many scripts have a long series of text filters applied one after the other. Yelt exists to eliminate some of the program invocations. It does so by combining the features of many of the tools into a single command line invocation. This should reduce machine load and reduce script complexity and confusion.

Yelt is fast, flexible, and has a lot of built in tools to eliminate unnecessary calls to other filter programs. Yelt has the look and feel of SED but also has some features from the following filter programs:

Tool replaced Yelt Command Description
sed all Yelt is an improper superset of SED. Almost all features of SED can be found in yelt, though the syntax is different in some cases. Yelt does not have gotos or a hold buffer. Yelt has while loops, break, and continue statements instead of gotos and it has 10 string variables, not just the two found in sed. Instead of this:
sed -e 's/x/y/g'
do this:
yelt -e 's/x/y/g'
expand t Expand tabs in the input line so that you don't need a call to expand on the command line. Instead of this:
expand file | sed -e 's/x/y/g'
Do this:
yelt -e 't; s/x/y/g' file
cut c Select a subset of the characters in the input line so that you don't need to call cut. Instead of this:
grep 'string' file | cut -c1-99 | sed -e 's/x/y/g'
do this:
yelt -S "w{ n; /string/ { c 1-99; s/x/y/g; p; } }" file
Note that both the call to grep and cut have been eliminated here. Something similar could have been done with sed -- but you'd need a command something like this:
sed -e 's/\(.....[99 dots]\).*/\1/1' -e 's/x/y/g' file
Which isn't terrible, but if you had to change the number from 99 to 77, you'd spend a lot of time counting dots to make sure...
tr y translate character ranges to avoid calls to tr on the command line. Note that yelt also lets you use character escape sequences, such as \n, as input and outputs in y commands (unlike SED).

Some SEDs support substitutions and translations involving control characters, like \n, \t, etc. However, they don't always. Yelt will.

Instead of this:

sed -e 's/stuff/stuff@/g' | tr '@' '\n'
do this:
yelt -e 's/stuff/stuff\n/g'
perl SW/SR/SC parse tokens out of lines so that you don't have to call perl. Actually perl can do many more things than yelt or SED but then you have to learn perl as well as the other tools. Of course you have to learn yelt too, but there's a lot less of yelt than there is of perl.
yelt F/W The F and W commands can eliminate extraneous calls to yelt itself because they allow yelt to compute the name of a file and perform yelt commands on that file without having to perform separate yelt steps.

Invoking yelt

Yelt is mainly used as a filter program. That is, it reads from the standard input and writes to the standard output (mainly). The data transformation between the input and the output is defined by command line options.

Thus yelt is invoked like this:

yelt [options] [file ...]
The following table describes the command line options:

Option Description
-h The -h option instructs yelt to print help and quit. The output of the -h option looks much like that which can be seen here.
-v The -v option instructs yelt to print its current version information.
-f scriptFile The -f option instructs yelt to use the script found in the specified script file. Note that the -e, -f, and -S options are mutually exclusive.
-S "scriptString" The -S option instructs yelt to use the script found in the string that follows. Note that the -e, -f, and -S options are mutually exclusive.
-e "cmd" The -e option instructs yelt to extend the default script by the command found in the following script fragment. Note that multiple commands can be specified in a single -e option string and they are separated by ;'s:
-e "s/z/b/g; /r/d; p;"
Note that the -e, -f, and -S options are mutually exclusive.

The default script looks like this:

w { n; extensions; p; }
Where the 'extensions' are a concatenation of all -e option strings.
-l Turns on execution time logging -- very verbose, not likely to be helpful without a lot of patience.
-M Defines the name of the string substitution file that is used by the 'M' command.
-D char Defines the character that separates the keys from the values in the string substitution file defined by the -M option.
-r[0-9] string Pre-populate the specified string register with a given string value.

This is the mechanism for passing command line variables to the script. See the F command for a way to specify the file on which to work as a command line parameter (as an alternative to the normal mechanisms.)

Yelt script language

Where scripts come from

Most yelt scripts consist of nothing more than a sequence of -e command line options, and thus verbosity is undesirable:
yelt -e "s1" -e "s2;s3" -e "s4" ...
Scripts are specified to yelt using one of the -e, -f, or -S options. When using the -f or the -S option, a complete script must be specified, but when using the -e option, a default wrapper script is created for you and the -e statements are inserted into the middle of it. That script looks like this:
    w
    {
      n;
      [-e option extensions]
      p;
    }
That is, the default script is a while loop that reads each line of the input file, processes that line, then prints it to standard out. So the above yelt invocation would create a final script that looks like this:
    w
    {
      n;
      s1;
      s2;
      s3;
      s4;
      p;
    }
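The way the -e fragments are spliced into the default wrapper can be pictured with a small Python sketch. The helper name build_default_script is hypothetical, and the single-line layout of the result is an assumption made for readability; the sketch only illustrates the splice described above and is not yelt's actual option handling.

```python
def build_default_script(fragments):
    """Wrap -e fragments in the default 'w { n; ...; p; }' script.

    Naive on purpose: a fragment ending in an escaped semicolon ('\\;')
    would be mishandled by the trailing-';' cleanup below.
    """
    statements = ["n"]
    for fragment in fragments:
        # Drop surrounding blanks and a trailing ';' so the joined text
        # has exactly one separator between consecutive fragments.
        text = fragment.strip().rstrip(";").strip()
        if text:
            statements.append(text)
    statements.append("p")
    return "w { " + "; ".join(statements) + "; }"


# Models: yelt -e "s1" -e "s2;s3" -e "s4"
print(build_default_script(["s1", "s2;s3", "s4"]))
```

With no fragments at all the helper returns the bare loop `w { n; p; }`, which simply copies input to output.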

Note however, that the "n" and "p" statements are actually shortened forms of "n0" and "p0". That is, the read and print statements actually refer to string variable 0. It is perfectly permissible to write the same script like this:
    w
    {
      n9;

      [other commands using variable 9]

      p9;
    }
There are 10 string variables, numbered 0-9, and they are used for all yelt commands.

What's in a script?

Statements in the yelt script language fall into one of the following forms:

Program control statements

Example Name Commentary
w while Repeatedly execute the statement block that follows. For example:
w { n; p; }
{ cmd [; ...] } block of statements Execute each of the statements in a list that begins and ends with {}.
b break
d continue Continue the execution of the innermost while loop without executing any more statements in the current iteration. This statement is called 'd' rather than 'c' because, in the default script, it has the effect of suppressing the final 'p' command at the end of the while loop that prints the current line -- thus this command seems to be a 'delete' of the current line, but in fact it is a continue of the loop.
Q exit Quit executing the current function or script if not in a function. AKA "return".
if if then else Perform a test, then decide, based on that test, which of two statements to execute. The syntax is as follows:
if /regex/[~var][!]
{
  [if clause stuff]
}
else
{
  [else clause stuff]
}
Note that the regular expression can be applied to variables other than 0 (using ~var) and that the test can be inverted (using !). Both modifiers are optional.
1,30 cmd range conditional statement Execute the cmd if and only if the current line number of the current input file (or stdin) is in the range 1-30 (inclusive).

If the second number is replaced by $, then this means until end of file. Thus, the range "40,$" means, from line 40 until the end of file.

/regex/ cmd pattern conditional statement The regular expression can be modified by ~var and by !, just like the regular expression in the if statement, above.
|[d][d][!] cmd computed conditional statement In this case, the string defining the regular expression is found in the first register and the string to be compared against it is found in the second register. The ! operator lets you invert the logic of the test. The cmd is executed only if the string matches the regular expression (taking any ! into account). For example:
    q1 fred.*bill;
    |12 { p2; d; }
In this case, register 2 will be printed only if it contains text matching fred.*bill.
/r1/,/r2/ cmd two pattern conditional statement The regular expression r1 can be modified by ~var and by !, just like the regular expression in the if statement, above. r2 can be modified only by !. If r1 applies to a specific variable, then r2 also applies to that variable (though this cannot be specified separately).

Note that either /r1/ or /r2/ can be replaced with a line number, so that the cmd executes from a specific line number until a regular expression is encountered (or vice versa).

Also, /r2/ can be replaced with $. This means that the range terminates at the end of the file.

Def fun [digit]
{
  cmds
}
Define function Define a function named 'fun' with a specified parameter count and whose behavior is defined by the statements in the block statement that follows.

The parameter count defines the number of registers which are communicated between the caller and the called functions. See below.

C fun [digit][,..]
Call function Call the function named 'fun' given the list of parameters specified. The parameters must be a list of register numbers like this:
8,3,9
When a function is called, the registers named in the call's parameter list are copied into the register set of the called function. However, these registers are given new numbers: they are ordered 0, 1, 2, ....

All other registers in the called function are set to empty string.

Upon exit of the called function, its registers 0 through N-1, where N is the number of parameters to the function, are copied back into the caller's registers named in the invocation.

That is, the returned values from the function must be stored in the registers passed into the function as parameters.

For example, suppose a function needs 2 parameters, the caller might pass registers 4 and 9. The called function understands these two registers as 0 and 1. When the called function returns, its register 0 is copied back into register 4 of the caller and its register 1 is copied back into register 9 of the caller.

See also the F-Statement below.
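The numeric range condition from the table above ("1,30 cmd" and "40,$ cmd") amounts to a simple inclusive test on the current line number. The following Python sketch shows that test; the function name in_line_range is invented for this illustration, and the string "$" is used to stand for "until end of file".

```python
def in_line_range(line_number, first, last):
    """Return True when line_number falls inside first..last (inclusive).

    Passing "$" as `last` models an open-ended range such as 40,$.
    """
    if last == "$":
        return line_number >= first
    return first <= line_number <= last


# Lines 1 through 30 are selected, line 31 is not.
print(in_line_range(30, 1, 30), in_line_range(31, 1, 30))
# With "$" every line from 40 onward is selected.
print(in_line_range(39, 40, "$"), in_line_range(5000, 40, "$"))
```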

I/O statements

Example Name Commentary
n[0-9] read line Read the next line from the current file and store it in the specified variable number -- or in variable 0 if none is specified. For example:
n;
n4;
N[0-9] push back line Push a line of text back into the input stream. That is, make a string available for the 'n' command to read as if it were a new line of text from the current input file. It is not necessary that the string be actually read from the file in the first place, any register variable can be "pushed back" into the input stream, but a common use of this statement type is to read sections of data from a file and when a new section is detected by an inner processing loop, it can push the section delimiter line back into the input stream and let the outer loop handle it.

Note that this command has no bearing on the current input line number -- so printing the line number of a pushed back line is equivalent to printing the next available line number from the input file -- which may not be the line number for the text that got pushed back -- it could be 1 larger than the desired line number. You can adjust line numbers using the "-" command if so desired.

p[0-9] [string prefix] print Print the contents of variable 0, or that of the specified variable, to standard out. For example:
p;
p9;

Note that it is possible to specify a string to print before the text of the register is printed. This text is specified in one of the following ways:

p stuff;
p3 "stuff";
p0 'stuff';
If quotes are specified, they will not be printed. The string is subject to escape sequence expansion, such that \s becomes blank, \n becomes newline, etc.
F[0-9] { cmds } read file Open the file whose name appears in variable 0 or the specified variable number. Read each line of that file and process it using the commands specified in the block statement.
q8 junk.txt;

F8
{
n;
s/.*/junk.txt: &/1;
p;
}

In this example, the file named junk.txt is opened for reading and each line is processed using the yelt commands: n, s, and p per the above script fragment.
R[0-9][0-9] [filename] read whole file Read the entire file into a register variable as a single string.

Note that the register variable will contain the newline characters from the input file. If you print the string using p, without removing said newline, you'll get an extra blank line.

This statement can be formatted in each of the following ways:
R Read the entire stdin file into register 0
R9 Read the entire stdin file into register 9
R1 filename Read the specified file into register 1
R34 Read the entire file whose name is in register 4 into register 3
Warning, if you use this command to read the stdin file, you are likely to get undesired results if you are using the default script.

The default script reads a line of input from the stdin file before giving control to user defined commands -- so if you must use the default script, you'll have to deal with the fact that the first line of the stdin file has been placed in register 0 before you started.

W[0-9][0-9] write file Open the file whose name is found in the first register number, and write the contents of the second register number to the file as its entire body, then close the file.

This command is not meant for large scale file processing.

You must put your own \n's in the contents of the second variable.
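The interaction between the n and N commands described in the table above, including the note that pushing a line back does not change the current line number, can be modeled with a small Python class. The class and method names are invented for this sketch, and the last-in-first-out order for several pushed-back lines is an assumption; the text above does not say how yelt orders them.

```python
class LineSource:
    """Toy model of an input stream with 'n' (read) and 'N' (push back)."""

    def __init__(self, lines):
        self._lines = iter(lines)
        self._pushed = []       # lines waiting to be re-read
        self.line_number = 0    # counts lines really read from the input

    def push_back(self, text):
        """Model of 'N': make text available to the next read."""
        self._pushed.append(text)

    def next_line(self):
        """Model of 'n': return the next line, or None at end of input."""
        if self._pushed:
            # Re-reading a pushed-back line leaves line_number untouched.
            return self._pushed.pop()
        line = next(self._lines, None)
        if line is not None:
            self.line_number += 1
        return line


source = LineSource(["section A", "section B"])
first = source.next_line()
source.push_back(first)          # hand the delimiter back to an outer loop
print(source.next_line(), source.line_number)
```

Because the counter only advances on real reads, a script that pushes back a line other than the one it just read would see a line number one larger than expected, which is the situation the note above warns about.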

Using Variables

There are 10 register variables in yelt. These variables are all strings. The + command and - command can be used to increment and decrement well formatted string variables. Normally this is only useful for manipulating counts.

Most commands affect one or more variables. Usually, the variable affected is embedded in the command invocation:

  p4;
  n9;
  s8/fred/bill/g;
This would be difficult with the pattern conditional statements which begin with /. In this case, the variable name appears at the end of the expression:
  /fred/~8
  /begin/~4, /end/
Register variable 0 is a special case. Since it is so heavily used, if the command does not specify a register, then register 0 can often be assumed to be the target of the operation. This does not apply to commands that affect multiple registers, such as: x, a, A, SR, SW, etc.

There are of course special commands whose principal function is to manipulate variables:

For more information, see below

Text processing statements

The majority of commands in yelt are in fact text processing statements because that is the tool's primary function. Most text processing statements resemble function calls where the parameters are register numbers -- much like assembly language statements.

Yelt text processing commands operate on 0, 1, 2, or 3 registers and are described in the table below.

Example Name Commentary
s[d]/patrn/rplcmnt/[1g] substitute See this section below.
t[0-9] expand tabs Expand tabs in variable 0 or the specified variable. Tabs are assumed to mean 8 character cells per the unix standard.
y[0-9]/set1/set2/ translate character set Convert character sets in variable 0 or the specified register. A character set can be an arbitrary list of individual characters, or it can contain a mix of individual characters and character ranges.

A character range is a pair of characters specified like this:

X-Y
Where "X" is the low end of a character range and "Y" is the high end. X and Y are both included in the range. For example:
a-z
Refers to the lower case letters.

To specify the individual character, "-", use one of the following:

  • \-
  • put the - at the beginning of the character set

To specify control characters, use one of the following:

Note, translating a character into a newline character (\n) does not create a new input line to yelt. Do so at your own peril.

See also: dealing with alphabetic case.

l[0-9] get current line number Store the current line number from the current file in the specified register (or register 0 if none are specified).

The text will include leading and trailing spaces. The spaces are defined to enable a columnar output for line numbers in the range 1-10,000.

L[0-9] get current file name Store the current file name in variable 0 or the specified variable. If reading from standard in rather than a named file, this will put an empty string in the output variable.
a[0-9][0-9] copy one variable to another Copy the contents of the first variable to the second. Both must be specified.
A[0-9][0-9] append one variable to another Append the contents of the first variable to the second. Both must be specified.
q[0-9] text load variable with text Store the specified text in variable 0 or the specified variable. The text is terminated by ';'. If you need to include a semicolon in the text, use '\;'. Other escape sequences will work as well -- see this.
c[0-9] x-y[,a-b...] cut character ranges Eliminate undesired text from variable 0 or the specified register. The cut ranges look like this:
[number]-[number][,range ... ]
For example:
c3 10-13,99-115
This command eliminates all the text in variable 3 except that found in columns 10 through 13 and columns 99 through 115. The result is stored in variable 3. That which was previously in column 10 will now be in column 1. That which was previously in column 99 will now be in column 5.
SC[digit][digit] Count split character columns This command splits the text in the first register (as defined by the digit immediately after SC) into two parts. The first "Count" columns stay in the first register. The remainder of the text go into the second register. For example:
SC93 10
This command splits variable 9 into 2 parts: The first 10 characters stay in variable 9, but the rest goes into variable 3.
SW[d][d][d] delims split words This command splits the text in a register into three parts. Three registers must be specified (as 3 digits following the SW).

The delims string is a group of characters, any one of which ends the initial string.

The text currently in the first register (as defined by the first digit after SW) will be split into three parts:

  1. the part before the first delimiter stays in the first register.
  2. the character that matched the delimiter set and ended the first string will end up in the second register.
  3. all text after the first delimiter will be moved into the third register.
For example:
SW743 @:
In this case, the text in variable 7 will be split up like this:
  1. all text up to the first @ or : character will be left in variable 7.
  2. whichever character caused the split to occur will be left in variable 4
  3. all remaining text will go into variable 3.
SR[d][d][d] /regex/ split on regex This command splits the text in a register into three parts. Three register numbers must be specified as digits after the SR.

The /regex/ is a regular expression which serves as the delimiter for the splitting.

The text currently in the first register (as defined by the first digit after SR) will be split into three parts:

  1. the part before the text that matches the regex stays in the first register.
  2. the text matching the regex and which ended the first string will end up in the second register.
  3. all text after the text that matched the regex will be moved into the third register.
For example:
SR345 /bob/
In this case, the text in variable 3 will be split up like this:
  1. all text up to the first occurrence of bob will be left in variable 3.
  2. the text that matched the regex (bob) will be left in variable 4
  3. all remaining text will go into variable 5.
M[0-9] map one string to another Using the table specified on the command line by the -M and -D options (see above), convert the contents of variable 0 or the specified variable into its mapped form. For example, if the map file, m.txt, looks like this:
junk|crap
And the -M and -D options look like this: "-M m.txt -D '|'", and variable 9 contains 'junk', and you use the following command:
-e "M9"
then variable 9 will be left containing "crap".
j[0-9] count left justify register This command widens the specified register, or register 0 if none is specified, to be at least count characters wide.

Spaces are added to the right hand side of the string to make it at least count characters wide.

J[0-9] count right justify register This command widens the specified register, or register 0 if none is specified, to be at least count characters wide.

Spaces are added to the left hand side of the string to make it at least count characters wide.

+[0-9] increment a register Add one to the string in the specified register (or register zero). If the string in the register is not a valid integer string, then this will cause a fatal error.
-[0-9] decrement a register Subtract one from the string in the specified register (or register zero). If the string in the register is not a valid integer string, then this will cause a fatal error.
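The cut and split commands in the table above (c, SC, SW, and SR) are easier to compare side by side when written as plain functions. The Python sketch below models each one on a single string; the function names are invented, Python's re syntax stands in for yelt's Basic Regular Expressions, and the behavior when no delimiter or match is found (the text stays in the first part and the other parts are empty) is an assumption, since the table does not describe that case.

```python
import re


def cut_ranges(text, ranges):
    """Model of 'c': keep only the given 1-based inclusive column ranges."""
    return "".join(text[first - 1:last] for first, last in ranges)


def split_count(text, count):
    """Model of 'SC': the first `count` characters stay, the rest moves."""
    return text[:count], text[count:]


def split_words(text, delims):
    """Model of 'SW': split at the first character found in `delims`."""
    for index, char in enumerate(text):
        if char in delims:
            return text[:index], char, text[index + 1:]
    return text, "", ""          # assumption: no delimiter found


def split_regex(text, pattern):
    """Model of 'SR': split around the first match of `pattern`."""
    match = re.search(pattern, text)
    if match is None:
        return text, "", ""      # assumption: no match found
    return text[:match.start()], match.group(0), text[match.end():]


print(cut_ranges("abcdefghijklmnop", [(2, 4), (10, 12)]))
print(split_count("0123456789abc", 10))
print(split_words("user@host:22", "@:"))
print(split_regex("alicebobcarol", "bob"))
```

In yelt the returned parts would land in the registers named after the command, for example variables 7, 4, and 3 for "SW743 @:".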

The substitute command
The most commonly used yelt command is the substitute command. Its syntax is borrowed directly from SED. The substitute command lets you replace strings in a register. The strings to be replaced are defined by a regular expression. The replacement can be constant text or it can contain special characters that allow you to include parts of the text that matched the regular expression in the output.

The substitute command consists of the following components:

See also: dealing with alphabetic case.

Regular Expressions
Yelt uses SED style regular expressions. That is to say, yelt regular expressions are Basic Regular Expressions, (BREs), documented here.

Yelt is implemented using the GNU regular expression source files regex.h and regex.c. Yelt passes a 0 as the eFlags option to regexec1 in the GNU source code. Note that this file requires that you obey the GNU GPL copyright as well as that of Lowell Boggs -- or use a different regex package (;->).

Functions

Yelt functions can execute any yelt statement and are defined like this:
    Def funcName parmCount
    {
      [yelt statements]
    }
And they are invoked like this:
    C funcName [parmList]
For example:
    C myFunc 9,2,0

    Def myFunc 3
    {
      A01;
      A12;
      if /badData/ Q;
      p2;
    }

Yelt functions can be called before they are defined, and vice versa. Redefining a function is illegal. Functions can be defined anywhere, but it makes sense to put the main script at the top of any yelt script and put the functions at the bottom. You cannot easily define functions using the -e options -- but technically you can.

To return from a yelt function, either allow the execution to drop off the bottom of the function, or trigger an early return using the Q statement. Q has no parameters.

Parameters passed to a function must be placed in the register list in the call statement. These parameters are stored in the called function's register set in ascending order, starting with register 0. So, if the caller passes 5,9,4, then the called function will see these values as 0,1,2. Upon return, the only data that the function can pass back to the caller is in this same register list. So, for example:

    # call function fred passing it the contents of this
    # function's (or script's) registers 9 and 7 and 4:

    C fred 9,7,4

    # check for the function's return value in register 4.

    /returnValue/~4
    {
      # the return value from fred is found in register 4
      #
      # Note that return data could be in either 9, 7, or 4.
    }

    ...

    Def fred 3
    {
      #
      #  parms are in 0 and 1, returning a value in r2:
      #

      /firstParm/~0 
      {
	# use the data from the caller's register 9
      }

      /secondParm/~1
      {
	# use the data from the caller's register 7

	# the following code sets up an error-return
	# value

	q2 badValue;

	# the following statement executes an early return
	# from the function.
	Q;
      }

      #
      # returning data in the caller's register 4
      # because it is mapped to register 2 in the
      # function's register space.
      #

      q2 returnValue;
    }
Function parameters appear to the caller as if they are passed by reference.
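The copy-in and copy-back behavior of a function call can be modeled in a few lines of Python. This is a sketch of the register mapping only: call_function and the sample body are hypothetical names, registers are held in a ten-element list, and early return via Q is not modeled.

```python
def call_function(caller_regs, param_list, body):
    """Model of 'C fun a,b,c' for a function taking len(param_list) registers."""
    # The callee starts with ten empty registers; the caller's registers
    # named in the call are copied into slots 0, 1, 2, ... in order.
    callee_regs = [""] * 10
    for slot, caller_index in enumerate(param_list):
        callee_regs[slot] = caller_regs[caller_index]

    body(callee_regs)

    # On return only the parameter slots are copied back, which is why
    # parameters look as if they were passed by reference.
    for slot, caller_index in enumerate(param_list):
        caller_regs[caller_index] = callee_regs[slot]


def fred(regs):
    """Sample body: writes its result into the third parameter slot."""
    regs[2] = "returnValue"


caller = [""] * 10
caller[9] = "firstParm"
caller[7] = "secondParm"
call_function(caller, [9, 7, 4], fred)   # models: C fred 9,7,4
print(caller[4])
```

Anything the body stores in a register beyond its parameter slots is discarded on return, so a function cannot disturb caller registers that were not named in the call.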

Other than the rather clunky mechanism for calling functions and returning values, there is one additional weirdness of yelt functions: b, d, and n behavior can be interpreted by the caller, in some circumstances.

Yelt functions can define while loops just as in the main script. When the b, n, or d command executes from within a while loop inside a function, it behaves normally. If you execute a b, d, or an n statement which reads the end of the file outside a while loop but in a function, the function immediately returns and the caller acts as if the b, d, or empty n statement had been executed in it. For example:

  w
  {
      C myFunc
  }

  ...

  Def myFunc 0
  {
    # return and terminate the caller's while loop
    b; 
  }
This unusual feature lets you write functions that act like a "super-n" which operates across function boundaries. It avoids extra function parameters and return values that would otherwise be necessary to communicate this information and clutter up your script.

Again, if the b, d, or n is executed in a while loop inside the function, it will behave as normal.
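One way to picture a b that escapes from a function into the caller's loop is as an exception that the caller's while loop catches. The Python sketch below uses that analogy; the exception class and function names are invented, and this is a mental model rather than a description of how yelt implements the feature.

```python
class LoopBreak(Exception):
    """Raised by a modeled 'b' that is not inside a loop of its own."""


def my_func(state):
    """Modeled function body: breaks the caller's loop on the third call."""
    state["calls"] += 1
    if state["calls"] == 3:
        raise LoopBreak()


def run_caller_loop(state):
    """Modeled 'w { C myFunc }': loops until the callee's break arrives."""
    while True:
        try:
            my_func(state)
        except LoopBreak:
            break
    return state["calls"]


print(run_caller_loop({"calls": 0}))
```

A b executed inside a while loop within the function would correspond to an ordinary Python break in a loop inside my_func, which never reaches the caller.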

Programming Examples

Simple Substitutions
Conditional Execution
Using Maps
Yelt Variables
Yelt Parsing Statements
Dealing with alphabetic case
This section is not yet complete.

The following sections describe how some yelt scripts are designed in significant detail. Please see HOWTOs for a more terse treatment of many common tasks.

Simple textual substitutions

Simple substitutions are the most common form of stream editing. Yelt provides two mechanisms for doing this: the s command and the M command. The s command is the most general, flexible, and useful. The M command is meant for performing large table lookups. It will be discussed in more detail later.

Trivial textual substitutions

To use yelt to replace all instances of one string with another, you can invoke it like this:
yelt -e 's/fred/bill/g'
In this case, yelt will read all the lines in its input file(s), which are either specified as file names on the command line or read from the standard input stream. For example, if you have two files named data1.txt and data2.txt and they have the following contents:
data1.txt:
    line1 has text on it.
    line2 has fred's name on it
    line3 has something else

data2.txt:
    susan
    tom
    hank
    fred
    sundar
    gregor
You can convert all copies of 'fred' in the input stream to 'bill' to produce a concatenation of the two modified files like this:
yelt -e 's/fred/bill/g' data1.txt data2.txt >mergedAndFixed.txt
Or you could have done this on unix:
cat data1.txt data2.txt | yelt -e 's/fred/bill/g' >mergedAndFixed.txt
Or this on Windows:
type data1.txt data2.txt | yelt -e 's/fred/bill/g' >mergedAndFixed.txt
In all of these cases, mergedAndFixed.txt will end up containing:
    line1 has text on it.
    line2 has bill's name on it
    line3 has something else
    susan
    tom
    hank
    bill
    sundar
    gregor
Note that if you do not want to replace all instances of fred with bill, yelt, like SED, offers you the option of replacing only the first instance of the "from" string with the "to" string. To choose this option, rather than the "global" option, change the 'g' at the end of the s command to a '1'.

Note also, that the s command can be applied to any yelt variable -- not just the default (variable 0).

Watch out for "special" characters in the "from" and "to" strings. The s command expects the "from" string to be a regular expression and the "to" string must be formatted as a replacement string.
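The 'g' versus '1' choice and the warning about special characters can be illustrated outside yelt. The following is a minimal Python sketch, not yelt code: Python's re.sub stands in for the s command, its count argument plays the role of the '1' flag, and the helper name substitute is invented for this illustration.

```python
import re

def substitute(line, pattern, replacement, first_only=False):
    """Rough analogue of s/pattern/replacement/g, or /1 when first_only is set."""
    return re.sub(pattern, replacement, line, count=1 if first_only else 0)

line = "v3.5 and v325"

# "." is a regular expression operator, so the pattern 3.5 also matches 325.
print(substitute(line, "3.5", "X"))
# Only the first match is replaced, as with the '1' flag.
print(substitute(line, "3.5", "X", first_only=True))
# Escaping the "from" string makes the dot match only a literal dot.
print(substitute(line, re.escape("3.5"), "X"))
```

The point of the last call is that a "from" string taken from ordinary text should have its special characters escaped before it is used as a pattern.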

Trivial regex replacements

Regular expressions can be very complicated to specify and understand but luckily there are a number of simple expressions that one can learn and which can be repeated with great utility.
Here's how you would add a prefix and suffix to every line in the input stream
yelt -e 's/.*/PREFIX&SUFFIX/1'
If the input stream or file looked like this:
  line1
  stuff on line2
  more on line3
Then yelt's output would look like
  PREFIXline1SUFFIX
  PREFIXstuff on line2SUFFIX
  PREFIXmore on line3SUFFIX
The "from" regular expression above means "any character and any that follows it". That means that .* will match all the text on the line.

Note that the star character, '*', does not mean "and any that follow". Rather, the '*' character means "any that follow that also match the same rule as the sub-expression before the '*'" (in this case '.', which means "match any character").

The "to" or "replacement" expression only has 1 special character over and above that which one might guess from the above example. In the "to" expression of an s command the & character is replaced with the actual text from the input stream that matches the "from" regular expression. There is another character sequence which does the same thing. In the "to" string of the s command, the sequence \0 is also replaced with the matching input text.
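The prefix and suffix example can be mimicked in Python to see what the whole-match reference does. This is only a sketch under the assumption that Python's re module behaves closely enough to yelt's s command for this case; \g<0> is Python's spelling of the whole match (the role played by & or \0 above), and the function name wrap is hypothetical.

```python
import re

def wrap(line, prefix="PREFIX", suffix="SUFFIX"):
    # ".*" matches all the text on the line.
    # count=1 mirrors the '1' flag: only the first match is replaced.
    return re.sub(r".*", prefix + r"\g<0>" + suffix, line, count=1)

for line in ["line1", "stuff on line2", "more on line3"]:
    print(wrap(line))
```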

Here's how to remove leading and trailing blanks
yelt -e 's/^  *//1; s/  *$//1'
Because spaces are often hard to read, let me spell out the above "from" expressions:
  caret space space star
  space space star dollar
Note that in this example, the -e option specifies two yelt commands in a single expression -- the commands are separated by the semicolon character. This was done just to show that it could be done -- not that this choice has any bearing on how yelt executes the statements.

In the example above, the first substitute command only operates on text at the beginning of the line. That is, it will have no effect on spaces inside the line or at its end. Whereas the second substitute command operates only on text at the end of the line.

For a regular expression to be constrained so that it only affects the beginning of the line, you must begin it (the "from" string) with caret (^). To constrain its effects to the end of the line, it must end in the dollar sign ($).

In the example above, both at the beginning and at the end of the line, the substring space and any space that follows it are being deleted. That is, substituted with nothing. Note the comments on the star operator earlier.
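The two anchored substitutions can be tried out in Python, whose re module uses the same caret, dollar, and star operators for this case. This is a sketch for experimenting with the patterns, not yelt code, and strip_blanks is a name made up for the illustration.

```python
import re

def strip_blanks(line):
    # caret space space star: one or more spaces at the start of the line
    line = re.sub(r"^  *", "", line, count=1)
    # space space star dollar: one or more spaces at the end of the line
    line = re.sub(r"  *$", "", line, count=1)
    return line

print(repr(strip_blanks("   a  b   ")))
```

Spaces inside the line are left alone because neither pattern can match there.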

Here's how to eliminate duplicate characters
yelt -e "s/AAA*/A/g"
This invocation collapses sequences of A into a single A.

As seen above, you could use the same logic to eliminate duplicate spaces (a much more common need than reducing duplicate A characters).

Here's how to eliminate all duplications
This example is the generic version of the previous. It lets you eliminate all duplicates of any character in one compact substitute command:
yelt -e 's/\(.\)\1\1*/\1/g'
This "from" expression has the following meaning:
Match any single character which is followed by at least one duplicate of itself.
The "to" expression means:
Replace the text matching the "from" expression with the text that matched the first parenthesis group in the "from" expression.
While this is complicated and ugly, it gives the substitute command enormous power. You can work miracles with properly written regular expressions -- but lots of comments in scripts are often required in order to make this code maintainable.

How the "from" regex works:

In the regular expression syntax, \( means "begin a sub-expression". The \) means "end the sub-expression". Sub-expressions are addressable both in "from" expressions and "to" expressions. That is, a later part of a regular expression can be defined in terms of early parts -- but only if the earlier parts are defined in a sub-expression.

So, the following fragment:

\(.\)\1
Means the following: define the first sub-expression to be the "." regular expression operator. Then, the larger expression requires that the text immediately following the text which matched the first sub-expression be a duplicate of that which was found in the first sub-expression.

Since the "." operator matches any single character, then the above fragment means: any character followed by itself.

The whole expression:

\(.\)\1\1*
then means: any character followed by 1 copy of itself, optionally followed by more copies of itself.
The "to" string, just tells the s command to replace all the text that matched the "from" expression with the contents of the first sub-expression -- in this case a single copy of the 1 character that began a multi-copy sequence.
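The back-reference pattern explained above can be checked with a short Python sketch. Note the assumption: Python writes groups as ( ) where the yelt and SED syntax shown here uses \( \), so the pattern is respelled accordingly; the function name squeeze is invented for the example.

```python
import re

def squeeze(line):
    # (.)   any single character, remembered as sub-expression 1
    # \1    followed by one copy of itself
    # \1*   optionally followed by more copies of itself
    return re.sub(r"(.)\1\1*", r"\1", line)

print(squeeze("aaabbbcd  ee"))
```

Characters that are not repeated never match the "from" pattern, so they pass through unchanged.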
Here's how to re-arrange text
Suppose you have input text that has the correct data on it but you desire it to be in a different order on the same line -- and some data on the line is un-desirable. Here's how to parse out the parts of interest and only print them:
yelt -e 's/^useless stuff\(interesting part1\)more junk\(more interesting stuff\).*/\2 \1/1'
In this case, the line begins with un-needed text but two interesting parts which are in the wrong order. The yelt invocation locates the interesting parts, and replaces the entire line with nothing more than the desired text -- but placed in the correct order.
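As a concrete instance of the re-arranging technique, here is a Python sketch with made-up data. The input format (user=... and code=...) and the function name rearrange are assumptions chosen for illustration, and Python's ( ) grouping replaces the \( \) used in the yelt command.

```python
import re

def rearrange(line):
    # Group 1 captures the user name, group 2 the numeric code.
    # Everything else on the line is matched but not kept.
    pattern = r"^.*user=(\w+) .*code=(\d+).*"
    return re.sub(pattern, r"\2 \1", line, count=1)

print(rearrange("junk user=alice noise code=404 tail"))
```

A line that does not match the pattern is returned unchanged, which is also what a substitute command does when its "from" expression finds nothing.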

If the text is column oriented, you can use the cut command to do this with a bit more clarity, but the s command works well for this.

Watch out for regular expressions that don't work the way that you expect them to. Regular expressions parse one character at a time -- sometimes if you try complicated expressions, with lots of sub-expressions, you create unusable patterns. Try splitting them up -- or use multiple s commands that have slightly different "from" regular expression patterns but have the same "to" replacement patterns.
Expanding tabs in the input stream
Many programs which produce log files include tab characters in their output. Also, many people configure their text editors to produce tab characters -- presumably to save space or simplify indentation. This causes havoc in at least the following situations:
  1. different people set their text editors to show tabs in different ways and thus two different authors may get into tab size wars.
  2. scripts which need to match patterns on text which involve spaces may not automatically detect tabs as spaces
  3. scripts which operate on columns of text get totally confused when non-standard tab settings are used.
I can't help you with issue 1 -- the best solution is simply to require that all editors turn off tab generation -- or to have text editors use tabs only to produce the standard spacing: 8 characters. Tab characters began on unix text consoles and saving space was an important feature due to the smaller memory and disk drives available at the time. However, the world has moved on. Like most early ideas, too many people thought that they had a right to do things in whatever way popped into their heads and thus tabs went from being helpful to being a problem.

Yelt provides the command t to allow you to automatically expand tabs per the standard spacing of 8 characters. It does so much more quickly than could be done with some sort of substitute command. However the y command can translate tabs out of existence even faster. All three approaches work and will probably become necessary in different situations.

The t command will expand tabs, from this:

a\tb
to this:
a       b
The y command, given this kind of argument: "y/\t/ /" will convert from:
a\tb
to this:
a b
And if you use the s command, you can change the \t to any string you want.
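The difference between the three approaches can be seen side by side in Python. This is an analogy only: str.expandtabs, str.translate, and re.sub are Python functions standing in for the t, y, and s commands respectively, and the replacement text <TAB> is an arbitrary choice.

```python
import re

line = "a\tb"

# Like the t command: pad to the next multiple of 8 columns.
expanded = line.expandtabs(8)

# Like y/\t/ /: each tab becomes exactly one space.
translated = line.translate(str.maketrans("\t", " "))

# Like an s command: each tab becomes any string you choose.
substituted = re.sub(r"\t", "<TAB>", line)

print(repr(expanded), repr(translated), repr(substituted))
```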

Using the map command

The substitute statement can be effectively used to perform constant text substitutions like this:
yelt -e "s/a/b/g" -e "s/r/s/g" -e "s/T/U/g" -e "s/M/N/g" ...
These statements execute quickly, but the more of them you need, the slower the script runs. To a rough approximation, the time required grows as O(N) in the number of substitutions. Yelt provides a way to speed this up to O(ln(N)). This only matters if the number of substitutions is huge -- hundreds or thousands. The map statement is used to convert the contents of a register into its mapped form.

For example, suppose you have a program which produces a logfile containing employee numbers and you wish to replace them with the actual employee names. You could build a yelt script that looks for employee numbers and replaces them with the actual person's name. The script might be stored in a file like this:
script.yelt:
    s/01437351/Bill Billips/g;
    s/19343728/Susan Steffano/g;
    s/00000001/Bill Gates/g;
    s/00000002/Bill Joy/g;
    s/00000003/Paul Alan/g;
To use this script, you'd run it like this:
generateLog | yelt -f script.yelt
This would work fine -- but what if your employee database has 50,000 people in it? In that case, you might want to create an employee database of the following form:
employeeDatabase.map:
    01437351=Bill Billips
    19343728=Susan Steffano
    00000001=Bill Gates
    00000002=Bill Joy
    00000003=Paul Alan
And you would use the database like this:
generateLog | yelt -M employeeDatabase.map -D "=" -f fixEmployeeNumbers.yelt
This invocation requires an additional yelt script:

fixEmployeeNumbers.yelt:


    #
    # the following function replaces 8 digit employee numbers
    # with the corresponding employee's name
    #

    w
    {
      n;

      /\<[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\>/
      {
	#
	#  the current line (in variable 0) has an employee number on it --
	#  and the employee number is a word in its own right -- not some
	#  accidental misinterpretation of larger number.  That's why the
	#  regex above uses \<\> -- to match only whole words.
	#
	#  force variable 9 to be empty -- and use it as the accumulator
	#  for all changed text

	q9;
	q2;

	w
	{
	  #
	  #  split the text of the line into 3 parts -- using the employee
	  #  number as a delimiter:
	  #

	  SR012 /\<[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\>/;

	  #
	  # check to see if there really was an employee number (the first
	  # time through the while loop there will be, but in subsequent
	  # iterations there may not be.  If the employee number is not found
	  # then append the text that did not get split out to the outgoing buffer
	  # and leave the loop.
	  #

	  /^$/~1 { A09; b };

	  #
	  #  Register 0 now contains the part before the employee number
	  #  Register 1 contains the employee number
	  #  Register 2 contains the stuff after the employee number
	  #
	  #  first, append the text that is not an employee number to variable 9

	  A09;

	  # next replace the employee number with the employee name and append it
	  # to register 9;

	  M1;
	  A19;

	  #
	  # finally, move register 2, containing the stuff after the employee
	  # number to variable 0 for continued parsing.
	  #

	  a20;
	}

	# move the accumulator back to register 0 for subsequent printing.

	a90;


      }

      p;

    }
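The idea behind the map approach, one table lookup per employee number instead of one substitution per employee, can be sketched in Python. This is not how yelt implements M; it uses a Python dictionary (a hash table rather than an O(ln(N)) structure), and the names load_map and fix_employee_numbers are invented here. Leaving unknown numbers unchanged is also an assumption of this sketch.

```python
import re

def load_map(lines, delimiter="="):
    """Build a lookup table from key=value lines like employeeDatabase.map."""
    table = {}
    for entry in lines:
        key, sep, value = entry.strip().partition(delimiter)
        if sep:  # skip lines that have no delimiter
            table[key] = value
    return table

# An 8 digit number that is a word in its own right.
EMPLOYEE_NUMBER = re.compile(r"\b[0-9]{8}\b")

def fix_employee_numbers(line, table):
    # Each number found costs one lookup, however large the table is.
    return EMPLOYEE_NUMBER.sub(lambda m: table.get(m.group(0), m.group(0)), line)

table = load_map(["01437351=Bill Billips", "19343728=Susan Steffano"])
print(fix_employee_numbers("login by 01437351 and 19343728", table))
```

The work per line depends on how many employee numbers the line contains, not on how many employees are in the database.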

Conditional Execution

while loops
if-then-else
continuing loops
breaking out of loops
line number ranges
see this
regex pattern ranges
see this
quitting the script or function
Dealing with alphabetic case
Sometimes it is desirable to perform different kinds of processing on lines which contain fundamentally different kinds of text. For example, a log from some program might have header, body, and footer lines. You might wish to have different substitutions applied to these different kinds of lines. For example:
inputFile.txt:

  Program XYZ begins
  Source file:  somefile.inp
  Output file:  otherFile.inp:
  Trace output:

    information1
    information2
    information3

  Execution complete:
  Run time:  1 hour.

In this example, the useful information produced by the program's execution might be just the parts between the "Trace output:" line and the "Execution complete:" line. So, a first step of filtering might be to delete the unwanted lines, and then apply some other processing steps.

Use grep to pre-filter the data

One way to do this would be to use grep to filter out the lines you are not interested in and have yelt process the rest:
grep '^ ' inputFile.txt | yelt -e 's/other/stuff/g'

Use yelt to delete unwanted data then process the rest

In the above case, grep is filtering out the lines that do not begin with a space. Yelt of course, can combine both activities. There are several approaches to doing that.

The obvious method of doing this is to use an if statement to decide what text needs to be discarded and what needs to be processed:

yelt -e "if /^ / { s/other/stuff/g; } else d; "
In this example, the if statement processes a regular expression that determines whether the if or the else clause is to be executed. Lines that begin with a space are assumed to be "interesting" and are thus transformed. All other lines are deleted by continuing the while loop which is a part of the default script.

Another method is to detect lines that you are not interested in and discard them using the pattern conditional statement -- which acts kind of like an if statement but which does not support an else clause. After deleting the uninteresting lines, operate on all the rest. For example:

yelt -e "/^[^ ]/d" -e "s/other/stuff/g"
The first -e statement, above, has the following meaning: on lines which do not begin with a space, continue the current while loop. This has the effect of suppressing all other processing in the while loop -- including the automatic printing of the current line that is a natural part of the default yelt script. The d command suppresses the execution of all other -e options as well.

Another approach would be to rewrite the whole script and simply not bother with the automatic print at the end -- then you would have to make the script detect lines of interest, perform any necessary transformations and print the results:

yelt -S "w{ n; /^ /{ s/other/stuff/g; p; } }"
In this case, the while command, w, repeatedly executes the block of commands that follow it. The n command reads the next line of input. There is no automatic printing of the current line, so if any output is to occur, the remaining commands must do so.

In this case, the remaining command is a pattern conditional statement. That is, a regular expression is used to determine if work needs to be done. In this case, the expression checks the text in variable 0 (the current line in this case) to see if it begins with space. On lines where it does, a block of statements is executed. In this case, the block performs a simple substitution and prints the line.
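The control flow of the explicit script above (read a line, test it, transform and print only when the test succeeds) maps onto an ordinary loop. The following Python sketch is an analogy with an invented function name, filter_lines; the comments show which yelt statement each step corresponds to.

```python
import re

def filter_lines(lines):
    printed = []
    for line in lines:                    # w { n; ... }  read lines until end of input
        if re.search(r"^ ", line):        # /^ /          pattern conditional
            line = re.sub("other", "stuff", line)   # s/other/stuff/g
            printed.append(line)          # p             explicit print
        # lines that fail the test are never printed
    return printed

sample = ["Trace output:", "  other info", "  information2", "Run time:  1 hour."]
print(filter_lines(sample))
```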

Another way of using pattern conditional statements is to delete all lines, but print the interesting ones first:

yelt -e "/^ /{ s/other/stuff/g; p; } d;"
Here, lines beginning with a space are processed with a block of commands which perform the needed transformations, then print the line. All lines are subsequently deleted (i.e. the while loop is continued).

Yet another way is to notice that the first four lines of the log file are always to be discarded, and use the range conditional statement to discard them while working on the rest -- you'd have to deal with the last two lines though, like this:

yelt -e '1,4d; /^Execution/,$d; s/other/stuff/g;'
In this case, the 1,4d statement eliminates the first few lines. The second range conditional statement, which runs from the first line matching /^Execution/ to the last line ($), eliminates the last few lines, and any line not yet deleted gets processed and printed automatically.
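The two kinds of range used here, a line number range and a range that starts at a pattern and runs to the end of input, can be sketched in Python as follows. This assumes the ranges behave as they do in SED (inclusive at both ends); the function name trace_body and the sample data are made up for the illustration.

```python
import re

def trace_body(lines):
    printed = []
    in_footer = False
    for number, line in enumerate(lines, start=1):
        if 1 <= number <= 4:                      # 1,4d
            continue
        if in_footer or re.search(r"^Execution", line):
            in_footer = True                      # /^Execution/,$d stays active to the end
            continue
        printed.append(re.sub("other", "stuff", line))  # substitute, then automatic print
    return printed

log = [
    "Program XYZ begins",
    "Source file:  somefile.inp",
    "Output file:  otherFile.inp:",
    "Trace output:",
    "  information1",
    "  other information",
    "Execution complete:",
    "Run time:  1 hour.",
]
print(trace_body(log))
```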

Yelt variables

Instead of the two variables that SED provides, yelt provides 10. They are numbered 0-9. Most yelt statements (aka "commands") embed the variables being used in their names; a statement with no embedded variable operates on variable 0. For example:
    s/a/b/g         -- modify variable 0 such that all a's become b's
    s3/x/y/1        -- modify variable 3 such that the first x becomes y
    t9              -- expand tabs in variable 9
    /fred/~3! cmd   -- execute cmd if variable 3 does not contain fred
    SR987 /  */     -- split variable 9 into 3 pieces:  the text up to the
		       first blank is left in 9.  The space, and all spaces
		       contiguous to it are moved into variable 8, and the
		       rest of the text is moved into variable 7.
Note that since most processing operates on the current input line, it is generally advisable to reserve variable 0 for that purpose -- but there is no rule saying that you must.

The following statement types perform simple variable manipulations:

Alphabetic case -- comparisons and conversions

Yelt provides the y or translate command to allow you to convert one character set to another. The most common use of this is to convert between upper and lower case. For example, the following command converts to all upper case:
    yelt -e 'y/a-z/A-Z/'
and this converts to all lower:
    yelt -e 'y/A-Z/a-z/'
Case insensitive compares can be done in either of two ways:

  1.  by writing the regular expression to check for both cases:

      /[aA][bB][cC]/ { q1 Found abc or ABC; p1; d; }

      However, this mechanism is quite slow to execute and is very cumbersome
      to use.  

  2.  by converting your regex into all lowercase and converting your data to
      all lower case and comparing against strict lower case.

Since yelt provides 10 registers, option 2 is not quite so bad -- it doesn't require that you actually modify your working data. You can copy the string to a second register and perform the comparison against that one. Consider:
    n;
    a01;
    y1/A-Z/a-z/;
    /fred/~1 { p1 found fred; p1; Q; }
This of course assumes that your regular expression is coming to you in lower case. If your regular expressions are read from a file instead of being hard coded constants in the program, you have to be creative.
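The copy-then-lowercase technique can be sketched in Python to show that the original data survives the comparison. This is an illustration rather than yelt code: the variable working_copy plays the part of register 1, and the function name contains_ignoring_case is hypothetical.

```python
import re

def contains_ignoring_case(line, lowercase_pattern):
    working_copy = line                    # a01: copy the line to a second register
    working_copy = working_copy.lower()    # y1/A-Z/a-z/: lower-case only the copy
    # Compare the strictly lower case pattern against the lower-cased copy.
    return re.search(lowercase_pattern, working_copy) is not None

line = "Hello FRED"
print(contains_ignoring_case(line, "fred"), line)
```

The line itself is never modified, so it can still be printed or processed in its original case afterwards.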

One approach is to use the computed conditional statement. In this case, you don't use a hard coded regular expression but rather make a test based on the contents of two registers.

Basically, you copy the erstwhile regular expression into a register, then lower case it, then use the computed conditional to test some other string to see if it matches:

    n4;
    a45;
    y5/A-Z/a-z/;
    |52 { p5 "the string in register 2 matches the expression: "; }

Parsing

Reading entire files
Tokenizing
Finding C++ Function Calls

Parsing primitives

Parsing is the act of splitting text into manageable pieces. The substitute command lets you do that to a large extent. However, the s command was not really meant for repeatedly lopping off the first part of a piece of text and processing it separately. Yelt does provide commands that are meant to aid in writing scripts that perform parsing. To use the parsing statements, you split the text in a register into one or more other registers. Thus, if you want to repeatedly process a stream of text, you need loops within loops: the outer loop gets lines of text from the source stream and the inner loop processes the tokens on a given line.

Alternatively, you might make a function that returns the next token out of the stream. This function might use this basic logic:

See the "fix employee numbers" example above.

Reading Entire files into a string

Note that the R statement is designed to work with the parsing commands. It lets you read the entire file into a register variable and process it using either a substitute command or one of the above parsing statements. Note that when you use the R command, just as when you translate characters into newlines, you end up with a slightly confused situation: more than one line in a given string. When you print strings with embedded line feeds, even if the embedded line feed is at the end of the string, the print command puts another line feed on for you. This can result in double line feeds. You can use the substitute command to get rid of the trailing line feed before printing -- if needed:

s/\n$//1
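The double line feed problem is easy to reproduce in other tools too. In this Python sketch (an analogy, with an invented function name), the text read from a file ends in a line feed and print adds another; the same kind of substitution removes the trailing one first.

```python
import re

def drop_trailing_newline(text):
    # Remove a single line feed at the very end of the string, if present.
    return re.sub(r"\n$", "", text, count=1)

whole_file = "line1\nline2\n"   # stands in for a file read into one string
print(drop_trailing_newline(whole_file))   # print supplies the final line feed
```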

Tokenizing

The following function can be included in any script and is used to parse C++ style tokens (except for the old fashioned C-Style comments) one at a time from an input stream:
    Def tokenize 2
    {
      # register 0 is where the input stream goes
      # register 1 is where the returned token goes
    
      # make sure the returned token is empty if we have a problem
    
      q1;
    
      w
      { 
	#
	# process input lines until a token is found or the end of stream
	# occurs.
	#
    
	w
	{
	  #
	  #  make sure there is a non-comment token available in the input
	  #  stream
	  #
      
	  /^$/ n;
	  
	  #
	  # remove leading blanks from the input data
	  #
	  
	  s/^[ \t][ \t]*//1;
	  
	  
	  #
	  # remove C++ style comments by just reading
	  # the next line and removing leading blanks
	  #
	  
	  SR012 /^\/\/.*/;
      
	  #
	  # if no comment was found, break out of this loop
	  #
      
	  /^$/~1 b; 
      
	  q0;
	  q1;
	  
	}
      
	#
	# if not end of file, process tokens
	#
      
	/./
	{
	  #
	  #  Look for single quoted strings
	  #
    
	  SR012 /^'\(\\*.\)*'/;
    
	  if /./~1
	  {
	    a20;
	  }
	  else
	  {
	    #
	    #  Look for double quoted strings
	    #
	    SR012 /^"\(\\*.\)*"/;
	    
	    if /./~1 
	    { 
	      a20; 
	    }
	    else
	    {
	    
	      SR012 /^[A-Za-z0-9_]\+/;
	    
	      if /./~1 
	      { 
		# r1 contains an identifier, return it and the rest of the line to 
		# be processed
		a20; 
	      }
	      else
	      {
		SR012 /^[<>=!:+*\/|&-]\+/;
    
		if /./~1
		{
		  # >=, >>=, etc
    
		  a20;
		}
		else
		{
		  SR012 /./;
		  a20;
		}
	      }
	    }
	  }
	}
      
	if /./~1!
	{
	  n;
	}
	else
	  b;
	
     }
    
    }
The following example script uses the above function to print the tokens found in a C++ source file:
  w
  {
    #
    #  this script reads tokens from the input stream until end of file
    #  is encountered.  Each token is printed on its own line.  This
    #  script is just an example; a lot of work would be necessary to turn it into
    #  a real usable program.  Also, for the latest copy of the script, see
    #  the source directory TestLib/parse.script.
    #
  
    #
    #  Function tokenize expects 2 parameters:  register 0 contains the
    #  text of the current input line.  Register 1 is the returned token
    #  after the very next token is extracted from the input stream.
    #
    #  The tokenize function is meant to be called repeatedly until the input
    #  stream is exhausted -- and all invocations should use the same pair of
    #  registers.
    #
  
  
    C tokenize 0,1 
  
    #
    #  If the tokenize function returns an empty string in register 1 then
    #  the final end of file has happened -- so quit.
    #
  
    /./~1! b;
  
    #
    #  print the current token and go around the loop again
    #
  
    p1;
  
  }

To run the above script, combine it into a single file with the tokenize function above.
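For comparison, the same token classes that the tokenize function tries in turn (comments, single quoted strings, double quoted strings, identifiers, operator runs, then any single character) can be written as one alternation in Python. This is a sketch of the technique, not a port of the yelt function: the quoted string patterns are my own stricter versions, and the names _TOKEN and tokenize are chosen for this example.

```python
import re

# Alternatives are tried in order, so comments must come before operators
# (otherwise "//" would be read as an operator run).
_TOKEN = re.compile("|".join([
    r"(?P<skip>\s+|//[^\n]*)",              # blanks and C++ style comments
    r"(?P<squote>'(?:\\.|[^'\\\n])*')",     # single quoted strings
    r'(?P<dquote>"(?:\\.|[^"\\\n])*")',     # double quoted strings
    r"(?P<ident>[A-Za-z0-9_]+)",            # identifiers and numbers
    r"(?P<oper>[<>=!:+*/|&-]+)",            # >=, >>=, etc
    r"(?P<other>.)",                        # anything else, one character
]))

def tokenize(text):
    """Return the tokens in text, dropping blanks and // comments."""
    return [m.group(0) for m in _TOKEN.finditer(text) if m.lastgroup != "skip"]

for token in tokenize('x >>= foo("a b", 1); // done'):
    print(token)
```

Like the script above, this prints each token on its own line; unlike the yelt function, it takes the whole text at once rather than returning one token per call.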

Finding Function Calls in C++ Source

The following script is invoked like this:
yelt -f findFuncs.yelt -r8 SomeCplusplusFile -r9 .
It prints the file name and line number wherever function-call-like syntax is found in the named file. You must include the tokenize function above in the same file as the following text:
    
    
    #
    # We assume the name of the function to fix is in r9 and was set there
    # using the "-r9 funcname" option to the command line
    # We also assume the name of the file on which to operate is r8 set similarly.
    #
    
    q0;
    
    F8
    {
      #
      # process lines in the file
      #
      /^$/ n;
    
      SR012 /[a-zA-Z_0-9]\+ *(/;
    
      #
      # q4 is a flag meaning that this line does not contain a call
      #
      q4 notProcessed;
    
      |91 
      {
	/^if *(/~1
	{
	  # don't process if statements
	  a20;
	  d;
	}
    
	/^while *(/~1
	{
	  # don't process while statements
	  a20;
	  d;
	}
    
	/^for *(/~1
	{
	  # don't process for statements
	  a20;
	  d;
	}
    
	/^switch *(/~1
	{
	  # don't process switch statements
	  a20;
	  d;
	}
    
	q4;
    
	#
	# we have found the desired function -- set up to log it out
	# print the filename and line number and the function name.
	# Also print the call parameters.
	# Use r7 as the line buffer to hold the output until it is all available
	# then flush it all at once
	#
    
	 a87;
	 l6;
	 J6 8;
	 A67;
	 q6:;
	 A67;
	 A17;
    
	 # we have everything but the call parameters in r7 -- now read tokens and store
	 # them in r7 until we are out of data or find the trailing ).
    
	 #
	 #  put the rest of the current input line back into r0 for later processing
	 #
	 a20;
    
	 #
	 # use r5 as a nest depth -- it is currently 1 because we already have the leading 
	 # paren parsed away by the above SR invocation.
	 #
	 # use r3 as a parameter count
	 #
	 q5 1;
	 q3 0;
    
	 w
	 {
	   #
	   # read the next token out of the input stream
	   #
	   C tokenize 0,1;
    
	   /^$/~1 
	   {
	     # premature end of file
	     b;
	   }
    
	   /^(/~1 +5;
	   /^)/~1 { -5; /0/~5 { A17; b;} }
    
	   # append the token to the output text
    
	   q6 \s;
	   A67;
	   A17;
    
	   /^1$/~5
	   {
	     #
	     # if the parenthesis nest depth is exactly 1 and we have a comma
	     # increment the parameter count
	     #
    
	     if /^,/~1
	     {
	       #
	       # increment parameter count
	       #
	       +3; 
	     }
	     else
	     if /^0$/~3
	     {
	       # if we have any text at the top level and the parm count is 0
	       # then change it to 1
    
	       +3;
	       
	     }
	   }
    
    
	 }
    
    
	 #
	 # include the parameter count in the output text
	 #
	 
    
	 q6 \s:\sparms=;
	 A36;
	 A67;
    
	 p7;
    
    
      }
    
      /^$/~4! 
      { 
	#
	# if this line does not contain a function call, set it to empty
	# so that the top of the loop will read a new line
	#
	q0; 
      }
    
    
    }
    
Note that the above script is a tad more complicated than is technically necessary. The -r8 option could be eliminated in favor of just processing the normal yelt input files, and the -r9 option is only necessary if you wish to show selected functions -- if you want to print ALL function calls you wouldn't need the -r9 behavior at all.
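The parameter counting logic in the inner loop of the script (a nest depth that starts at 1, commas counted only at depth 1, and a count bumped from 0 to 1 by the first top-level token) is easier to follow in isolation. The following Python sketch mirrors that logic on a list of tokens that begins just after the opening parenthesis; the function name count_params is invented and the sketch is an approximation of the script, not a replacement for it.

```python
def count_params(tokens):
    depth = 1      # the opening paren of the call has already been consumed
    params = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                break              # found the paren that closes the call
        if depth == 1:
            if tok == ",":
                params += 1        # a top-level comma starts another parameter
            elif params == 0:
                params = 1         # any top-level text means at least one parameter
    return params

# Tokens of:  f(g(x), y);   after the leading "f(" has been removed.
print(count_params(["g", "(", "x", ")", ",", "y", ")", ";"]))
```

Commas inside nested parentheses are ignored because the depth is greater than 1 when they are seen.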