Yelt Training |
Yelt's syntax depends heavily on SED's, but it is not a true proper superset. Here are some of the major differences:
command1 option11 option12 ... | command2 option21 option22 option23 ... | command3 ...
That is, the output from command1 is passed directly to command2, whose output is then passed directly to command3, without any intervening files being created. For example, consider this command line:
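head -10 file1 | cut -c1-20 | grep XYZ | yelt -e 's/Y/(Y)/g'
This sequence of commands produces no intermediate files and prints its final results to standard out -- i.e. to the console window in which it is invoked. It takes the first 10 lines of file1, keeps only characters 1 through 20 of each line, selects the lines that contain XYZ, and finally wraps every Y in parentheses.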
Stream text editors are useful for writing scripts but they are also useful in saving labor. Suppose you are trying to make a non-trivial change to a file. You could edit it by hand, or you could try a sequence of yelt commands to do the work for you. Then when you need to make the same changes to another file, you can re-use the existing commands.
For example:
First you execute yelt with only part of the final command sequence, like this:
yelt -e 's/string/other/g' file
Then, if you like the results that get printed out, you press the up arrow on your keyboard to recall the previous command and add more yelt commands:
yelt -e 't' -e 's/string/other/g' file
In this case, tab expansion was added before the string conversion. If this output works better, then you go on to the next change:
yelt -e 't' -e 's/string/other/g' -e '/badlines/d' file
As you can see from this example, it is not necessary to fully plan out a yelt script before you start calling yelt (or SED). Instead, you build up the yelt command lines by experiment. Planning your implementation speeds up development, but sometimes you really don't know what you are going to do until you can see what is possible.
As a side note, the "head" program can be used to cut excessively long listings down to size while you are debugging scripts using the above techniques. If your 'file' contains 10,000 lines, you may need only the first 10 in order to understand whether your script is working. To shorten long listings, rewrite the above script invocation like this:
head -10 file | yelt -e 't' -e 's/string/other/g' -e '/badlines/d'
Yelt scripts can also be read from files or can be stored in a single command line parameter string. See Invoking yelt below.
program ... | grep ... | cut ... | sed ... | tr ... | sed ... | grep ....
That is, many scripts have a long series of text filters applied one after the other. Yelt exists to eliminate some of the program invocations. It does so by combining the features of many of the tools into a single command line invocation. This should reduce machine load and reduce script complexity and confusion.
Yelt is fast, flexible, and has a lot of built in tools to eliminate unnecessary calls to other filter programs. Yelt has the look and feel of SED but also has some features from the following filter programs:
Tool replaced | Yelt Command | Description |
---|---|---|
sed | all | Yelt is an improper superset of SED. Almost all features of SED can be found in
yelt, though the syntax is different in some cases. Yelt does not have gotos or
a hold buffer. Yelt has while loops, break, and continue statements instead of
gotos, and it has 10 string variables, not just the two found in sed. Instead of this:
sed -e 's/x/y/g'
do this:
yelt -e 's/x/y/g' |
expand | t | Expand tabs in the input line so that you don't need a call to expand on the
command line. Instead of this:
expand file | sed -e 's/x/y/g'
do this:
yelt -e 't; s/x/y/g' file |
cut | c | Select a subset of the characters in the input line so that you don't need to call cut.
Instead of this:
grep 'string' file | cut -c1-99 | sed -e 's/x/y/g'
do this:
yelt -S "w{ n; /string/ { c 1-99; s/x/y/g; p; } }" file
Note that the calls to both grep and cut have been eliminated here. Something similar could have been done with sed -- but you'd need a command something like this:
sed -e 's/\(.....[99 dots]\).*/\1/1' -e 's/x/y/g' file
Which isn't terrible, but if you had to change the number from 99 to 77, you'd spend a lot of time counting dots to make sure... |
tr | y | Translate character ranges to avoid calls to tr on the command line. Note that yelt
also lets you use character escape sequences, such as \n, as inputs and outputs in
y commands (unlike SED).
Some SEDs support substitutions and translations involving control characters, like \n, \t, etc., but not all of them do. Yelt always does. Instead of this:
sed -e 's/stuff/stuff@/g' | tr '@' '\n'
do this:
yelt -e 's/stuff/stuff\n/g' |
perl | SW/SR/SC | parse tokens out of lines so that you don't have to call perl. Actually perl can do many more things than yelt or SED but then you have to learn perl as well as the other tools. Of course you have to learn yelt too, but there's a lot less of yelt than there is of perl. |
yelt | F/W | The F and W commands can eliminate extraneous calls to yelt itself because they allow yelt to compute the name of a file and perform yelt commands on that file without having to perform separate yelt steps. |
Thus yelt is invoked like this:
yelt [options] [file ...]
The following table describes the command line options:
Option | Description |
---|---|
-h | The -h option instructs yelt to print help and quit. The output of the -h option looks much like that which can be seen here. |
-v | The -v option instructs yelt to print its current version information. |
-f scriptFile | The -f option instructs yelt to use the script found in the specified script file. Note that the -e, -f, and -S options are mutually exclusive. |
-S "scriptString" | The -S option instructs yelt to use the script found in the string that follows. Note that the -e, -f, and -S options are mutually exclusive. |
-e "cmd" |
The -e option instructs yelt to extend the default script by the command found
in the following script fragment. Note that multiple commands can be specified
in a single -e option string, separated by ;'s:
-e "s/z/b/g; /r/d; p;"
Note that the -e, -f, and -S options are mutually exclusive. The default script looks like this:
w { n; extensions; p; }
where the 'extensions' are a concatenation of all -e option strings. |
-l | Turns on execution time logging -- very verbose, not likely to be helpful without a lot of patience. |
-M | Defines the name of the string substitution file that is used by the 'M' command. |
-D char | Defines the character that delimits the keys from the values in the string substitution file defined by the -M option. |
-r[0-9] string | Pre-populate the specified string register with a given string value.
This is the mechanism for passing command line variables to the script. See the F command for a way to specify the file on which to work as a command line parameter (as an alternative to the normal mechanisms.) |
Scripts are specified to yelt using one of the following methods: the -e, -f, or -S options described above. When -e options are used, they extend the default script, which looks like this:
w { n; [-e option extensions] p; }
That is, the default script is a while loop that reads each line of the input file, processes that line, then prints it to standard out. So a yelt invocation like this:
yelt -e "s1" -e "s2;s3" -e "s4" ...
would create a final script that looks like this:
w { n; s1; s2; s3; s4; p; }
Note, however, that the "n" and "p" statements are actually shortened forms of "n0" and "p0". That is, the read and print statements actually refer to string variable 0. It is perfectly permissible to write the same script like this:
w { n9; [other commands using variable 9] p9; }
There are 10 string variables, numbered 0-9, and they are used for all yelt commands.
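The way -e fragments are folded into the default script can be modeled in a few lines. The following is only a Python sketch of the concatenation described above, not yelt's actual implementation; the function name build_default_script is hypothetical, and the naive split on ';' ignores the '\;' escape and nested blocks.

```python
def build_default_script(e_options):
    """Combine -e option strings into the default while-loop script."""
    commands = []
    for option in e_options:
        # Each -e string may hold several commands separated by ';'.
        # Assumption: no '\;' escapes and no nested { } blocks.
        for command in option.split(";"):
            command = command.strip()
            if command:
                commands.append(command)
    body = "".join(f"{command}; " for command in commands)
    # The read (n) comes first and the print (p) comes last.
    return "w { n; " + body + "p; }"


print(build_default_script(["s1", "s2;s3", "s4"]))
# prints: w { n; s1; s2; s3; s4; p; }
```

With no -e options at all, the sketch yields the bare default script, w { n; p; }.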
See also the F-Statement below.
I/O statements
Example | Name | Commentary |
---|---|---|
n[0-9] | read line |
Read the next line from the current file and store it
in the specified variable number -- or in variable 0
if none is specified. For example:
n; |
N[0-9] | push back line |
Push a line of text back into the input stream. That is,
make a string available for the 'n' command to read as if it
were a new line of text from the current input file. It is
not necessary that the string be actually read from the file in
the first place, any register variable can be "pushed back" into
the input stream, but a common use of this statement type is to
read sections of data from a file and when a new section is detected
by an inner processing loop, it can push the section delimiter line
back into the input stream and let the outer loop handle it.
Note that this command has no bearing on the current input line number. Printing the line number of a pushed-back line is therefore equivalent to printing the next available line number from the input file, which may be 1 larger than the line number of the text that got pushed back. You can adjust line numbers using the "-" command if so desired. |
p[0-9] [string prefix] | print |
Print the contents of variable 0, or that of the
specified variable, to standard out. For example:
p;
Note that it is possible to specify a string to print before the text of the register is printed. This text is specified in one of the following ways:
p stuff;
If quotes are specified, they will not be printed. The string is subject to escape sequence expansion, such that \s becomes blank, \n becomes newline, etc. |
F[0-9] { cmds } | read file |
Open the file whose name appears in variable 0 or the
specified variable number. Read each line of that file
and process it using the commands specified in the block
statement.
q8 junk.txt;
In this example, the file named junk.txt is opened for reading and each line is processed using the yelt commands: n, s, and p per the above script fragment. |
R[0-9][0-9] [filename] | read whole file |
Read the entire file into a register variable as a single
string.
Note that the register variable will contain the newline characters from the input file. If you print the string using p, without removing said newline, you'll get an extra blank line. This statement can be formatted in each of the following ways:
The default script reads a line of input from the stdin file before giving control to user defined commands -- so if you must use the default script, you'll have to deal with the fact that the first line of the stdin file has been placed in register 0 before you started. |
W[0-9][0-9] | write file |
Open the file whose name is found in the
first register number, and write the contents
of the second register number to the file
as its entire body, then close the file.
This command is not meant for large scale file processing. You must put your own \n's in the contents of the second variable. |
Using Variables
There are 10 register variables in yelt. These variables are all strings.
The + command and - command
can be used to increment and decrement well formatted string variables. Normally
this is only useful for manipulating counts.
Most commands affect one or more variables. Usually, the variable affected is embedded in the command invocation:
p4; n9; s8/fred/bill/g;
This would be difficult with the pattern conditional statements, which begin with /. In this case, the variable name appears at the end of the expression:
/fred/~8 /begin/~4, /end/
Register variable 0 is a special case. Since it is so heavily used, if the command does not specify a register, then register 0 can often be assumed to be the target of the operation. This does not apply to commands that affect multiple registers, such as: x, a, A, SR, SW, etc.
There are of course special commands whose principal function is to manipulate variables:
q4 STUFF\sJUNK:;
This command loads register 4 with the string "STUFF JUNK:". See more below.
a49;
This command copies register 4 into register 9.
A49;
This command appends register 4 to register 9.
x81;
This command swaps the contents of register 8 and register 1.
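The four register commands above behave like operations on a fixed array of ten strings. The following Python sketch models that reading; the Registers class and its method names are hypothetical stand-ins for the q, a, A, and x commands, and escape expansion such as \s is assumed to have happened before the text is stored.

```python
class Registers:
    """A toy model of yelt's ten string registers, numbered 0-9."""

    def __init__(self):
        self.values = [""] * 10

    def q(self, register, text):
        # q4 STUFF\sJUNK:;  -> load a register with literal text
        self.values[register] = text

    def a(self, source, target):
        # a49;  -> copy the source register into the target register
        self.values[target] = self.values[source]

    def A(self, source, target):
        # A49;  -> append the source register to the target register
        self.values[target] += self.values[source]

    def x(self, first, second):
        # x81;  -> swap the contents of two registers
        self.values[first], self.values[second] = (
            self.values[second],
            self.values[first],
        )


regs = Registers()
regs.q(4, "STUFF JUNK:")
regs.a(4, 9)          # register 9 now holds a copy of register 4
regs.A(4, 9)          # register 9 now holds the text twice
regs.q(8, "eight")
regs.q(1, "one")
regs.x(8, 1)          # registers 8 and 1 trade contents
print(regs.values[9])
```

Note that the source register always comes first and the target second, matching the digit order in a49 and A49.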
Text processing statements
The majority of commands in yelt are in fact text processing
statements because that is the tool's primary function. Most
text processing statements resemble function calls where the
parameters are register numbers -- much like assembly language
statements.
Yelt text processing commands operate on 0, 1, 2, or 3 registers and are described in the table below.
Example | Name | Commentary |
---|---|---|
s[d]/patrn/rplcmnt/[1g] | substitute | See this section below. |
t[0-9] | expand tabs | Expand tabs in variable 0 or the specified variable. Tabs are assumed to mean 8 character cells per the unix standard. |
y[0-9]/set1/set2/ | translate character set |
Convert character sets in variable 0 or the specified
register. A character set can be a random list of
individual characters, or it can contain a mix of
individual characters and character ranges.
A character range is a pair of characters specified like this:
X-Y
where "X" is the low end of a character range and "Y" is the high end. X and Y are both included in the range. For example:
a-z
refers to the lower case letters. To specify the individual character, "-", use one of the following:
Note, translating a character into a newline character (\n) does not create a new input line to yelt. Do so at your own peril. See also: dealing with alphabetic case. |
l[0-9] | get current line number |
Store the current line number from the current file in the
specified register (or register 0 if none are specified).
The text will include leading and trailing spaces. The spaces are defined to enable a columnar output for line numbers in the range 1-10,000. |
L[0-9] | get current file name | Store the current file name in variable 0 or the specified variable. If reading from standard in rather than a named file, this will put an empty string in the output variable. |
a[0-9][0-9] | copy one variable to another | Copy the contents of the first variable to the second. Both must be specified. |
A[0-9][0-9] | append one variable to another | Append the contents of the first variable to the second. Both must be specified. |
q[0-9] text | load variable with text | Store the specified text in variable 0 or the specified variable. The text is terminated by ';'. If you need to include a semicolon in the text, use '\;'. Other escape sequences will work as well -- see this. |
c[0-9] x-y[,a-b...] | cut character ranges |
Eliminate undesired text from variable 0 or the specified
register. The cut ranges look like this:
[number]-[number][,range ... ]
For example:
c3 10-13,99-115
This command eliminates all the text in variable 3 except that found in columns 10 through 13 and 99 through 115. The result will be stored in variable 3. That which was previously in column 10 will now be in column 1. That which was previously in column 99 will now be in column 5. |
SC[digit][digit] Count | split character columns |
This command splits the text in the first register (as
defined by the digit immediately after SC) into
two parts. The first "Count" columns stay in the first
register. The remainder of the text goes into the second register.
For example:
SC93 10
This command splits variable 9 into 2 parts: the first 10 characters stay in variable 9, but the rest goes into variable 3. |
SW[d][d][d] delims | split words |
This command splits the text in a register into
three parts. Three registers must be specified (as
3 digits following the SW).
The delims string is a group of characters, any one of which ends the initial string. The text currently in the first register (as defined by the first digit after SW) will be split into three parts:
SW743 @:
In this case, the text in variable 7 will be split up like this:
|
SR[d][d][d] /regex/ | split on regex |
This command splits the text in a register into
three parts. Three register numbers must be specified as
digits after the SR.
The /regex/ is a regular expression which serves as the delimiter for the splitting. The text currently in the first register (as defined by the first digit after SR) will be split into three parts:
SR345 /bob/
In this case, the text in variable 3 will be split up like this:
|
M[0-9] | map one string to another |
Using the table specified on the command line by the -M and -D
command line options (see above), convert
the contents of variable 0 or the specified variable into its
mapped form. For example, if the map file, m.txt, looks like this:
junk|crap
and the -M and -D options look like this: "-M m.txt -D '|'", and variable 9 contains 'junk', and you use the following command:
-e "M9"
then variable 9 will be left containing "crap". |
j[0-9] count | left justify register |
This command widens the specified register, or
register 0 if none is specified, to be at least count
characters wide.
Spaces are added to the right hand side of the string to make it at least count characters wide. |
J[0-9] count | right justify register |
This command widens the specified register, or
register 0 if none is specified, to be at least count
characters wide.
Spaces are added to the left hand side of the string to make it at least count characters wide. |
+[0-9] | increment a register | Add one to the string in the specified register (or register zero). If the string in the register is not a valid integer string, then this will cause a fatal error. |
-[0-9] | decrement a register | Subtract one from the string in the specified register (or register zero). If the string in the register is not a valid integer string, then this will cause a fatal error. |
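The column-oriented commands in the table (c, SC, j, and J) are easy to misread by one column, so here is a Python sketch of the behavior as described there. It is an illustration only: the function names are hypothetical, columns are taken to be 1-based and inclusive as in the c3 example, and only fully specified ranges such as 10-13 are handled.

```python
def cut_columns(text, ranges):
    """Model of 'c': keep only the listed 1-based, inclusive column ranges."""
    kept = []
    for item in ranges.split(","):
        low, high = (int(number) for number in item.split("-"))
        # Column N is Python index N-1; the end of a slice is exclusive.
        kept.append(text[low - 1:high])
    return "".join(kept)


def split_columns(text, count):
    """Model of 'SC': the first count characters stay, the rest moves on."""
    return text[:count], text[count:]


def left_justify(text, count):
    """Model of 'j': pad on the right to at least count characters."""
    return text.ljust(count)


def right_justify(text, count):
    """Model of 'J': pad on the left to at least count characters."""
    return text.rjust(count)


sample = "".join(chr(ord("a") + i % 26) for i in range(120))
cut = cut_columns(sample, "10-13,99-115")
# Old column 10 lands in new column 1, old column 99 in new column 5.
print(cut[0] == sample[9], cut[4] == sample[98])
print(split_columns("0123456789rest", 10))
print(repr(left_justify("ab", 5)), repr(right_justify("ab", 5)))
```

Because the first range keeps four characters (columns 10 through 13), the text from column 99 starts at column 5 of the result, which agrees with the description of the c command.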
The substitute command consists of the following components:
s3/pattern/replacement/g
In this case, the text in variable 3 will be modified. All instances of pattern will be changed to replacement.
Note that the pattern and the replacement can contain standard control characters as described above.
The pattern can also contain "backwards references" to text already processed. These backwards references are explained in more detail in the basic regular expression discussion described below. The general idea, however, is that regular expressions can be defined in terms of parenthetical sections:
stuff\(a*\)morestuff\[\1]
In this case, the parenthetical expression \(a*\) means zero or more repetitions of the letter a, and \[ is a literal opening bracket (an unescaped [ would start a character set). The later reference, \1, means "whatever was matched by the first parenthetical expression". So if the word "stuff" was followed by "aaaa" then "morestuff", then the only way for the expression to match would be if there were "[aaaa]" immediately following "morestuff".
This is powerful but requires practice to use.
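To get some of that practice without yelt at hand, the same back-reference idea can be tried in Python. This is a sketch in Python's regular expression dialect, which differs from yelt's Basic Regular Expressions: groups are written ( ) rather than \( \), and the literal brackets are escaped as \[ and \].

```python
import re

# stuff, then any run of a's (group 1), then morestuff, then the
# very same run of a's again, wrapped in literal square brackets.
pattern = re.compile(r"stuff(a*)morestuff\[\1\]")

for candidate in (
    "stuffaaaamorestuff[aaaa]",   # bracketed text repeats group 1
    "stuffaaaamorestuff[aa]",     # bracketed text differs from group 1
    "stuffmorestuff[]",           # group 1 is empty, so [] is required
):
    print(candidate, "->", bool(pattern.search(candidate)))
```

Only the first and third candidates match, because \1 must repeat exactly what the group captured, including the case where it captured nothing.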
s4?pattern?replacement?
& | the complete text that matched the pattern. |
\0 | the complete text that matched the pattern. |
\[1-9] | the part of the text that matched the corresponding parenthesized sub-expression in the regular expression pattern (\1 for the first, and so on). See this section. |
Note that the replacement can also contain control characters as described above for the pattern.
See also: dealing with alphabetic case.
Regular Expressions
Yelt uses SED style regular expressions. That is to say, yelt regular expressions are
Basic Regular Expressions, (BREs), documented
here.
Yelt is implemented using the GNU regular expression source files regex.h and regex.c.
Yelt passes a 0 as the eFlags option to regexec1 in the GNU source code. Note that this
file requires that you obey the GNU GPL copyright as well as that of Lowell Boggs --
or use a different regex package (;->).
Functions
Yelt functions can execute any yelt statement and are defined like this:
Def funcName parmCount { [yelt statements] }
And they are invoked like this:
C funcName [parmList]
For example:
C myFunc 9,2,0
Def myFunc 3 { A01; A12; if /badData/ Q; p2; }
Yelt functions can be called before they are defined, and vice versa. Redefining a function is illegal. Functions can be defined anywhere, but it makes sense to put the main script at the top of any yelt script and put the functions at the bottom. You cannot easily define functions using the -e options -- but technically you can.
To return from a yelt function, either allow the execution to drop off the bottom of the function, or trigger an early return using the Q statement. Q has no parameters.
Parameters passed to a function must be placed in the register list in the call statement. These parameters are stored in the called functions register set in ascending order, starting with register 0. So, if the caller passes 5,9,4, then the called function will see these values as 0,1,2. Upon return, the only data that the function can pass back to the caller is in this same register list. So, for example:
# call function fred passing it the contents of this
# function's (or script's) registers 9 and 7 and 4:
C fred 9,7,4
# check for the function's return value in register 4.
/returnValue/~4
{
  # the return value from fred is found in register 4
  #
  # Note that return data could be in either 9, 7, or 4.
}
...
Def fred 3
{
  #
  # parms are in 0 and 1, returning a value in r2:
  #
  /firstParm/~0
  {
    # use the data from the caller's register 9
  }
  /secondParm/~1
  {
    # use the data from the caller's register 7
    # the following code sets up an error-return
    # value
    q2 badValue;
    # the following statement executes an early return
    # from the function.
    Q;
  }
  #
  # returning data in the caller's register 4
  # because it is mapped to register 2 in the
  # function's register space.
  #
  q2 returnValue;
}
Function parameters appear to the caller as if they are passed by reference.
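The register mapping between caller and function can be modeled as copy-in, copy-out. The Python sketch below is one way to get the "as if passed by reference" effect described above; whether yelt really copies or aliases the registers is not stated, so treat the mechanism as an assumption. The names call_function and fred are illustrative, and the parameter list 9,7,4 follows the comments in the example (caller's 9, 7, and 4 become the function's 0, 1, and 2).

```python
def call_function(caller_registers, parameter_list, function):
    """Run function with the caller's listed registers mapped to 0, 1, 2..."""
    callee_registers = [""] * 10
    # Copy in: the first listed caller register becomes register 0, etc.
    for callee_index, caller_index in enumerate(parameter_list):
        callee_registers[callee_index] = caller_registers[caller_index]
    function(callee_registers)
    # Copy out: changes made by the function become visible to the caller.
    for callee_index, caller_index in enumerate(parameter_list):
        caller_registers[caller_index] = callee_registers[callee_index]


def fred(registers):
    """Stand-in for 'Def fred 3': reads registers 0 and 1, writes register 2."""
    if "secondParm" in registers[1]:
        registers[2] = "badValue"
        return                      # early return, like the Q statement
    registers[2] = "returnValue"


caller = [""] * 10
caller[9] = "firstParm"
caller[7] = "something"
call_function(caller, [9, 7, 4], fred)
print(caller[4])   # the function's register 2 maps back to the caller's 4
```

The function never sees the caller's register numbers; it only works with 0, 1, and 2, which is why the return value has to travel through one of the listed registers.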
Other than the rather clunky mechanism for calling functions and returning values, there is one additional weirdness of yelt functions: b, d, and n behavior can be interpreted by the caller, in some circumstances.
Yelt functions can define while loops just as in the main script. When the b, n, or d command executes from within a while loop inside a function, it behaves normally. If you execute a b, d, or an n statement which reads the end of the file outside a while loop but in a function, the function immediately returns and the caller acts as if the b, d, or empty n statement had been executed in it. For example:
w { C myFunc }
...
Def myFunc 0
{
  # return and terminate the caller's while loop
  b;
}
This unusual feature lets you write functions that act like a "super-n" which operates across function boundaries. It avoids extra function parameters and return values that would otherwise be necessary to communicate this information and clutter up your script.
Again, if the b, d, or n is executed in a while loop inside the function, it will behave as normal.
Programming Examples |
The following sections describe how some yelt scripts are designed in significant detail. Please see HOWTOs for a more terse treatment of many common tasks.
yelt -e 's/fred/bill/g'
In this case, yelt will read all the lines in its input file(s), either specified as file names on the command line or read from the standard input stream. For example, suppose you have two files named data1.txt and data2.txt with the following contents:
data1.txt:
line1 has text on it.
line2 has fred's name on it
line3 has something else
data2.txt:
susan tom hank fred sundar gregor
You can convert all copies of 'fred' in the input stream to 'bill' to produce a concatenation of the two modified files like this:
line1 has text on it.
line2 has bill's name on it
line3 has something else
susan tom hank bill sundar gregor
Note that if you do not want to replace all instances of fred with bill, yelt, like SED, offers you the option of replacing only the first instance of the "from" string with the "to" string. To choose this option, rather than the "global" option, change the 'g' at the end of the s command to a '1'.
Note also, that the s command can be applied to any yelt variable -- not just the default (variable 0).
Watch out for "special" characters in the "from" and "to" strings. The s command expects the "from" string to be a regular expression and the "to" string must be formatted as a replacement string.
yelt -e 's/.*/PREFIX&SUFFIX/1'
If the input stream or file looked like this:
line1
stuff on line2
more on line3
Then yelt's output would look like this:
PREFIXline1SUFFIX
PREFIXstuff on line2SUFFIX
PREFIXmore on line3SUFFIX
The "from" regular expression above means "any sequence of characters, including none". That means that .* will match all the text on the line.
The "to" or "replacement" expression has only 1 special character over and above that which you might guess from the above example. In the "to" expression of an s command, the & character is replaced with the actual text from the input stream that matches the "from" regular expression. There is another character sequence which does the same thing. In the "to" string of the s command, the sequence \0 is also replaced with the matching input text.
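The effect of & (or \0) can be reproduced in Python for experimentation. This is a sketch in Python's dialect, where the whole match is written \g<0> in the replacement string; count=1 plays the role of the trailing 1 flag on the s command.

```python
import re

lines = ["line1", "stuff on line2", "more on line3"]

# \g<0> stands for the complete matched text, like & or \0 in yelt.
wrapped = [
    re.sub(r".*", r"PREFIX\g<0>SUFFIX", line, count=1)
    for line in lines
]

for line in wrapped:
    print(line)
```

The count=1 matters here: .* can also match the empty string at the end of the line, so an unlimited substitution would add a second PREFIXSUFFIX pair.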
yelt -e 's/^  *//1; s/  *$//1'
Because spaces are often hard to read, let me spell out the above "from" expressions:
caret space space star
space space star dollar
Note that in this example, the -e option specifies two yelt commands in a single expression -- the commands are separated by the semicolon character. This was done just to show that it could be done -- not that this choice has any bearing on how yelt executes the statements.
In the example above, the first substitute command only operates on text at the beginning of the line. That is, it will have no effect on spaces inside the line or at its end. Whereas the second substitute command operates only on text at the end of the line.
For a regular expression to be constrained so that it only affects the beginning of the line, you must begin it (the "from" string) with caret (^). To constrain its effects to the end of the line, it must end in the dollar sign ($).
In the example above, both at the beginning and at the end of the line, the substring space and any space that follows it are being deleted. That is, substituted with nothing. Note the comments on the star operator earlier.
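The two anchored substitutions translate directly to Python, which can be handy for checking the spacing in the patterns. The sketch below is an illustration rather than yelt itself; the function name trim_spaces is hypothetical, and each pattern is "space space star", that is, one space followed by zero or more spaces.

```python
import re


def trim_spaces(line):
    """Delete leading and trailing runs of spaces, one anchored s each."""
    line = re.sub(r"^  *", "", line, count=1)   # caret space space star
    line = re.sub(r"  *$", "", line, count=1)   # space space star dollar
    return line


print(repr(trim_spaces("   indented text with a tail   ")))
print(repr(trim_spaces("no extra spaces")))
```

Spaces inside the line are left alone because neither pattern can match without touching the start or the end of the line.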
yelt -e "s/AAA*/A/g"
This invocation collapses sequences of A into a single A.
As seen above, you could use the same logic to eliminate duplicate spaces (a much more common need than reducing duplicate A characters).
yelt -e 's/\(.\)\1\1*/\1/g'
This "from" expression has the following meaning:
Match any single character which is followed by at least one duplicate of itself.
The "to" expression means:
Replace the text matching the "from" expression with the text that matched the first parenthesis group in the "from" expression.
While this is complicated and ugly, it gives the substitute command enormous power. You can work miracles with properly written regular expressions -- but lots of comments in scripts are often required in order to make this code maintainable.
How the "from" regex works:
In the regular expression syntax, \( means "begin a sub-expression". The \) means "end the sub-expression". Sub-expressions are addressable both in "from" expressions and "to" expressions. That is, a later part of a regular expression can be defined in terms of earlier parts -- but only if the earlier parts are defined in a sub-expression.
The "to" string just tells the s command to replace all the text that matched the "from" expression with the contents of the first sub-expression -- in this case a single copy of the one character that began a multi-copy sequence.
So, the following fragment:
\(.\)\1
means the following: define the first sub-expression to be the "." regular expression operator. Then, the larger expression requires that the text immediately following the text which matched the first sub-expression be a duplicate of that which was found in the first sub-expression.
Since the "." operator matches any single character, the above fragment means: any character followed by itself.
The whole expression:
\(.\)\1\1*
then means: any character followed by 1 copy of itself, optionally followed by more copies of itself.
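Both collapsing expressions can be tried out in Python. The sketch below uses Python's dialect, where the sub-expression is written (.) instead of \(.\); the sample strings are made up for illustration.

```python
import re

# Collapse runs of the letter A only: the equivalent of s/AAA*/A/g.
only_a = re.sub(r"AAA*", "A", "xAAAAyAz")

# Collapse runs of any repeated character: the equivalent of
# s/\(.\)\1\1*/\1/g, keeping one copy of whatever started the run.
any_char = re.sub(r"(.)\1\1*", r"\1", "aaabbbcd  ee")

print(only_a)
print(any_char)
```

In the second call the single c and d are untouched because nothing matches unless a character is followed by at least one copy of itself.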
yelt -e 's/^useless stuff\(interesting part1\)more junk\(more interesting stuff\).*/\2 \1/1'
In this case, the line begins with un-needed text but has two interesting parts which are in the wrong order. The yelt invocation locates the interesting parts, and replaces the entire line with nothing more than the desired text -- but placed in the correct order.
If the text is column oriented, you can use the cut command to do this with a bit more clarity, but the s command works well for this.
Watch out for regular expressions that don't work the way that you expect them to. Regular expressions parse one character at a time -- sometimes if you try complicated expressions, with lots of sub-expressions, you create unusable patterns. Try splitting them up -- or use multiple s commands that have slightly different "from" regular expression patterns but have same "to" replacement patterns.
Yelt provides the command t to allow you to automatically expand tabs per the standard spacing of 8 characters. It does so much quicker than could be done with some sort of substitute command. However, the y command can translate tabs out of existence even faster. All three approaches work and will probably become necessary in different situations.
The t command will expand tabs, from this:
a\tb
to this:
a       b
The y command, given this kind of argument: "y/\t/ /" will convert from:
a\tb
to this:
a b
And if you use the s command, you can change the \t to any string you want.
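The difference between the three treatments of a tab is easiest to see side by side. The Python sketch below mirrors the t, y, and s approaches with standard string methods; the replacement text <TAB> is an arbitrary choice for illustration.

```python
line = "a\tb"

# Like 't': pad with spaces up to the next 8-column tab stop.
expanded = line.expandtabs(8)

# Like 'y/\t/ /': turn each tab into exactly one space.
translated = line.translate(str.maketrans("\t", " "))

# Like 's/\t/.../g': replace each tab with any string you choose.
substituted = line.replace("\t", "<TAB>")

print(repr(expanded))      # 'a' followed by 7 spaces, then 'b'
print(repr(translated))    # 'a b'
print(repr(substituted))   # 'a<TAB>b'
```

Only the expansion keeps columns aligned, because the number of spaces it inserts depends on where the tab sits in the line.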
yelt -e "s/a/b/g" -e "s/r/s/g" -e "s/T/U/g" -e "s/M/N/g" ...
These statements execute quickly, but the more of them you need to perform the needed substitutions, the slower the script runs. To a rough approximation, the time required is O(N) in the number of substitutions. Yelt provides a way to speed this up to O(ln(N)). This only matters if the number of substitutions is huge -- hundreds or thousands. The map statement is used to convert the contents of a register into its mapped form. That is, to use the map statement you do the following:
script.yelt:
s/01437351/Bill Billips/g;
s/19343728/Susan Steffano/g;
s/00000001/Bill Gates/g;
s/00000002/Bill Joy/g;
s/00000003/Paul Alan/g;
To use this script, you'd run it like this:
generateLog | yelt -f script.yelt
This would work fine -- but what if your employee database has 50,000 people in it? In that case, you might want to create an employee database of the following form:
employeeDatabase.map:
01437351=Bill Billips
19343728=Susan Steffano
00000001=Bill Gates
00000002=Bill Joy
00000003=Paul Alan
And you would use the database like this:
generateLog | yelt -M employeeDatabase.map -D "=" -f fixEmployeeNumbers.yelt
This invocation requires an additional yelt script:
fixEmployeeNumbers.yelt:
#
# the following function replaces 8 digit employee numbers
# with the corresponding employee's name
#
w
{
  n;
  /\<[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\>/
  {
    #
    # the current line (in variable 0) has an employee number on it --
    # and the employee number is a word in its own right -- not some
    # accidental misinterpretation of larger number. That's why the
    # regex above uses \<\> -- to match only whole words.
    #
    # force variable 9 to be empty -- and use it as the accumulator
    # for all changed text
    q9;
    q2;
    w
    {
      #
      # split the text of the line into 3 parts -- using the employee
      # number as a delimiter:
      #
      SR012 /\<[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\>/;
      #
      # check to see if there really was an employee number (the first
      # time through the while loop there will be, but in subsequent
      # iterations there may not be). If the employee number is not found
      # then append the text that did not get split out to the outgoing buffer
      # and leave the loop.
      #
      /^$/~1 { A09; b };
      #
      # Register 0 now contains the part before the employee number
      # Register 1 contains the employee number
      # Register 2 contains the stuff after the employee number
      #
      # first, append the text that is not an employee number to variable 9
      A09;
      # next replace the employee number with the employee name and append it
      # to register 9;
      M1;
      A19;
      #
      # finally, move register 2, containing the stuff after the employee
      # number to variable 0 for continued parsing.
      #
      a20;
    }
    # move the accumulator back to register 0 for subsequent printing.
    a90;
  }
  p;
}
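The split-map-append loop in that script may be easier to follow in a general-purpose language. The following Python sketch mirrors its structure step by step; it is not yelt, the function names are hypothetical, \b stands in for the \< and \> word anchors, and leaving an unmapped number unchanged is an assumption, since the behavior of M for missing keys is not described here.

```python
import re

EMPLOYEE_NUMBER = re.compile(r"\b[0-9]{8}\b")   # a whole 8-digit word


def load_map(lines, delimiter="="):
    """Build the lookup table from key<delimiter>value lines (-M and -D)."""
    mapping = {}
    for line in lines:
        key, _, value = line.partition(delimiter)
        mapping[key] = value
    return mapping


def replace_employee_numbers(line, mapping):
    accumulator = ""          # plays the role of register 9
    rest = line               # plays the role of register 0
    while True:
        match = EMPLOYEE_NUMBER.search(rest)      # like SR012 /regex/
        if match is None:
            accumulator += rest                   # like A09; b
            break
        accumulator += rest[:match.start()]       # like A09
        number = match.group(0)
        # like M1; A19 (assumption: unknown numbers pass through as-is)
        accumulator += mapping.get(number, number)
        rest = rest[match.end():]                 # like a20
    return accumulator


employees = load_map(["01437351=Bill Billips", "00000001=Bill Gates"])
print(replace_employee_numbers(
    "user 01437351 paged 00000001 at 123456789", employees))
```

The nine-digit number at the end is left alone because the word-boundary anchors only accept a run of exactly eight digits. A Python dict is used for the lookup; it is not meant to reflect how yelt stores its map internally.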
inputFile.txt:
Program XYZ begins
Source file: somefile.inp
Output file: otherFile.inp:
Trace output:
 information1
 information2
 information3
Execution complete:
Run time: 1 hour.

In this example, the useful information produced by the program's execution might be just the parts between the "Trace output:" line and the "Execution complete:" line. So a first step of filtering might be to delete the unwanted lines, and then apply some other processing steps.
grep '^ ' | yelt -e 's/other/stuff/g'
The obvious method of doing this is to use an if statement to decide what text needs to be discarded and what needs to be processed:
yelt -e "if /^ / { s/other/stuff/g; } else d; "

In this example, the if statement evaluates a regular expression that determines whether the if or the else clause is executed. Lines that begin with a space are assumed to be the "interesting" ones and are thus transformed. All other lines are deleted by continuing the while loop that is part of the default script.
Another method is to detect lines that you are not interested in and discard them using the pattern conditional statement -- which acts kind of like an if statement but which does not support an else clause. After deleting the uninteresting lines, operate on all the rest. For example:
yelt -e "/^[^ ]/d" -e "s/other/stuff/g"

The first -e statement above has the following meaning: on lines which do not begin with a space, continue the current while loop. This has the effect of suppressing all other processing in the while loop -- including the automatic printing of the current line that is a natural part of the default yelt script. The d command suppresses the execution of all other -e options as well.
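The control flow described above (delete means "continue the loop, skipping the print") can be sketched in Python. This is an illustration only; the function name `keep_and_transform` is hypothetical, and the sketch simply skips every line that does not start with a space.

```python
def keep_and_transform(lines):
    """Keep only lines that start with a space and apply the substitution."""
    out = []
    for line in lines:
        if not line.startswith(" "):
            # Like d: go on to the next line without reaching the print step.
            continue
        # Like s/other/stuff/g followed by the automatic print.
        out.append(line.replace("other", "stuff"))
    return out


sample = ["Trace output:", " some other data", "Execution complete:"]
print(keep_and_transform(sample))
```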
Another approach would be to rewrite the whole script and simply not bother with the automatic print at the end -- then you would have to make the script detect lines of interest, perform any necessary transformations and print the results:
yelt -S "w{ n; /^ /{ s/other/stuff/g; p; } }"

In this case, the while command, w, repeatedly executes the block of commands that follows it. The n command reads the next line of input. There is no automatic printing of the current line, so if any output is to occur, the remaining commands must produce it.
Here the remaining command is a pattern conditional statement: a regular expression determines whether work needs to be done. The expression checks the text in variable 0 (the current line) to see if it begins with a space. On lines where it does, a block of statements is executed that performs a simple substitution and prints the line.
Another way of using pattern conditional statements is to delete all lines, but print the interesting ones first:
yelt -e "/^ /{ s/other/stuff/g; p; } d;"

Here, lines beginning with a space are processed with a block of commands which performs the needed transformations, then prints the line. All lines are subsequently deleted (i.e. the while loop is continued).
Yet another way is to notice that the first four lines of the log file are always to be discarded, and use the range conditional statement to discard them while working on the rest -- you'd have to deal with the last two lines though, like this:
yelt -e '1,4d; /^Execution/,$d; s/other/stuff/g;'

In this case, the 1,4d statement eliminates the first four lines. The second range conditional statement eliminates everything from the "Execution" line through the end of the file, and any line not yet deleted gets processed and printed automatically.
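The two ranges can be checked against the sample log with a short Python sketch. It is only an illustration of the range logic; the function name `trace_lines` is hypothetical, and the sample data adds the word "other" to one line so the substitution is visible.

```python
import re


def trace_lines(lines):
    """Drop lines 1-4 and everything from the first 'Execution' line onward."""
    out = []
    in_tail = False
    for number, line in enumerate(lines, start=1):
        if number <= 4:
            continue  # the 1,4 range
        if in_tail or re.match(r"Execution", line):
            in_tail = True  # the /^Execution/,$ range stays open to the end
            continue
        out.append(line.replace("other", "stuff"))
    return out


log = [
    "Program XYZ begins",
    "Source file: somefile.inp",
    "Output file: otherFile.inp:",
    "Trace output:",
    " information1",
    " other information2",
    "Execution complete:",
    "Run time: 1 hour.",
]
print(trace_lines(log))
```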
s/a/b/g -- modify variable 0 such that all a's become b's
s3/x/y/1 -- modify variable 3 such that the first x becomes y
t9 -- expand tabs in variable 9
/fred/~3! cmd -- execute cmd if variable 3 does not contain fred
SR987 / */ -- split variable 9 into 3 pieces: the text up to the first blank is left in 9. The space, and all spaces contiguous to it, are moved into variable 8, and the rest of the text is moved into variable 7.

Note that since most processing operates on the current input line, it is generally advisable to reserve variable 0 for that purpose -- but there is no rule saying that you must.
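The three-way split performed by the SR statement can be pictured with a small Python sketch. This is not yelt code: `split_register` is a hypothetical name, the pattern is written as one or more spaces so that it cannot match an empty string, and returning two empty pieces when nothing matches is an assumption based on how the scripts in this document test the middle register.

```python
import re


def split_register(text, pattern):
    """Return (before, matched, after) around the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        # Assumption: with no match the text stays whole and the other
        # two pieces are empty.
        return text, "", ""
    return text[:match.start()], match.group(0), text[match.end():]


print(split_register("alpha   beta gamma", r" +"))
```

In register terms, the first element is what stays in variable 9, the second is what moves to variable 8, and the third is what moves to variable 7.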
The following statement types perform simple variable manipulations:
This command converts each line to all upper case:

yelt -e 'y/a-z/A-Z/'

and this converts to all lower case:
yelt -e 'y/A-Z/a-z/'

Case-insensitive compares can be done in either of two ways:
1. by writing the regular expression to check for both cases:
/[aA][bB][cC]/ { q1 Found abc or ABC; p1; d; }
However, this mechanism is quite slow to execute and is very cumbersome to use.
2. by converting your regex to all lower case, converting your data to all lower case, and comparing the two.

Since yelt provides 10 registers, option 2 is not quite so bad -- it doesn't require that you actually modify your working data. You can copy the string to a second register and perform the comparison against that one. Consider:
n; a01; y1/A-Z/a-z/; /fred/~1 { p1 found fred; p1; Q; }

This of course assumes that your regular expression comes to you in lower case. If your regular expressions are read from a file instead of being hard-coded constants in the program, you have to be creative.
One approach is to use the computed conditional statement. In this case, you don't use a hard coded regular expression but rather make a test based on the contents of two registers.
Basically, you copy the erstwhile regular expression into a register, lower-case it, then use the computed conditional to test some other string to see if it matches:
n4; a45; y5/A-Z/a-z/; |52 { p5 "the string in register 2 matches the expression: "; }
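The "lower-case a copy of both sides" technique looks like this in Python. The sketch is illustrative only and the name `matches_ignoring_case` is hypothetical. One caveat that applies to the technique in any tool: lower-casing a pattern also lower-cases any escape sequences in it, so it is only safe for patterns whose meaning does not depend on letter case.

```python
import re


def matches_ignoring_case(pattern, text):
    """Compare lower-cased copies so the original text is left untouched."""
    lowered_pattern = pattern.lower()  # the expression copied and lower-cased
    lowered_copy = text.lower()        # the data copied and lower-cased
    return re.search(lowered_pattern, lowered_copy) is not None


line = "Hello Fred"
print(matches_ignoring_case("FRED", line), line)
```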
Parsing
Alternatively, you might make a function that returns the next token out of the stream. This function might use this basic logic:
Note that the R statement is designed to work with the parsing commands. It lets you read the entire file into a register variable and process it using either a substitute command or one of the above parsing statements. Note that when you use the R command, just as when you translate characters into newlines, you end up with a slightly confusing situation: more than one line in a given string. When you print strings with embedded line feeds, even if the embedded line feed is at the end of the string, the print command adds another line feed for you. This can result in double line feeds. You can use the substitute command to get rid of the trailing line feed before printing, if needed:
s/\n$//1
Tokenizing
The following function can be included in any script and is used to
parse C++ style tokens (except for the old fashioned C-Style comments)
one at a time from an input stream:
Def tokenize 2
{
    # register 0 is where the input stream goes
    # register 1 is where the returned token goes

    # make sure the returned token is empty if we have a problem
    q1;
    w {
        #
        # process input lines until a token is found or the end of stream
        # occurs.
        #
        w {
            #
            # make sure there is a non-comment token available in the input
            # stream
            #
            /^$/ n;
            #
            # remove leading blanks from the input data
            #
            s/^[ \t][ \t]*//1;
            #
            # remove C++ style comments by just reading
            # the next line and removing leading blanks
            #
            SR012 /^\/\/.*/;
            #
            # if no comment was found, break out of this loop
            #
            /^$/~1 b;
            q0; q1;
        }
        #
        # if not end of file, process tokens
        #
        /./ {
            #
            # Look for single quoted strings
            #
            SR012 /^'\(\\*.\)*'/;
            if /./~1 {
                a20;
            } else {
                #
                # Look for double quoted strings
                #
                SR012 /^"\(\\*.\)*"/;
                if /./~1 {
                    a20;
                } else {
                    SR012 /^[A-Za-z0-9_]\+/;
                    if /./~1 {
                        # r1 contains an identifier, return it and the rest of the line to
                        # be processed
                        a20;
                    } else {
                        SR012 /^[<>=!:+*\/|&-]\+/;
                        if /./~1 {
                            # >=, >>=, etc
                            a20;
                        } else {
                            SR012 /./;
                            a20;
                        }
                    }
                }
            }
        }
        if /./~1! { n; } else b;
    }
}

The following example script uses the above function to print the tokens found in a C++ source file:
w {
    #
    # this script reads tokens from the input stream until end of file
    # is encountered. Each token is printed on its own line. This
    # script is just an example; a lot of work would be necessary to turn it into
    # a real usable program. Also, for the latest copy of the script, see
    # the source directory TestLib/parse.script.
    #
    #
    # Function tokenize expects 2 parameters: register 0 contains the
    # text of the current input line. Register 1 is the returned token
    # after the very next token is extracted from the input stream.
    #
    # The tokenize function is meant to be called repeatedly until the input
    # stream is exhausted -- and all invocations should use the same pair of
    # registers.
    #
    C tokenize 0,1
    #
    # If the tokenize function returns an empty string in register 1 then
    # the final end of file has happened -- so quit.
    #
    /./~1! b;
    #
    # print the current token and loop around for the next one
    #
    p1;
}

To run the above script, combine it into a single file with the Tokenize function above.
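The order in which the tokenize function tries its patterns can be mirrored in Python for comparison. This is a sketch under stated assumptions, not a port: the generator name `tokenize` and the pattern table are hypothetical, the quoted-string patterns use a conventional "escaped character or non-quote" form instead of the yelt expressions, and, like the original, it handles only `//` comments.

```python
import re

# Patterns are tried in the same order as in the yelt function.
TOKEN_PATTERNS = [
    re.compile(r"'(?:\\.|[^'\\])*'"),   # single-quoted literal
    re.compile(r'"(?:\\.|[^"\\])*"'),   # double-quoted literal
    re.compile(r"[A-Za-z0-9_]+"),       # identifier or number
    re.compile(r"[<>=!:+*/|&-]+"),      # run of operator characters (>=, >>=, ...)
    re.compile(r".", re.DOTALL),        # any other single character
]


def tokenize(lines):
    """Yield tokens one at a time from an iterable of source lines."""
    for line in lines:
        rest = line.rstrip("\n")
        while True:
            rest = rest.lstrip(" \t")            # remove leading blanks
            if not rest or rest.startswith("//"):
                break                            # end of line or // comment
            for pattern in TOKEN_PATTERNS:
                match = pattern.match(rest)
                if match:
                    yield match.group(0)
                    rest = rest[match.end():]
                    break


for token in tokenize(['x = foo(a, "b c"); // call', "y >>= 2;"]):
    print(token)
```

A generator keeps the "call repeatedly until the stream is exhausted" shape of the yelt version without needing a shared pair of registers.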
yelt -f findFuncs.yelt -r8 SomeCplusplusFile -r9 .

This invocation prints the file name and line number wherever function-call-like syntax is found in the named file. You must include the Tokenize function above in the same file as the following text:
#
# We assume the name of the function to fix is in r9 and was set there
# using the "-r9 funcname" option to the command line
# We also assume the name of the file on which to operate is r8 set similarly.
#
q0;
F8 {
    #
    # process lines in the file
    #
    /^$/ n;
    SR012 /[a-zA-Z_0-9]\+ *(/;
    #
    # q4 is a flag meaning that this line does not contain a call
    #
    q4 notProcessed;
    |91 {
        /^if *(/~1 {
            # don't process if statements
            a20; d;
        }
        /^while *(/~1 {
            # don't process while statements
            a20; d;
        }
        /^for *(/~1 {
            # don't process for statements
            a20; d;
        }
        /^switch *(/~1 {
            # don't process switch statements
            a20; d;
        }
        q4;
        #
        # we have found the desired function -- set up to log it out
        # print the filename and line number and the function name.
        # Also print the call parameters.
        # Use r7 as the line buffer to hold the output until it is all available
        # then flush it all at once
        #
        a87; l6; J6 8; A67; q6:; A67; A17;
        # we have everything but the call parameters in r7 -- now read tokens and store
        # them in r7 until we are out of data or find the trailing ).
        #
        # put the rest of the current input line back into r0 for later processing
        #
        a20;
        #
        # use r5 as a nest depth -- it is currently 1 because we already have the leading
        # paren parsed away by the above SR invocation.
        #
        # use r3 as a parameter count
        #
        q5 1; q3 0;
        w {
            #
            # read the next token out of the input stream
            #
            C tokenize 0,1;
            /^$/~1 {
                # premature end of file
                b;
            }
            /^(/~1 +5;
            /^)/~1 {
                -5;
                /0/~5 { A17; b;}
            }
            # append the token to the output text
            q6 \s; A67; A17;
            /^1$/~5 {
                #
                # if the parenthesis nest depth is exactly 1 and we have a comma
                # increment the parameter count
                #
                if /^,/~1 {
                    #
                    # increment parameter count
                    #
                    +3;
                } else if /^0$/~3 {
                    # we have some text at the top level and the parm count is 0
                    # so change it to 1
                    +3;
                }
            }
        }
        #
        # include the parameter count in the output text
        #
        q6 \s:\sparms=; A36; A67; p7;
    }
    /^$/~4!
    {
        #
        # if this line does not contain a function call, set it to empty
        # so that the top of the loop will read a new line
        #
        q0;
    }
}

Note that the above function is a tad more complicated than is technically necessary. The -r8 option could be eliminated in favor of just processing the normal yelt input files, and the -r9 option is only necessary if you wish to show selected functions -- if you want to print them all, you wouldn't need the -r9 behavior at all.
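The nest-depth and parameter-count bookkeeping is the subtle part of this script, so here is a simplified Python sketch of just that logic. It is an illustration under assumptions, not a translation: `find_calls` and its `name_pattern` argument (standing in for the -r9 value) are hypothetical, it scans characters rather than tokens, it ignores commas and parentheses inside string literals, and it counts a parenthesized first argument as a parameter.

```python
import re

CALL = re.compile(r"([A-Za-z_0-9]+) *\(")
KEYWORDS = {"if", "while", "for", "switch"}  # call-like syntax to ignore


def find_calls(source, name_pattern="."):
    """Return (line_number, name, parameter_count) for each call-like match."""
    results = []
    for match in CALL.finditer(source):
        name = match.group(1)
        if name in KEYWORDS or not re.search(name_pattern, name):
            continue
        depth, params = 1, 0  # the opening paren is already consumed
        for ch in source[match.end():]:
            if ch == "(":
                if depth == 1 and params == 0:
                    params = 1  # a parenthesized first argument still counts
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break  # found the closing paren of this call
            elif depth == 1 and ch == ",":
                params += 1  # a top-level comma starts another parameter
            elif depth == 1 and params == 0 and not ch.isspace():
                params = 1  # first text at the top level
        line_number = source.count("\n", 0, match.start()) + 1
        results.append((line_number, name, params))
    return results


src = "if (ready) {\n    total = add(a, scale(b, 2));\n    reset();\n}\n"
for line_number, name, params in find_calls(src):
    print(f"{line_number}: {name} : parms={params}")
```

Passing a narrower `name_pattern`, for example `"^add$"`, restricts the report to selected functions in the same way the -r9 option does.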