PERL -- Operators

Operators

Since perl expressions work almost exactly like C expressions, only the differences will be mentioned here.

Here's what perl has that C doesn't:

**

The exponentiation operator.

**=

The exponentiation assignment operator.

()

The null list, used to initialize an array to null.

.

Concatenation of two strings.

.=

The concatenation assignment operator.
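A minimal sketch of the operators described so far, written for a current Perl 5 interpreter. The `use strict`, `use warnings` and `my` declarations are assumptions of this sketch rather than part of the text above, and the variable names are purely illustrative.

```perl
use strict;
use warnings;

my $cube = 2 ** 3;            # exponentiation: 8
my $n = 3;
$n **= 2;                     # same as $n = $n ** 2, so $n is now 9

my @queue = (1, 2, 3);
@queue = ();                  # assigning the null list empties the array

my $greeting = 'Hello' . ', ' . 'world';    # concatenation
$greeting .= '!';                           # append in place

print "$cube $n ", scalar(@queue), " $greeting\n";   # prints "8 9 0 Hello, world!"
```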

eq

String equality (== is numeric equality). For a mnemonic just think of "eq" as a string. (If you are used to the awk behavior of using == for either string or numeric equality based on the current form of the comparands, beware! You must be explicit here.)

ne

String inequality (!= is numeric inequality).

lt

String less than.

gt

String greater than.

le

String less than or equal.

ge

String greater than or equal.

cmp

String comparison, returning -1, 0, or 1.

<=>

Numeric comparison, returning -1, 0, or 1.
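The difference between the string and numeric comparison operators is easiest to see on values that compare differently under each. The following is an illustrative sketch for a current Perl 5 interpreter; the sample data is made up.

```perl
use strict;
use warnings;

my $same_number = ('1.0' == '1') ? 1 : 0;   # 1: equal as numbers
my $same_string = ('1.0' eq '1') ? 1 : 0;   # 0: different as strings

my $as_strings = '10' cmp '9';    # -1: the character "1" sorts before "9"
my $as_numbers = 10 <=> 9;        #  1: 10 is greater than 9

# cmp and <=> are the usual building blocks for sort comparisons.
my @words   = sort { $a cmp $b } ('pear', 'apple', 'fig');
my @numbers = sort { $a <=> $b } (10, 9, 100);

print "@words\n";      # prints "apple fig pear"
print "@numbers\n";    # prints "9 10 100"
```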

=~

Certain operations search or modify the string "$_" by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or translation. The left argument is what is supposed to be searched, substituted, or translated instead of the default "$_". The return value indicates the success of the operation. (If the right argument is an expression other than a search pattern, substitution, or translation, it is interpreted as a search pattern at run time. This is less efficient than an explicit search, since the pattern must be compiled every time the expression is evaluated.) The precedence of this operator is lower than unary minus and autoincrement/decrement, but higher than everything else.

!~

Just like =~ except the return value is negated.
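As a rough illustration of the binding operators, the sketch below applies a search, a substitution and a translation to a named variable instead of "$_". It assumes a current Perl 5 interpreter, and the sample string is invented.

```perl
use strict;
use warnings;

my $line = 'user=alice';

my $matched   = ($line =~ /^user=/) ? 1 : 0;   # search $line rather than $_
my $replaced  = ($line =~ s/alice/bob/);       # substitute in $line; returns 1 substitution
my $no_digits = ($line !~ /[0-9]/) ? 1 : 0;    # negated search

(my $upper = $line) =~ tr/a-z/A-Z/;            # translate a copy, leaving $line alone

my $pattern = '^user';                         # an ordinary string...
my $dynamic = ($line =~ $pattern) ? 1 : 0;     # ...interpreted as a pattern at run time

print "$matched $replaced $no_digits $dynamic $upper $line\n";
# prints "1 1 1 1 USER=BOB user=bob"
```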

x

The repetition operator. Returns a string consisting of the left operand repeated the number of times specified by the right operand. In an array context, if the left operand is a list in parens, it repeats the list.

	print '-' x 80;		# print row of dashes
	print '-' x80;		# illegal, x80 is identifier

	print "\t" x ($tab/8), ' ' x ($tab%8);	# tab over

	@ones = (1) x 80;		# an array of 80 1's
	@ones = (5) x @ones;		# set all elements to 5

x=

The repetition assignment operator. Only works on scalars.
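A short sketch of x= next to the plain x operator, assuming a current Perl 5 interpreter; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $rule = '-';
$rule x= 10;                     # same as $rule = $rule x 10

my $echo  = 'ab' x 3;            # string repetition: 'ababab'
my @twice = ('a', 'b') x 2;      # list repetition: ('a', 'b', 'a', 'b')

print "$rule\n";                 # prints "----------"
print "$echo @twice\n";          # prints "ababab a b a b"
```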

..

The range operator, which is really two different operators depending on the context. In an array context, returns an array of values counting (by ones) from the left value to the right value. This is useful for writing "for (1..10)" loops and for doing slice operations on arrays.

In a scalar context, .. returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each .. operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, AFTER which the range operator becomes false again. (It doesn't become false till the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation it became true (as in awk), but it still returns true once. If you don't want it to test the right operand till the next evaluation (as in sed), use three dots (...) instead of two.)

The right operand is not evaluated while the operator is in the "false" state, and the left operand is not evaluated while the operator is in the "true" state. The precedence is a little lower than || and &&.

The value returned is either the null string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string 'E0' appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1.

If either operand of scalar .. is static, that operand is implicitly compared to the $. variable, the current line number.

Examples:

As a scalar operator:
    if (101 .. 200) { print; }	# print 2nd hundred lines

    next line if (1 .. /^$/);	# skip header lines

    s/^/> / if (/^$/ .. eof());	# quote body

As an array operator:
    for (101 .. 200) { print; }	# print $_ 100 times

    @foo = @foo[$[ .. $#foo];	# an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];	# slice last 5 items
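The sequence numbers returned by scalar .. can be used to drop both endpoints of a range. The sketch below is one way to do that on a current Perl 5 interpreter; it loops over an in-memory list with explicit pattern operands, so no comparison against $. is involved. The marker strings BEGIN and END and all variable names are hypothetical.

```perl
use strict;
use warnings;

my @lines = ('intro', 'BEGIN', 'one', 'two', 'END', 'outro');
my @inside;    # lines strictly between the two markers
my @seq;       # raw values returned by the range operator

for my $line (@lines) {
    my $state = ($line =~ /^BEGIN$/) .. ($line =~ /^END$/);
    push @seq, $state;
    next unless $state;           # null string while outside the range
    next if $state == 1;          # sequence number 1 is the starting line
    next if $state =~ /E0$/;      # the final line carries the 'E0' suffix
    push @inside, $line;
}

print join(',', @seq), "\n";      # prints ",1,2,3,4E0,"
print "@inside\n";                # prints "one two"
```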

-x

A file test. This unary operator takes one argument, either a filename or a filehandle, and tests the associated file to see if something is true about it. If the argument is omitted, tests $_, except for -t, which tests STDIN. It returns 1 for true and '' for false, or the undefined value if the file doesn't exist. Precedence is higher than logical and relational operators, but lower than arithmetic operators. The operator may be any of:

	-r	File is readable by effective uid/gid.
	-w	File is writable by effective uid/gid.
	-x	File is executable by effective uid/gid.
	-o	File is owned by effective uid.
	-R	File is readable by real uid/gid.
	-W	File is writable by real uid/gid.
	-X	File is executable by real uid/gid.
	-O	File is owned by real uid.
	-e	File exists.
	-z	File has zero size.
	-s	File has non-zero size (returns size).
	-f	File is a plain file.
	-d	File is a directory.
	-l	File is a symbolic link.
	-p	File is a named pipe (FIFO).
	-S	File is a socket.
	-b	File is a block special file.
	-c	File is a character special file.
	-u	File has setuid bit set.
	-g	File has setgid bit set.
	-k	File has sticky bit set.
	-t	Filehandle is opened to a tty.
	-T	File is a text file.
	-B	File is a binary file (opposite of -T).
	-M	Age of file in days when script started.
	-A	Same for access time.
	-C	Same for inode change time.

The interpretation of the file permission operators -r, -R, -w, -W, -x and -X is based solely on the mode of the file and the uids and gids of the user. There may be other reasons you can't actually read, write or execute the file. Also note that, for the superuser, -r, -R, -w and -W always return 1, and -x and -X return 1 if any execute bit is set in the mode. Scripts run by the superuser may thus need to do a stat() in order to determine the actual mode of the file, or temporarily set the uid to something else.

Example:

	while (<>) {
		chop;
		next unless -f $_;	# ignore specials
		...
	}
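For a self-contained illustration of the return values, the sketch below creates a scratch file and applies a few of the tests to it. It assumes a current Perl 5 interpreter with the core File::Temp module; the file name sample.txt is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);     # scratch directory, removed at exit
my $path = "$dir/sample.txt";         # hypothetical file name

my $before = -e $path;                # undefined: the file does not exist yet

open(my $fh, '>', $path) or die "Cannot create $path: $!";
binmode($fh);                         # avoid newline translation
my $empty = -z $path ? 1 : 0;         # 1: the file exists and has zero size
print $fh "hello\n";
close($fh) or die "Cannot close $path: $!";

my $size      = -s $path;             # 6: -s returns the size when non-zero
my $is_plain  = -f $path ? 1 : 0;     # 1: plain file
my $is_dir    = -d $dir  ? 1 : 0;     # 1: directory
my $plain_dir = -f $dir  ? 1 : 0;     # 0: a directory is not a plain file

print "$empty $size $is_plain $is_dir $plain_dir\n";   # prints "1 6 1 1 0"
```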

Note that -s/a/b/ does not do a negated substitution. Saying -exp($foo) still works as expected, however--only single letters following a minus are interpreted as file tests.

The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or metacharacters. If too many odd characters (>10%) are found, it's a -B file, otherwise it's a -T file. Also, any file containing null in the first block is considered a binary file. If -T or -B is used on a filehandle, the current stdio buffer is examined rather than the first block. Both -T and -B return TRUE on a null file, or a file at EOF when testing a filehandle.
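The rules above can be checked against three scratch files: one with ordinary text, one containing nulls, and one that is empty. This is only a sketch for a current Perl 5 interpreter with the core File::Temp module; the file names and contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);      # scratch directory, removed at exit
my %content = (
    text   => "plain words\n" x 20,   # ordinary printable text
    binary => "\0\1\2\3" x 60,        # contains nulls in the first block
    empty  => '',                     # a null file
);

my %result;
for my $name (sort keys %content) {
    my $path = "$dir/$name";
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    binmode($fh);
    print $fh $content{$name};
    close($fh) or die "Cannot close $path: $!";
    $result{$name} = (-T $path ? 'T' : '') . (-B $path ? 'B' : '');
}

print "$_: $result{$_}\n" for sort keys %result;
# prints "binary: B", "empty: TB", "text: T" on separate lines
```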

If any of the file tests (or either stat operator) are given the special filehandle consisting of a solitary underline, then the stat structure of the previous file test (or stat operator) is used, saving a system call. (This doesn't work with -t, and you need to remember that lstat and -l will leave values in the stat structure for the symbolic link, not the real file.)

Example:

	print "Can do.\n" if -r $a || -w _ || -x _;

	stat($filename);
	print "Readable\n" if -r _;
	print "Writable\n" if -w _;
	print "Executable\n" if -x _;
	print "Setuid\n" if -u _;
	print "Setgid\n" if -g _;
	print "Sticky\n" if -k _;
	print "Text\n" if -T _;
	print "Binary\n" if -B _;
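A runnable variation on the same idea, assuming a current Perl 5 interpreter with the core File::Temp module: the first test fills the stat structure, and the later tests reuse it through the underline filehandle. The five-byte file content is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
binmode($fh);
print $fh "12345";
close($fh) or die "Cannot close $path: $!";

my @report;
if (-e $path) {                            # this test performs the stat
    push @report, 'plain file' if -f _;    # reuses the saved stat structure
    push @report, 'directory'  if -d _;
    push @report, 'size ' . (-s _);
}

print join(', ', @report), "\n";           # prints "plain file, size 5"
```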

Here is what C has that perl doesn't:

unary &: Address-of operator.
unary *: Dereference-address operator.
(TYPE): Type casting operator.

Like C, perl does a certain amount of expression evaluation at compile time, whenever it determines that all of the arguments to an operator are static and have no side effects. In particular, string concatenation happens at compile time between literals that don't do variable substitution. Backslash interpretation also happens at compile time. You can say

	'Now is the time for all' . "\n" .
	'good men to come to.'

and this all reduces to one string internally.

The autoincrement operator has a little extra built-in magic to it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has only been used in string contexts since it was set, and has a value that is not null and matches the pattern /^[a-zA-Z]*[0-9]*$/, the increment is done as a string, preserving each character within its range, with carry:

	print ++($foo = '99');	# prints '100'
	print ++($foo = 'a0');	# prints 'a1'
	print ++($foo = 'Az');	# prints 'Ba'
	print ++($foo = 'zz');	# prints 'aaa'

The autodecrement is not magical.
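The contrast between the two operators can be sketched as follows on a current Perl 5 interpreter. The variable names are illustrative, and the numeric warning is silenced only to keep the output of the decrement clean.

```perl
use strict;
use warnings;

my $version = 'a9';
$version++;                 # carry from the digit into the letter: 'b0'

my $label = 'Zz';
$label++;                   # carry out of the leftmost character: 'AAa'

my $down = 'ab';
{
    no warnings 'numeric';  # 'ab' is not numeric; it is treated as 0
    $down--;                # plain numeric decrement: -1
}

print "$version $label $down\n";   # prints "b0 AAa -1"
```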

The range operator (in an array context) makes use of the magical autoincrement algorithm if the minimum and maximum are strings. You can say @alphabet = ('A' .. 'Z'); to get all the letters of the alphabet, or $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15]; to get a hexadecimal digit, or @z2 = ('01' .. '31'); print $z2[$mday]; to get dates with leading zeros. (If the final value specified is not in the sequence that the magical increment would produce, the sequence goes until the next value would be longer than the final value specified.)
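The string ranges mentioned above can be tried directly. This is a sketch for a current Perl 5 interpreter; the value 254 and the range from 'x' to 'ab' are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');                      # 26 letters
my @days     = ('01' .. '31');                    # leading zeros are kept

my $num      = 254;
my $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];   # 254 & 15 is 14, giving 'e'

my @past_z   = ('x' .. 'ab');                     # carries from 'z' to 'aa'

print scalar(@alphabet), " $days[0] $days[30] $hexdigit @past_z\n";
# prints "26 01 31 e x y z aa ab"
```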

The || and && operators differ from C's in that, rather than returning 0 or 1, they return the last value evaluated. Thus, a portable way to find out the home directory might be:

	$home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
	    (getpwuid($<))[7] || die "You're homeless!\n";
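The same idiom works for any chain of fallbacks. In the sketch below, written for a current Perl 5 interpreter, the %config hash and its keys are hypothetical. Note that a legitimate value of 0 or the null string is treated as false and replaced by the fallback.

```perl
use strict;
use warnings;

my %config = (host => 'example.org', name => '', port => 0);

my $host = $config{host} || 'localhost';   # 'example.org': first true value wins
my $name = $config{name} || 'anonymous';   # 'anonymous': the null string is false
my $port = $config{port} || 8080;          # 8080: 0 is false as well

my $flag   = 0;
my $second = $host && 'reachable';         # && returns the last value evaluated
my $short  = $flag && 'never';             # 0: evaluation stops at the false operand

print "$host $name $port $second $short\n";
# prints "example.org anonymous 8080 reachable 0"
```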

Along with the literals and variables mentioned earlier, the operations in the following section can serve as terms in an expression. Some of these operations take a LIST as an argument. Such a list can consist of any combination of scalar arguments or array values; the array values will be included in the list as if each individual element were interpolated at that point in the list, forming a longer single-dimensional array value. Elements of the LIST should be separated by commas.

If an operation is listed both with and without parentheses around its arguments, it means you can either use it as a unary operator or as a function call. To use it as a function call, the next token on the same line must be a left parenthesis. (There may be intervening white space.) Such a function then has highest precedence, as you would expect from a function. If any token other than a left parenthesis follows, then it is a unary operator, with a precedence depending only on whether it is a LIST operator or not. LIST operators have lowest precedence. All other unary operators have a precedence greater than relational operators but less than arithmetic operators. See the section on Precedence.
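The following sketch illustrates list flattening and the effect of the parenthesis rule, using join as a LIST operator and length as an ordinary unary operator. It assumes a current Perl 5 interpreter, and the values are arbitrary.

```perl
use strict;
use warnings;

my @inner = (2, 3);
my @flat  = (1, @inner, 4);                # array values are flattened: 4 elements

# With parentheses, join is a function call and stops at the closing paren.
my @with_parens = (join('-', 'a', 'b'), 'c');    # ('a-b', 'c')
# Without them, the LIST operator takes everything to its right.
my @without     = (join '-', 'a', 'b', 'c');     # ('a-b-c')

# A unary operator binds less tightly than the arithmetic-level "." operator.
my $whole = length 'ab' . 'cd';            # length('abcd'), which is 4
my $glued = length('ab') . 'cd';           # function call first: '2cd'

print scalar(@flat), " @with_parens | @without | $whole $glued\n";
# prints "4 a-b c | a-b-c | 4 2cd"
```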

For operators that can be used in either a scalar or array context, failure is generally indicated in a scalar context by returning the undefined value, and in an array context by returning the null list. Remember though that there is no general rule for converting a list into a scalar. Each operator decides which sort of scalar it would be most appropriate to return. Some operators return the length of the list that would have been returned in an array context. Some operators return the first value in the list. Some operators return the last value in the list. Some operators return a count of successful operations. In general, they do what you want, unless you want consistency.
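A few concrete cases, sketched for a current Perl 5 interpreter, show how differently operators choose their scalar value. The behaviour shown is that of Perl 5 and may not match older interpreters in every detail; the sample strings are invented.

```perl
use strict;
use warnings;

my @fields = split /,/, 'a,b,c';
my $count  = @fields;                      # an array in scalar context gives its length: 3

my $text   = 'port 8080';
my $found  = $text =~ /(\d+)/;             # scalar context: 1 for success
my ($port) = $text =~ /(\d+)/;             # array context: the captured value, '8080'
my ($none) = $text =~ /(xyz)/;             # failure in array context: null list, so undefined

my @swapped  = reverse('ab', 'cd');        # array context: ('cd', 'ab')
my $mirrored = reverse('ab', 'cd');        # scalar context: 'dcba'

print "$count $found $port @swapped $mirrored\n";
# prints "3 1 8080 cd ab dcba"
```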