=head1 NAME
perldata - Perl data types
=head1 DESCRIPTION
=head2 Variable names
Perl has three built-in data types: scalars, arrays of scalars, and
associative arrays of scalars, known as "hashes". A scalar is a
single string (of any size, limited only by the available memory),
number, or a reference to something (which will be discussed
in L<perlref>). Normal arrays are ordered lists of scalars indexed
by number, starting with 0. Hashes are unordered collections of scalar
values indexed by their associated string key.
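As a minimal sketch of the three types described above, the following declares one of each. The variable names and values are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the text above.
my $name   = "Ada";                     # scalar: a single string
my @primes = (2, 3, 5, 7);              # array: ordered, indexed from 0
my %age    = (Ada => 36, Alan => 41);   # hash: values indexed by string key

print "$name\n";          # Ada
print "$primes[0]\n";     # 2 (the first element has index 0)
print "$age{Alan}\n";     # 41
```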
Values are usually referred to by name, or through a named reference.
The first character of the name tells you to what sort of data
structure it refers. This character is called a "sigil". The rest of
the name tells you the particular
value to which it refers. Usually this name is a single I<identifier>,
that is, a string beginning with a letter or underscore, and
containing letters, underscores, and digits. In some cases, it may
be a chain of identifiers, separated by C<::> (or by the deprecated C<'>);
all but the last are interpreted as names of packages,
to locate the namespace in which to look up the final identifier
(see L<perlmod/Packages> for details). For a more in-depth discussion
of identifiers, see L</Identifier parsing>. An expression that produces
a reference to the value at runtime can be substituted for a simple
identifier. This is described in more detail below
and in L<perlref>. It is legal, but not recommended, to separate a
variable's sigil from its name by space and/or tab characters.
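A chain of identifiers can be sketched as follows. The package name C<Counter> and the variable C<$total> are hypothetical; the point is only that everything before the last C<::> names a package.

```perl
use strict;
use warnings;

package Counter;      # hypothetical package name
our $total = 3;       # $total lives in the Counter namespace

package main;

# "Counter" locates the namespace; "total" is the final identifier.
print "$Counter::total\n";    # 3
```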
Perl also has its own built-in variables whose names don't follow
these rules. They have strange names so they don't accidentally
collide with one of your normal variables. Strings that match
parenthesized parts of a regular expression are saved under names
containing only digits after the C<$> (see L<perlop> and L<perlre>).
In addition, several special variables that provide windows into
the inner workings of Perl have names containing punctuation characters.
These are documented in L<perlvar>.
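For instance, the digit-named variables might be used like this. The input string and the names C<$year> and C<$month> are assumptions made for the sketch.

```perl
use strict;
use warnings;

my $date = "2024-02-29";    # hypothetical input
my ($year, $month);

# Each parenthesized group is saved in $1, $2, ... on a successful match.
if ($date =~ /^(\d{4})-(\d{2})/) {
    ($year, $month) = ($1, $2);
}

print "$year $month\n";     # 2024 02
```

Copying C<$1> and C<$2> into named variables right away is a common precaution, since a later successful match overwrites them.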
Scalar values are always named with the sigil C<$>, even when referring to a
scalar that is part of an array or a hash. The C<$> symbol works
semantically like the English word "the" in that it indicates a
single value is expected.
    $days               # the simple scalar value "days"
    $days[28]           # the 29th element of array @days
    $days{'Feb'}        # the 'Feb' value from hash %days
    $#days              # the last index of array @days
Entire arrays (and slices of arrays and hashes) are denoted by the sigil
C<@>, which works much as the word "these" or "those" does in English,
in that it indicates multiple values are expected.
    @days               # ($days[0], $days[1],... $days[n])
    @days[3,4,5]        # same as ($days[3],$days[4],$days[5])
    @days{'a','c'}      # same as ($days{'a'},$days{'c'})
Entire hashes are denoted by the sigil '%':
    %days               # (key1, val1, key2, val2 ...)
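The sigil forms listed above can be tried together in one runnable sketch. The contents of C<@days> and C<%days> are hypothetical; note that the array and the hash share a name without interfering with each other.

```perl
use strict;
use warnings;

# Hypothetical contents, for illustration only.
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my %days = (a => 'alpha', b => 'beta', c => 'gamma');

my $third      = $days[2];          # one element of @days
my $last_index = $#days;            # last index of @days
my @midweek    = @days[1, 2, 3];    # array slice: several elements of @days
my @picked     = @days{'a', 'c'};   # hash slice: several values of %days
my @pairs      = %days;             # key/value list, in no particular order

print "$third\n";                   # Wed
print "$last_index\n";              # 6
print "@midweek\n";                 # Tue Wed Thu
print "@picked\n";                  # alpha gamma
print scalar(@pairs), "\n";         # 6 (three keys plus three values)
```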
In addition, subroutines are named with an initial sigil C<&>, though this
is optional when unambiguous, just as the word "do" is often redundant
in English. Symbol table entries can be named with an initial C<*>,
but you don't really care about that yet (if ever :-).
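A short sketch of the subroutine sigil follows. The subroutine C<greet> is hypothetical; the example shows a call with and without the sigil, plus one case (taking a reference) where the sigil is needed to name the subroutine.

```perl
use strict;
use warnings;

sub greet { return "hello, $_[0]" }    # hypothetical subroutine

my $plain   = greet("world");     # usual call; the sigil is omitted
my $sigiled = &greet("world");    # the same call written with the & sigil
my $code    = \&greet;            # & names the subroutine when taking a reference

print "$plain\n";                 # hello, world
print "$sigiled\n";               # hello, world
print $code->("again"), "\n";     # hello, again
```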
Every variable type has its own namespace, as do several
non-variable identifiers. This means that you can, without fear
of conflict, use the same name for a scalar variable, an array, or
a hash--or, for that matter, for a filehandle, a directory handle, a
subroutine name, a format name, or a label. So $foo
and @foo are two different variables, and C<$foo[1]>
is a part of @foo, not a part of $foo. This may seem a bit weird,
but that's okay, because it is weird.
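The separate namespaces can be seen in a small sketch, where one name is used for three unrelated variables. The values are hypothetical.

```perl
use strict;
use warnings;

# Three independent variables that happen to share the name "foo".
my $foo = "scalar";
my @foo = ("zero", "one");
my %foo = (key => "value");

print "$foo\n";         # scalar
print "$foo[1]\n";      # one   (an element of @foo, unrelated to $foo)
print "$foo{key}\n";    # value (an element of %foo)
```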
Because variable references always start with the sigils C<$>, C<@>, or
C<%>, the "reserved" words aren't in fact reserved with respect to
variable names. They I<are> reserved with respect to labels and filehandles,
however, which don't have an initial special character. You can't
have a filehandle named "log", for instance. Hint: you could say
C<open(LOG, 'logfile')> rather than C<open(log, 'logfile')>. Using
uppercase filehandles also improves readability and protects you
from conflict with future reserved words. Case I<is> significant--"FOO",
"Foo", and "foo" are all different names. Names that start with a
letter or underscore may also contain digits and underscores.
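The uppercase filehandle hint might look like this in practice. As an assumption made to keep the sketch self-contained, it opens an in-memory file (a reference to the scalar C<$buffer>) with the three-argument form of C<open> instead of a file on disk.

```perl
use strict;
use warnings;

# Assumption: an in-memory file stands in for 'logfile' so that the
# sketch runs without touching the disk.
my $buffer = '';

# LOG is an uppercase bareword filehandle, so it cannot be mistaken
# for the built-in log function.
open(LOG, '>', \$buffer) or die "cannot open in-memory log: $!";
print LOG "started\n";
close(LOG);

print $buffer;    # started
```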
It is possible to replace such an alphanumeric name with an expression
that returns a reference to the appropriate type. For a description
of this, see L<perlref>.
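As a rough illustration, a block that returns a reference can stand where the name would otherwise go. The array contents and the name C<$aref> are hypothetical.

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed);    # hypothetical contents
my $aref = \@days;             # reference to the array

# In each line, { $aref } takes the place of the name "days".
my @copy  = @{ $aref };        # same list as @days
my $first = ${ $aref }[0];     # same element as $days[0]
my $last  = $#{ $aref };       # same value as $#days

print "@copy\n";     # Mon Tue Wed
print "$first\n";    # Mon
print "$last\n";     # 2
```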
Names that start with a digit may contain only more digits. Names
that do not start with a letter, underscore, digit or a caret are
limited to one character, e.g., C<$%> or
C<$$>. (Most of these one character names have a predefined
significance to Perl. For instance, C<$$> is the current process
id. And all such names are reserved for Perl's possible use.)
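A brief sketch of reading one of these single-character variables. The name C<$pid> is hypothetical, and the printed value differs from run to run.

```perl
use strict;
use warnings;

my $pid = $$;    # the current process id, copied into a named variable

print "running as process $pid\n";
```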
=head2 Identifier parsing
Up until Perl 5.18, the rules for what counted as a valid identifier
were a bit fuzzy. However, in general, anything defined here should
work on previous versions of Perl, while the opposite -- edge cases
that work in previous versions, but aren't defined here -- probably
won't work on newer versions.
As an important side note, the following applies only
to bareword identifiers as found in Perl source code, not to identifiers
introduced through symbolic references, which have far fewer
restrictions.
If working under the effect of the C