
    July 06

    Performing SQL operations through Perl from http://www.codeproject.com/KB/perl/perldbi.aspx

    Introduction

    This article explains how to perform operations on your database through Perl, using the DBI module. It assumes that you have basic knowledge of Perl/CGI and SQL. We will make a simple table and perform basic SQL operations on it.

    Comments

    Like all Perl code, this code is self-explanatory. If you need detailed information, don't hesitate to use the article forums.

    Example one

    Creating a table.

    #!/usr/local/bin/perl
    use DBI;
    
    $username = '';$password = '';$database = '';$hostname = '';
    $dbh = DBI->connect("dbi:mysql:database=$database;" . 
       "host=$hostname;port=3306", $username, $password);
    
    $SQL= "create table user(ID integer primary key " . 
      "auto_increment, username text not null," . 
      " password text not null, email text not null)";
    
    $CreateTable = $dbh->do($SQL);
    
    print "Content-type:text/html\n\n\n";
    if($CreateTable){
    print "Success";
    }
    else{
    print "Failure: $DBI::errstr";
    }

    Example two

    Inserting a record.

    #!/usr/local/bin/perl
    use DBI;
    
    $username = '';$password = '';$database = '';$hostname = '';
    $dbh = DBI->connect("dbi:mysql:database=$database;" . 
      "host=$hostname;port=3306", $username, $password);
    
    $SQL= "insert into user (username, password, email)" .
      " values('lexxwern', 'password', 'email@host')";
    
    $InsertRecord = $dbh->do($SQL);
    
    print "Content-type:text/html\n\n\n";
    if($InsertRecord){
    print "Success";
    }
    else{
    print "Failure: $DBI::errstr";
    }

    Example three

    Updating a record.

    #!/usr/local/bin/perl
    use DBI;
    
    $username = '';$password = '';$database = '';$hostname = '';
    $dbh = DBI->connect("dbi:mysql:database=$database;" .
      "host=$hostname;port=3306", $username, $password);
    
    $SQL= "update user set email = ".
      "'lexxwern@yahoo.com' where username = 'lexxwern'";
    
    $UpdateRecord = $dbh->do($SQL);
    
    print "Content-type:text/html\n\n\n";
    if($UpdateRecord){
    print "Success";
    }
    else{
    print "Failure: $DBI::errstr";
    }

    Example four

    Deleting a record.

    #!/usr/local/bin/perl
    use DBI;
    
    $username = '';$password = '';$database = '';$hostname = '';
    $dbh = DBI->connect("dbi:mysql:database=$database;" .
      "host=$hostname;port=3306", $username, $password);
    
    $SQL= "delete from user where ID=1";
    
    $DeleteRecord = $dbh->do($SQL);
    
    print "Content-type:text/html\n\n\n";
    if($DeleteRecord){
    print "Success";
    }
    else{
    print "Failure: $DBI::errstr";
    }

    Example five

    Viewing all records.

    #!/usr/local/bin/perl
    print "Content-type:text/html\n\n";
    
    use DBI;
    
    $username = '';$password = '';$database = '';$hostname = '';
    $dbh = DBI->connect("dbi:mysql:database=$database;" .
     "host=$hostname;port=3306", $username, $password);
    
    $SQL= "select * from user";
    
    $Select = $dbh->prepare($SQL);
    $Select->execute();
    
    while($Row=$Select->fetchrow_hashref)
    {
      print "$Row->{username} $Row->{email}\n";
    }

    Conclusion

    Hopefully these examples can give you a neat preview of the capabilities of the DBI module. This site will be of further help. Good luck!

    License

    This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


    About the Author

    lexxwern


    Member
    About
    :: a fulltime college student from new delhi, india.

    Latest
    :: college 1st semester.
    Occupation: Web Developer
    Location: India

    Talking about perl.com: A Short Guide to DBI from http://www.perl.com/pub/a/1999/10/DBI.html


    General information about relational databases

    Relational databases started to get to be a big deal in the 1970's, and they're still a big deal today, which is a little peculiar, because they're a 1960's technology.

    A relational database is a bunch of rectangular tables. Each row of a table is a record about one person or thing; the record contains several pieces of information called fields. Here is an example table:

    LASTNAME   FIRSTNAME   ID   POSTAL_CODE   AGE  SEX
    Gauss      Karl        119  19107         30   M
    Smith      Mark        3    T2V 3V4       53   M
    Noether    Emmy        118  19107         31   F
    Smith      Jeff        28   K2G 5J9       19   M
    Hamilton   William     247  10139         2    M
    

    The names of the fields are LASTNAME, FIRSTNAME, ID, POSTAL_CODE, AGE, and SEX. Each line in the table is a record, or sometimes a row or tuple. For example, the first row of the table represents a 30-year-old male whose name is Karl Gauss, who lives at postal code 19107, and whose ID number is 119.

    Sometimes this is a very silly way to store information. When the information naturally has a tabular structure it's fine. When it doesn't, you have to squeeze it into a table, and some of the techniques for doing that are more successful than others. Nevertheless, tables are simple and are easy to understand, and most of the high-performance database systems you can buy today operate under this 1960's model.

    About SQL

    SQL stands for Structured Query Language. It was invented at IBM in the 1970's. It's a language for describing searches and modifications to a relational database.

    SQL was a huge success, probably because it's incredibly simple and anyone can pick it up in ten minutes. As a result, all the important database systems support it in some fashion or another. This includes the big players, like Oracle and Sybase, high-quality free or inexpensive database systems like MySQL, and funny hacks like Perl's DBD::CSV module, which we'll see later.

    There are four important things one can do with a table:

    SELECT
    Find all the records that have a certain property

    INSERT
    Add new records

    DELETE
    Remove old records

    UPDATE
    Modify records that are already there

    Those are the four most important SQL commands, also called queries. Suppose that the example table above is named people. Here are examples of each of the four important kinds of queries:

     SELECT firstname FROM people WHERE lastname = 'Smith'
    

    (Locate the first names of all the Smiths.)

     DELETE FROM people WHERE id = 3
    

    (Delete Mark Smith from the table)

     UPDATE people SET age = age+1 WHERE id = 247
    

    (William Hamilton just had a birthday.)

     INSERT INTO people VALUES ('Euler', 'Leonhard', 248, NULL, 58, 'M')
    

    (Add Leonhard Euler to the table.)

    There are a bunch of other SQL commands for creating and discarding tables, for granting and revoking access permissions, for committing and abandoning transactions, and so forth. But these four are the important ones. Congratulations; you are now a SQL programmer. For the details, go to any reasonable bookstore and pick up a SQL quick reference.

    Every database system is a little different. You talk to some databases over the network and make requests of the database engine; other databases you talk to through files or something else.

    Typically when you buy a commercial database, you get a library with it. The vendor has written some functions for talking to the database in some language like C, compiled the functions, and the compiled code is the library. You can write a C program that calls the functions in the library when it wants to talk to the database.

    Every vendor's library is different. The names of the functions vary, the order in which you call them varies, and the details of passing queries to the functions and getting the data back out vary. Some libraries, like Oracle's, are very thin: they just send the query over the network to the real database and let the giant expensive real database engine deal with it directly. Other libraries will do more predigestion of the query, and more work afterwards to turn the data into a data structure. Some databases will want you to spin around three times and bark like a chicken; others want you to stand on your head and drink out of your sneaker.

    What DBI is For

    There's a saying that any software problem can be solved by adding a layer of indirection. That's what Perl's DBI ("Database Interface") module is all about. It was written by Tim Bunce.

    DBI is designed to protect you from the details of the vendor libraries. It has a very simple interface for saying what SQL queries you want to make, and for getting the results back. DBI doesn't know how to talk to any particular database, but it does know how to locate and load in DBD ("Database Driver") modules. The DBD modules have the vendor libraries in them and know how to talk to the real databases; there is one DBD module for every different database.

    When you ask DBI to make a query for you, it sends the query to the appropriate DBD module, which spins around three times or drinks out of its sneaker or whatever is necessary to communicate with the real database. When it gets the results back, it passes them to DBI. Then DBI gives you the results. Since your program only has to deal with DBI, and not with the real database, you don't have to worry about barking like a chicken.

    Picture your program talking to the DBI library. You are using two databases at once. One is an Oracle database server on some other machine, and the other is a DBD::CSV database that stores the data in a bunch of plain text files on the local disk.

    Your program sends a query to DBI, which forwards it to the appropriate DBD module; let's say it's DBD::Oracle. DBD::Oracle knows how to translate what it gets from DBI into the format demanded by the Oracle library, which is built into it. The library forwards the request across the network, gets the results back, and returns them to DBD::Oracle. DBD::Oracle returns the results to DBI as a Perl data structure. Finally, your program can get the results from DBI.

    On the other hand, suppose that your program was querying the text files. It would prepare the same sort of query in exactly the same way, and send it to DBI in exactly the same way. DBI would see that you were trying to talk to the DBD::CSV database and forward the request to the DBD::CSV module. The DBD::CSV module has Perl functions in it that tell it how to parse SQL and how to hunt around in the text files to find the information you asked for. It then returns the results to DBI as a Perl data structure. Finally, your program gets the results from DBI in exactly the same way that it would have if you were talking to Oracle instead.

    There are two big wins that result from this organization. First, you don't have to worry about the details of hunting around in text files or talking on the network to the Oracle server or dealing with Oracle's library. You just have to know how to talk to DBI.

    Second, if you build your program to use Oracle, and then the following week upper management signs a new Strategic Partnership with Sybase, it's easy to convert your code to use Sybase instead of Oracle. You change exactly one line in your program, the line that tells DBI to talk to DBD::Oracle, and have it use DBD::Sybase instead. Or you might build your program to talk to a cheap, crappy database like MS Access, and then next year, when the application is doing well and getting more use than you expected, you can upgrade to a better database without changing any of the rest of your code.

    There are DBD modules for talking to every important kind of SQL database. DBD::Oracle will talk to Oracle, and DBD::Sybase will talk to Sybase. DBD::ODBC will talk to any ODBC database, including Microsoft Access. (ODBC is a Microsoft invention that is analogous to DBI itself. There is no DBD module for talking to Access directly.) DBD::CSV allows SQL queries on plain text files. DBD::mysql talks to the excellent MySQL database from TCX DataKonsult AB in Sweden. (MySQL is a tremendous bargain: It's $200 for commercial use, and free for noncommercial use.)

    Example of How to Use DBI

    Here's a typical program. When you run it, it waits for you to type a last name. Then it searches the database for people with that last name and prints out the full name and ID number for each person it finds. For example:

    Enter name> Noether
            118: Emmy Noether

    Enter name> Smith
            3: Mark Smith
            28: Jeff Smith

    Enter name> Snonkopus
            No names matched `Snonkopus'.

    Enter name> ^D
    

    Here is the code:

    use DBI;

    my $dbh = DBI->connect('DBI:Oracle:payroll')
            or die "Couldn't connect to database: " . DBI->errstr;
    my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;

    print "Enter name> ";
    while ($lastname = <>) {               # Read input from the user
      my @data;
      chomp $lastname;
      $sth->execute($lastname)             # Execute the query
        or die "Couldn't execute statement: " . $sth->errstr;

      # Read the matching records and print them out
      while (@data = $sth->fetchrow_array()) {
        my $firstname = $data[1];
        my $id = $data[2];
        print "\t$id: $firstname $lastname\n";
      }

      if ($sth->rows == 0) {
        print "No names matched `$lastname'.\n\n";
      }

      $sth->finish;
      print "\n";
      print "Enter name> ";
    }

    $dbh->disconnect;
    

     use DBI;
    

    This loads in the DBI module. Notice that we don't have to load in any DBD module. DBI will do that for us when it needs to.

    my $dbh = DBI->connect('DBI:Oracle:payroll')
            or die "Couldn't connect to database: " . DBI->errstr;
    

    The connect call tries to connect to a database. The first argument, DBI:Oracle:payroll, tells DBI what kind of database it is connecting to. The Oracle part tells it to load DBD::Oracle and to use that to communicate with the database. If we had to switch to Sybase next week, this is the one line of the program that we would change. We would have to change Oracle to Sybase.

    payroll is the name of the database we will be searching. If we were going to supply a username and password to the database, we would do it in the connect call:

    my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password')
            or die "Couldn't connect to database: " . DBI->errstr;
    

    If DBI connects to the database, it returns a database handle object, which we store into $dbh. This object represents the database connection. We can be connected to many databases at once and have many such database connection objects.

    If DBI can't connect, it returns an undefined value. In this case, we use die to abort the program with an error message. DBI->errstr returns the reason why we couldn't connect, "Bad password" for example.

    my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;
    

    The prepare call prepares a query to be executed by the database. The argument is any SQL at all. On high-end databases, prepare will send the SQL to the database server, which will compile it. If prepare is successful, it returns a statement handle object which represents the statement; otherwise it returns an undefined value and we abort the program. $dbh->errstr will return the reason for failure, which might be "Syntax error in SQL". It gets this reason from the actual database, if possible.

    The ? in the SQL will be filled in later. Most databases can handle this. For some databases that don't understand the ?, the DBD module will emulate it for you and will pretend that the database understands how to fill values in later, even though it doesn't.

     print "Enter name> ";
    

    Here we just print a prompt for the user.

    while ($lastname = <>) {               # Read input from the user
      ...
    }
    

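    This loop will repeat over and over again as long as the user keeps entering input. It exits when they signal end-of-file (the ^D in the sample session above); a blank line does not end it, because the line read still contains a newline character. The Perl <> symbol means to read from the terminal or from files named on the command line if there were any.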

     my @data;
    

    This declares a variable to hold the data that we will get back from the database.

     chomp $lastname;
    

    This trims the newline character off the end of the user's input.

    $sth->execute($lastname)             # Execute the query
      or die "Couldn't execute statement: " . $sth->errstr;
    

    execute executes the statement that we prepared before. The argument $lastname is substituted into the SQL in place of the ? that we saw earlier. execute returns a true value if it succeeds and a false value otherwise, so we abort if for some reason the execution fails.

    while (@data = $sth->fetchrow_array()) {
      ...
    }
    

    fetchrow_array returns one of the selected rows from the database. You get back an array whose elements contain the data from the selected row. In this case, the array you get back has six elements. The first element is the person's last name; the second element is the first name; the third element is the ID, and then the other elements are the postal code, age, and sex.

    Each time we call fetchrow_array, we get back a different record from the database. When there are no more matching records, fetchrow_array returns the empty list and the while loop exits.

    my $firstname = $data[1];
    my $id = $data[2];
    

    These lines extract the first name and the ID number from the record data.

     print "\t$id: $firstname $lastname\n";
    

    This prints out the result.

    if ($sth->rows == 0) {
      print "No names matched `$lastname'.\n\n";
    }
    

    The rows method returns the number of rows of the database that were selected. If no rows were selected, then there is nobody in the database with the last name that the user is looking for. In that case, we print out a message. We have to do this after the while loop that fetches whatever rows were available, because with some databases you don't know how many rows there were until after you've gotten them all.

    $sth->finish;
    print "\n";
    print "Enter name> ";
    

    Once we're done reporting about the result of the query, we print another prompt so that the user can enter another name. finish tells the database that we have finished retrieving all the data for this query and allows it to reinitialize the handle so that we can execute it again for the next query.

     $dbh->disconnect;
    

    When the user has finished querying the database, they signal end-of-file (^D) and the main while loop exits. disconnect closes the connection to the database.

    Cached Queries

    Here's a function which looks up someone in the example table, given their ID number, and returns their age:

    sub age_by_id {
      # Arguments: database handle, person ID number
      my ($dbh, $id) = @_;
      my $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
        or die "Couldn't prepare statement: " . $dbh->errstr;

      $sth->execute($id)
        or die "Couldn't execute statement: " . $sth->errstr;

      my ($age) = $sth->fetchrow_array();
      return $age;
    }
    

    It prepares the query, executes it, and retrieves the result.

    There's a problem here though. Even though the function works correctly, it's inefficient. Every time it's called, it prepares a new query. Typically, preparing a query is a relatively expensive operation. For example, the database engine may parse and understand the SQL and translate it into an internal format. Since the query is the same every time, it's wasteful to throw away this work when the function returns.

    Here's one solution:

    { my $sth;
      sub age_by_id {
        # Arguments: database handle, person ID number
        my ($dbh, $id) = @_;

        if (! defined $sth) {
          $sth = $dbh->prepare('SELECT age FROM people WHERE id = ?')
            or die "Couldn't prepare statement: " . $dbh->errstr;
        }

        $sth->execute($id)
          or die "Couldn't execute statement: " . $sth->errstr;

        my ($age) = $sth->fetchrow_array();
        return $age;
      }
    }
    

    There are two big changes to this function from the previous version. First, the $sth variable has moved outside of the function; this tells Perl that its value should persist even after the function returns. Next time the function is called, $sth will have the same value as before.

    Second, the prepare code is in a conditional block. It's only executed if $sth does not yet have a value. The first time the function is called, the prepare code is executed and the statement handle is stored into $sth. This value persists after the function returns, and the next time the function is called, $sth still contains the statement handle and the prepare code is skipped.
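
    The persistence described above can be seen without a database at all. What follows is a minimal sketch, not DBI code: get_handle, prepare_count, and the string standing in for a statement handle are hypothetical names invented for this illustration. It only shows that a lexical variable declared in an enclosing block keeps its value between calls, so the guarded code runs once.

```perl
use strict;
use warnings;

{
    my $prepared;             # visible only in this block, but survives between calls
    my $prepare_count = 0;    # how often the "expensive" step actually ran

    sub get_handle {
        if (!defined $prepared) {
            $prepare_count++;                       # stands in for the prepare call
            $prepared = "handle #$prepare_count";   # stands in for a statement handle
        }
        return $prepared;
    }

    sub prepare_count { return $prepare_count }
}

get_handle() for 1 .. 3;        # three calls, one "prepare"
print prepare_count(), "\n";    # prints 1
```

    Nothing outside the block can reach $prepared directly, which is the point of the idiom: the cached value is private to the functions that use it.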

    Here's another solution:

    sub age_by_id {
      # Arguments: database handle, person ID number
      my ($dbh, $id) = @_;
      my $sth = $dbh->prepare_cached('SELECT age FROM people WHERE id = ?')
        or die "Couldn't prepare statement: " . $dbh->errstr;

      $sth->execute($id)
        or die "Couldn't execute statement: " . $sth->errstr;

      my ($age) = $sth->fetchrow_array();
      return $age;
    }
    

    Here the only change is to replace prepare with prepare_cached. The prepare_cached call is just like prepare, except that it looks to see whether it has already prepared the same query. If so, it gives you the statement handle that it gave you before.
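
    The caching idea can be modeled in a few lines of plain Perl. This is a toy sketch of the concept, not DBI's actual implementation: cached_prepare and slow_prepare are hypothetical names, and the "handle" is just a hash reference. The cache is keyed on the SQL text, so asking for the same query twice does the expensive work once.

```perl
use strict;
use warnings;

my %cache;           # SQL text => previously prepared "handle"
my $prepares = 0;    # how many times the expensive step really ran

# Hypothetical stand-in for an expensive prepare call.
sub slow_prepare {
    my ($sql) = @_;
    $prepares++;
    return { sql => $sql };
}

# Return the cached handle for this SQL text, preparing it only on first use.
sub cached_prepare {
    my ($sql) = @_;
    return $cache{$sql} //= slow_prepare($sql);
}

my $first  = cached_prepare('SELECT age FROM people WHERE id = ?');
my $second = cached_prepare('SELECT age FROM people WHERE id = ?');
print "prepared $prepares time(s)\n";    # prints: prepared 1 time(s)
```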

    Transactions

    Many databases support transactions. This means that you can make a whole bunch of queries which would modify the database, but none of the changes are actually made. Then at the end you issue the special SQL query COMMIT, and all the changes are made simultaneously. Alternatively, you can issue the query ROLLBACK, in which case all the queries are thrown away.

    As an example of this, consider a function to add a new employee to a database. The database has a table called employees that looks like this:

    FIRSTNAME  LASTNAME   DEPARTMENT_ID
    Karl       Gauss      17
    Mark       Smith      19
    Emmy       Noether    17
    Jeff       Smith      666
    William    Hamilton   17
    

    and a table called departments that looks like this:

    ID   NAME               NUM_MEMBERS
    17   Mathematics        3
    666  Legal              1
    19   Grounds Crew       1
    

    The mathematics department is department #17 and has three members: Karl Gauss, Emmy Noether, and William Hamilton.

    Here's our first cut at a function to insert a new employee. It will return true or false depending on whether or not it was successful:

    sub new_employee {
      # Arguments: database handle; first and last names of new employee;
      # department ID number for new employee's work assignment
      my ($dbh, $first, $last, $department) = @_;
      my ($insert_handle, $update_handle);

      # The handles are already declared above, so no second "my" here
      $insert_handle =
        $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
      $update_handle =
        $dbh->prepare_cached('UPDATE departments
                                 SET num_members = num_members + 1
                               WHERE id = ?');

      die "Couldn't prepare queries; aborting"
        unless defined $insert_handle && defined $update_handle;

      $insert_handle->execute($first, $last, $department) or return 0;
      $update_handle->execute($department) or return 0;
      return 1;   # Success
    }
    

    We create two handles: one for an insert query that will insert the new employee's name and department number into the employees table, and one for an update query that will increment the number of members in the new employee's department in the departments table. Then we execute the two queries with the appropriate arguments.

    There's a big problem here: Suppose, for some reason, the second query fails. Our function returns a failure code, but it's too late. It has already added the employee to the employees table, and that means that the count in the departments table is wrong. The database now has corrupted data in it.

    The solution is to make both updates part of the same transaction. Most databases will do this automatically, but without an explicit instruction about whether or not to commit the changes, some databases will commit the changes when we disconnect from the database, and others will roll them back. We should specify the behavior explicitly.

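    Typically, no changes will actually be made to the database until we issue a commit. (This assumes the connection was opened with AutoCommit turned off; DBI turns AutoCommit on by default.) The version of our program with commit looks like this: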

    sub new_employee {
      # Arguments: database handle; first and last names of new employee;
      # department ID number for new employee's work assignment
      my ($dbh, $first, $last, $department) = @_;
      my ($insert_handle, $update_handle);

      # The handles are already declared above, so no second "my" here
      $insert_handle =
        $dbh->prepare_cached('INSERT INTO employees VALUES (?,?,?)');
      $update_handle =
        $dbh->prepare_cached('UPDATE departments
                                 SET num_members = num_members + 1
                               WHERE id = ?');

      die "Couldn't prepare queries; aborting"
        unless defined $insert_handle && defined $update_handle;

      my $success = 1;
      $success &&= $insert_handle->execute($first, $last, $department);
      $success &&= $update_handle->execute($department);

      my $result = ($success ? $dbh->commit : $dbh->rollback);
      unless ($result) {
        die "Couldn't finish transaction: " . $dbh->errstr
      }
      return $success;
    }
    

    We perform both queries, and record in $success whether they both succeeded. $success will be true if both queries succeeded, false otherwise. If the queries succeeded, we commit the transaction; otherwise, we roll it back, cancelling all our changes.

    The problem of concurrent database access is also solved by transactions. Suppose that queries were executed immediately, and that some other program came along and examined the database after our insert but before our update. It would see inconsistent data in the database, even if our update would eventually have succeeded. But with transactions, all the changes happen simultaneously when we do the commit, and the changes are committed atomically, which means that any other program looking at the database either sees all of them or none.
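
    The all-or-nothing behavior can be tried out on a throwaway database. The sketch below makes several assumptions that are not in the article: DBD::SQLite is installed and an in-memory SQLite database stands in for the payroll database; the transaction is opened with DBI's begin_work method; errors are trapped with RaiseError and eval rather than by checking return values; and the check for a nonexistent department is an extra failure condition added only to trigger a rollback.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed. An in-memory database stands in
# for the article's Oracle "payroll" database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0, AutoCommit => 1 });

$dbh->do('CREATE TABLE employees (firstname TEXT, lastname TEXT, department_id INTEGER)');
$dbh->do('CREATE TABLE departments (id INTEGER, name TEXT, num_members INTEGER)');
$dbh->do(q{INSERT INTO departments VALUES (17, 'Mathematics', 3)});

sub new_employee {
    my ($dbh, $first, $last, $department) = @_;

    my $ok = eval {
        $dbh->begin_work;    # suspend AutoCommit until commit or rollback
        $dbh->do('INSERT INTO employees VALUES (?,?,?)',
                 undef, $first, $last, $department);
        my $updated = $dbh->do('UPDATE departments
                                   SET num_members = num_members + 1
                                 WHERE id = ?', undef, $department);
        # Illustrative extra check: treat an unknown department as a failure.
        die "no department with id $department\n" if $updated == 0;
        $dbh->commit;
        1;
    };
    return 1 if $ok;

    $dbh->rollback;          # undo the insert as well
    return 0;
}

print new_employee($dbh, 'Leonhard', 'Euler',   17),  "\n";   # prints 1
print new_employee($dbh, 'Nobody',   'Special', 999), "\n";   # prints 0
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM employees');
print "$count employee row(s)\n";   # prints: 1 employee row(s)
```

    The second call inserts a row and then fails, yet the row count stays at one: the rollback discarded the insert along with everything else in that transaction.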

    do

    If you're doing an UPDATE, INSERT, or DELETE there is no data that comes back from the database, so there is a short cut. You can say

     $dbh->do('DELETE FROM people WHERE age > 65');
    

    for example, and DBI will prepare the statement, execute it, and finish it. do returns a true value if it succeeded, and a false value if it failed. Actually, if it succeeds it returns the number of affected rows. In the example it would return the number of rows that were actually deleted. (DBI plays a magic trick so that the value it returns is true even when it is 0. This is bizarre, because 0 is usually false in Perl. But it's convenient because you can use it either as a number or as a true-or-false success code, and it works both ways.)
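
    The magic trick is that do reports zero affected rows as the string "0E0". A short plain-Perl sketch shows why that works; the variable $rows is only a stand-in for a value returned by do, and no database is involved.

```perl
use strict;
use warnings;

my $rows = "0E0";    # what do returns when it succeeded but touched no rows

print "true in a boolean test\n" if $rows;         # a non-empty string other than "0"
print "zero as a number\n"       if $rows == 0;    # 0 times 10 to the power 0
printf "affected rows: %d\n", $rows;               # prints: affected rows: 0
```

    All three lines are printed. A failed do returns undef instead, which is false, so a plain truth test still separates success from failure.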

    AutoCommit

    If your transactions are simple, you can save yourself the trouble of having to issue a lot of commits. When you make the connect call, you can specify an AutoCommit option which will perform an automatic commit operation after every successful query. Here's what it looks like:

    # The attribute hash is the fourth argument, after username and password
    my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password',
                           {AutoCommit => 1},
                          )
            or die "Couldn't connect to database: " . DBI->errstr;
    

    Automatic Error Handling

    When you make the connect call, you can specify a RaiseError option that handles errors for you automatically. When an error occurs, DBI will abort your program instead of returning a failure code. If all you want is to abort the program on an error, this can be convenient:

    # The attribute hash is the fourth argument, after username and password
    my $dbh = DBI->connect('DBI:Oracle:payroll', 'username', 'password',
                           {RaiseError => 1},
                          )
            or die "Couldn't connect to database: " . DBI->errstr;
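
    Aborting does not have to mean the end of the program, because the abort is an ordinary Perl die that eval can trap. The sketch below assumes DBD::SQLite is installed and uses an in-memory database and a deliberately nonexistent table; neither comes from the article. PrintError is switched off only so the error is not also printed as a warning.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed; it stands in for the Oracle driver.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, PrintError => 0 });

# With RaiseError on, a failing call dies instead of returning false,
# so no "or die" is needed after each call. eval traps the exception.
my $ok = eval {
    $dbh->do('DELETE FROM no_such_table');
    1;
};
my $error = $ok ? '' : $@;
print "Caught: $error" unless $ok;
```

    Wrapping a group of calls in one eval keeps the error handling in a single place instead of after every statement.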
    

    Don't do This

    People are always writing code like this:

    while ($lastname = <>) {
      my $sth = $dbh->prepare("SELECT * FROM people
                               WHERE lastname = '$lastname'");
      $sth->execute();
      # and so on ...
    }
    

    Here we interpolated the value of $lastname directly into the SQL in the prepare call.

    This is a bad thing to do for three reasons.

    First, prepare calls can take a long time. The database server has to compile the SQL and figure out how it is going to run the query. If you have many similar queries, that is a waste of time.

    Second, it will not work if $lastname contains a name like O'Malley or D'Amico or some other name with an '. The ' has a special meaning in SQL, and the database will not understand when you ask it to prepare a statement that looks like

     SELECT * FROM people WHERE lastname = 'O'Malley'
    

    It will see that you have three 's and complain that you don't have a fourth matching ' somewhere else.

    Finally, if you're going to be constructing your query based on user input, as we did in the example program, it's unsafe to simply interpolate the input directly into the query, because the user can construct a strange input in an attempt to trick your program into doing something it didn't expect. For example, suppose the user enters the following bizarre value for $lastname:

     x' or lastname = lastname or lastname = 'y
    

    Now our query has become something very surprising:

     SELECT * FROM people WHERE lastname = 'x' 
             or lastname = lastname or lastname = 'y'
    

    The part of this query that our sneaky user is interested in is the second or clause. This clause selects all the records for which lastname is equal to lastname; that is, all of them. We thought that the user was only going to be able to see a few records at a time, and now they've found a way to get them all at once. This probably wasn't what we wanted.

    People go to all sorts of trouble to get around these problems with interpolation. They write a function that puts the last name in quotes and then backslashes any apostrophes that appear in it. Then it breaks because they forgot to backslash backslashes. Then they make their escape function better. Then their code is a big mess because they are calling the backslashing function every other line. They put a lot of work into the backslashing function, and it was all for nothing, because the whole problem is solved by just putting a ? into the query, like this:

     SELECT * FROM people WHERE lastname = ?
    

    All my examples look like this. It is safer and more convenient and more efficient to do it this way.

    July 02

    wire and reg (verilog) from http://www.asic-world.com/tidbits/wire_reg.html

    I had this doubt when I was learning Verilog: what is the difference between reg and wire? Rather than tell stories, I will give you some examples to show the difference.

    From the college days we know that wire is something which connects two points, and thus does not have any driving strength. In the figure below, in_wire is a wire which connects the AND gate input to the driving source, clk_wire connects the clock to the flip-flop input, d_wire connects the AND gate output to the flip-flop D input.

    [Figure: wire.h4.gif, showing in_wire, clk_wire and d_wire]

    There is something else about wire that sometimes causes confusion: a wire data type can be used to connect an output port to its actual driver. The code below synthesizes to an AND gate, and as we know an AND gate can drive a load.

     module wire_example( a, b, y);
       input a, b;
       output y;

       wire a, b, y;

       assign y = a & b;

     endmodule

    SYNTHESIS OUTPUT

    [Figure: wire_and.gif]

    What this implies is that wire is used for designing combinational logic, as we all know that this kind of logic can not store a value. As you can see from the example above, a wire can be assigned a value by an assign statement. Default data type is wire: this means that if you declare a variable without specifying reg or wire, it will be a 1-bit wide wire.

    Now, coming to the reg data type: a reg can store a value and drive a load. Something we need to know about reg is that it can be used for modeling both combinational and sequential logic. A reg is assigned from within initial and always blocks.

    Reg data type as Combinational element

    
     module reg_combo_example( a, b, y);
     input a, b;
     output y;

     reg   y;
     wire a, b;

     always @ ( a or b)
     begin
       y = a & b;
     end

     endmodule

    SYNTHESIS OUTPUT

    [Figure: wire_and.gif]

    This gives the same output as the assign statement, the only difference being that y is declared as reg. There are distinct advantages to modeling a combinational element with reg; the reg type is useful when a "case" statement is required (refer to the Verilog section for more on this).

    To model a sequential element using reg, we need to have edge sensitive variables in the sensitivity list of the always block.

    Reg data type as Sequential element

    
     module reg_seq_example( clk, reset, d, q);
     input clk, reset, d;
     output q;

     reg   q;
     wire clk, reset, d;

     always @ (posedge clk or posedge reset)
     if (reset) begin
       q <= 1'b0;
     end else begin
       q <= d;
     end

     endmodule

    SYNTHESIS OUTPUT

    [Figure: wire_syn.gif]

    There is a difference in how we assign to a reg in the two cases: when modeling combinational logic we use blocking assignments (=), while when modeling sequential logic we use nonblocking assignments (<=).

    June 29

    Difference between casex and casez from http://www.asic-world.com/verilog/vbehave2.html

    The Conditional Statement if-else

    The if - else statement controls the execution of other statements. In programming languages like C, if - else controls the flow of the program. When more than one statement needs to be executed for an if condition, we need to use begin and end, as seen in earlier examples.

    Syntax : if

      if (condition)
        statements;

    Syntax : if-else

      if (condition)
        statements;
      else
        statements;

    Syntax : nested if-else-if

      if (condition)
        statements;
      else if (condition)
        statements;
      ................
      ................
      else
        statements;

    Example- simple if

    
     module simple_if();

     reg latch;
     wire enable,din;

     always @ (enable or din)
     if (enable) begin
       latch <= din;
     end

     endmodule

    Example- if-else

    
     module if_else();

     reg dff;
     wire clk,din,reset;

     always @ (posedge clk)
     if (reset) begin
       dff <= 0;
     end else  begin
       dff <= din;
     end

     endmodule

    Example- nested-if-else-if

    
     module nested_if();

     reg [3:0] counter;
     reg clk,reset,enable, up_en, down_en;

     always @ (posedge clk)
     // If reset is asserted (active low)
     if (reset == 1'b0) begin
        counter <= 4'b0000;
     // If counter is enabled and up count is asserted
     end else if (enable == 1'b1 && up_en == 1'b1) begin
        counter <= counter + 1'b1;
     // If counter is enabled and down count is asserted
     end else if (enable == 1'b1 && down_en == 1'b1) begin
        counter <= counter - 1'b1;
     // If counting is disabled
     end else begin
        counter <= counter; // Redundant code
     end

     // Testbench code
     initial begin
       $monitor ("@%0dns reset=%b enable=%b up=%b down=%b count=%b",
                  $time, reset, enable, up_en, down_en,counter);
       $display("@%0dns Driving all inputs to know state",$time);
       clk = 0;
       reset = 0;
       enable = 0;
       up_en = 0;
       down_en = 0;
        #3  reset = 1;
       $display("@%0dns De-Asserting reset",$time);
        #4  enable = 1;
       // Message text repeats the previous one; enable is what changes here
       $display("@%0dns De-Asserting reset",$time);
        #4  up_en = 1;
       $display("@%0dns Putting counter in up count mode",$time);
        #10  up_en = 0;
       down_en = 1;
       $display("@%0dns Putting counter in down count mode",$time);
        #8  $finish;
     end

     always  #1  clk = ~clk;

     endmodule

    Simulation Log- nested-if-else-if

     @0ns Driving all inputs to know state
     @0ns reset=0 enable=0 up=0 down=0 count=xxxx
     @1ns reset=0 enable=0 up=0 down=0 count=0000
     @3ns De-Asserting reset
     @3ns reset=1 enable=0 up=0 down=0 count=0000
     @7ns De-Asserting reset
     @7ns reset=1 enable=1 up=0 down=0 count=0000
     @11ns Putting counter in up count mode
     @11ns reset=1 enable=1 up=1 down=0 count=0001
     @13ns reset=1 enable=1 up=1 down=0 count=0010
     @15ns reset=1 enable=1 up=1 down=0 count=0011
     @17ns reset=1 enable=1 up=1 down=0 count=0100
     @19ns reset=1 enable=1 up=1 down=0 count=0101
     @21ns Putting counter in down count mode
     @21ns reset=1 enable=1 up=0 down=1 count=0100
     @23ns reset=1 enable=1 up=0 down=1 count=0011
     @25ns reset=1 enable=1 up=0 down=1 count=0010
     @27ns reset=1 enable=1 up=0 down=1 count=0001
    

    Parallel if-else

    In the above example, the condition (enable == 1'b1 && up_en == 1'b1) is given highest priority and the condition (enable == 1'b1 && down_en == 1'b1) is given lowest priority. We normally don't include the reset check in the priority chain, as it does not fall in the combinational logic feeding the flip-flop, as shown in the figure below.

    [Figure: if_else.gif]

    So when we need priority logic, we use nested if-else statements. On the other hand, if we don't want to implement priority logic, knowing that only one input is active at a time (i.e. all inputs are mutually exclusive), then we can write the code as shown below.

    It's a known fact that a priority implementation takes more logic than a parallel implementation. So if you know the inputs are mutually exclusive, you can code the logic as parallel if statements.

    
     module parallel_if();

     reg [3:0] counter;
     wire clk,reset,enable, up_en, down_en;

     always @ (posedge clk)
     // If reset is asserted (active low)
     if (reset == 1'b0) begin
        counter <= 4'b0000;
     end else begin
       // If counter is enabled and in up count mode
       if (enable == 1'b1 && up_en == 1'b1) begin
         counter <= counter + 1'b1;
       end
       // If counter is enabled and in down count mode
       if (enable == 1'b1 && down_en == 1'b1) begin
         counter <= counter - 1'b1;
       end
     end

     endmodule

    The Case Statement

    The case statement compares an expression to a series of cases and executes the statement or statement group associated with the first matching case:

    • case statement supports single or multiple statements.
    • Group multiple statements using begin and end keywords.

    The syntax of a case statement looks as shown below.

      case (< expression >)
        < case1 > : < statement >
        < case2 > : < statement >
        .....
        default : < statement >
      endcase

    Normal Case

    Example- case

    
     module mux (a,b,c,d,sel,y);
     input a, b, c, d;
     input [1:0] sel;
     output y;

     reg y;

     always @ (a or b or c or d or sel)
     case (sel)
       0 : y = a;
       1 : y = b;
       2 : y = c;
       3 : y = d;
       default : $display("Error in SEL");
     endcase

     endmodule

    Example- case without default

    
     module mux_without_default (a,b,c,d,sel,y);
     input a, b, c, d;
     input [1:0] sel;
     output y;

     reg y;

     always @ (a or b or c or d or sel)
     case (sel)
       0 : y = a;
       1 : y = b;
       2 : y = c;
       3 : y = d;
       2'bxx,2'bx0,2'bx1,2'b0x,2'b1x,
       2'bzz,2'bz0,2'bz1,2'b0z,2'b1z : $display("Error in SEL");
     endcase

     endmodule

    The example above shows how to specify multiple case items as a single case item.

    The Verilog case statement does an identity comparison (like the === operator); one can use the case statement to check for logic x and z values as shown in the example below.

    Example- case with x and z

    
     module case_xz(enable);
     input enable;

     always @ (enable)
     case(enable)
       1'bz : $display ("enable is floating");
       1'bx : $display ("enable is unknown");
       default : $display ("enable is %b",enable);
     endcase

     endmodule

    The casez and casex statement

    Special versions of the case statement allow the x and z logic values to be used as "don't care":

    • casez : Treats z as don't care.
    • casex : Treats x and z as don't care.

    Example- casez

    
     module casez_example();
     reg [3:0] opcode;
     reg [1:0] a,b,c;
     reg [1:0] out;

     always @ (opcode or a or b or c)
     casez(opcode)
       4'b1zzx : begin // Don't care about bits 2:1, bit 0 must match x
                   out = a;
                   $display("@%0dns 4'b1zzx is selected, opcode %b",$time,opcode);
                 end
       4'b01?? : begin
                   out = b; // bit 1:0 is don't care
                   $display("@%0dns 4'b01?? is selected, opcode %b",$time,opcode);
                 end
       4'b001? : begin  // bit 0 is don't care
                   out = c;
                   $display("@%0dns 4'b001? is selected, opcode %b",$time,opcode);
                 end
       default : begin
                   $display("@%0dns default is selected, opcode %b",$time,opcode);
                 end
     endcase

     // Testbench code goes here
     always  #2  a = $random;
     always  #2  b = $random;
     always  #2  c = $random;

     initial begin
       opcode = 0;
        #2  opcode = 4'b101x;
        #2  opcode = 4'b0101;
        #2  opcode = 4'b0010;
        #2  opcode = 4'b0000;
        #2  $finish;
     end

     endmodule

    Simulation Output - casez

     @0ns default is selected, opcode 0000
     @2ns 4'b1zzx is selected, opcode 101x
     @4ns 4'b01?? is selected, opcode 0101
     @6ns 4'b001? is selected, opcode 0010
     @8ns default is selected, opcode 0000
    

    Example- casex

    
     module casex_example();
     reg [3:0] opcode;
     reg [1:0] a,b,c;
     reg [1:0] out;

     always @ (opcode or a or b or c)
     casex(opcode)
       4'b1zzx : begin // Don't care  2:0 bits
                   out = a;
                   $display("@%0dns 4'b1zzx is selected, opcode %b",$time,opcode);
                 end
       4'b01?? : begin // bit 1:0 is don't care
                   out = b;
                   $display("@%0dns 4'b01?? is selected, opcode %b",$time,opcode);
                 end
       4'b001? : begin // bit 0 is don't care
                   out = c;
                   $display("@%0dns 4'b001? is selected, opcode %b",$time,opcode);
                 end
       default : begin
                   $display("@%0dns default is selected, opcode %b",$time,opcode);
                 end
     endcase

     // Testbench code goes here
     always  #2  a = $random;
     always  #2  b = $random;
     always  #2  c = $random;

     initial begin
       opcode = 0;
        #2  opcode = 4'b101x;
        #2  opcode = 4'b0101;
        #2  opcode = 4'b0010;
        #2  opcode = 4'b0000;
        #2  $finish;
     end

     endmodule

    Simulation Output - casex

     @0ns default is selected, opcode 0000
     @2ns 4'b1zzx is selected, opcode 101x
     @4ns 4'b01?? is selected, opcode 0101
     @6ns 4'b001? is selected, opcode 0010
     @8ns default is selected, opcode 0000
    

    Example- Comparing case, casex, casez

    
     module case_compare;

     reg sel;

     initial begin
        #1  $display ("\n     Driving 0");
       sel = 0;
        #1  $display ("\n     Driving 1");
       sel = 1;
        #1  $display ("\n     Driving x");
       sel = 1'bx;
        #1  $display ("\n     Driving z");
       sel = 1'bz;
        #1  $finish;
     end

     always @ (sel)
     case (sel)
       1'b0 : $display("Normal : Logic 0 on sel");
       1'b1 : $display("Normal : Logic 1 on sel");
       1'bx : $display("Normal : Logic x on sel");
       1'bz : $display("Normal : Logic z on sel");
     endcase

     always @ (sel)
     casex (sel)
       1'b0 : $display("CASEX  : Logic 0 on sel");
       1'b1 : $display("CASEX  : Logic 1 on sel");
       1'bx : $display("CASEX  : Logic x on sel");
       1'bz : $display("CASEX  : Logic z on sel");
     endcase

     always @ (sel)
     casez (sel)
       1'b0 : $display("CASEZ  : Logic 0 on sel");
       1'b1 : $display("CASEZ  : Logic 1 on sel");
       1'bx : $display("CASEZ  : Logic x on sel");
       1'bz : $display("CASEZ  : Logic z on sel");
     endcase

     endmodule

    Simulation Output

          Driving 0
     Normal : Logic 0 on sel
     CASEX  : Logic 0 on sel
     CASEZ  : Logic 0 on sel
     
          Driving 1
     Normal : Logic 1 on sel
     CASEX  : Logic 1 on sel
     CASEZ  : Logic 1 on sel
     
          Driving x
     Normal : Logic x on sel
     CASEX  : Logic 0 on sel
     CASEZ  : Logic x on sel
     
          Driving z
     Normal : Logic z on sel
     CASEX  : Logic 0 on sel
     CASEZ  : Logic 0 on sel
    

    X, Z In IF Conditions And CaseX, CaseZ from http://www.see.ed.ac.uk/~gerard/Teach/Verilog/me5cds/me95cdr.html

     

    Logic Levels Within Verilog

      0 - logic zero, false condition

      1 - logic one, true condition

      x - unknown logic value

      z - high impedance

    An x can be any one of a 1, 0, z or change of state. If a one and a zero are both present at a node with comparable strength, the resultant is unknown (x).

    A z represents a high impedance or floating gate condition. It is the weakest level of logic, being very susceptible to change, and primarily occurs when a node is no longer driven.

    X, Z in IF and CASE statements

    Within an IF statement, a zero corresponds to a false condition and any other known value to true. However, if the condition evaluates to an unknown (x) or high impedance (z), it is interpreted as a false condition.

    Case expressions may include x's and z's, with the comparison only being successful if there is an exact match between each individual bit (whether it be a 0, 1, x or z).

    X, Z In CASEX, CASEZ

    Casex and casez are the two variations of the case statement within Verilog. The syntax is almost identical to the case statement, with the only difference being that case is substituted by either casex or casez. The syntax is as follows:

      case_statement ::=
          case (expression) case_item {case_item} endcase
        | casez (expression) case_item {case_item} endcase
        | casex (expression) case_item {case_item} endcase

      case_item ::=
          expression {,expression} : statement_or_null
        | default [:] statement_or_null

    The use of casex and casez allows don't care values to be considered in the comparison. Casez allows for z values to be treated as don't cares, whereas casex allows for both z and x to be treated as don't cares. Only bit values other than the don't care bits are used in the comparison.

    An example of the use of x within a casex statement is given below:

      reg [7:0] value_read, value_held;

      always begin
        value_held = 8'bx1xx01x1;
        casex (value_read ^ value_held)
          8'bxx1010x0 : statement1;
          8'b00xx01x0 : statement2;
          8'bx001x0x1 : statement3;
          8'bxx1010x0 : statement4;
        endcase
      end

    Where the ^ character represents the Exclusive-OR function.

    Assuming value_read is given as 01100110, the statement executed is calculated as follows:

      value_held ^ value_read
      = x1xx01x1 ^ 01100110
      = x0xx00x1

    Note: when an x is Exclusive-ORed with any value the result is an x.

    It can be seen that the only statement that matches, and therefore the one that is executed, is statement3.
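
    The calculation above can be checked mechanically. The following is a rough Perl sketch, not Verilog, that models the bit vectors as strings; the helper names xor4 and casex_match are invented for this illustration, and the matching rule is the casex behaviour described in the text (a bit position is ignored when either side holds x or z).

```perl
use strict;
use warnings;

# XOR over Verilog-style 4-state bit strings: any x or z operand bit gives x.
sub xor4 {
    my ($lhs, $rhs) = @_;
    my @l = split //, $lhs;
    my @r = split //, $rhs;
    return join '', map {
        ($l[$_] =~ /[xz]/ || $r[$_] =~ /[xz]/) ? 'x'
      : ($l[$_] ne $r[$_])                     ? '1'
      :                                          '0'
    } 0 .. $#l;
}

# casex-style comparison: skip positions where either side is x, z or ?.
sub casex_match {
    my ($expr, $item) = @_;
    for my $i (0 .. length($expr) - 1) {
        my $e = substr($expr, $i, 1);
        my $p = substr($item, $i, 1);
        next if $e =~ /[xz?]/ || $p =~ /[xz?]/;
        return 0 if $e ne $p;
    }
    return 1;
}

my $value   = xor4('x1xx01x1', '01100110');
my @items   = qw(xx1010x0 00xx01x0 x001x0x1 xx1010x0);
my @matches = grep { casex_match($value, $items[$_]) } 0 .. $#items;

print "$value\n";                                      # x0xx00x1
print "statement", $_ + 1, " matches\n" for @matches;  # statement3 matches
```

    Items 1, 2 and 4 are all rejected on bit 0, where the expression has a known 1 and the item has a 0, which leaves statement3 as the only match.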

    rand and srand (Perl)

    Using the Perl rand() function

    Introduction

    The rand() function is used to generate random numbers. By default it generates a number greater than or equal to 0 and less than 1; if you pass it a maximum, it generates numbers from 0 up to, but not including, that maximum.

    Example 1. Between 0 and 1

    To generate a random decimal number between 0 and 1, use rand() without any parameters.

      #!/usr/bin/perl
      use strict;
      use warnings;
    
      my $random_number = rand();
    
      print $random_number . "\n";
    

    This will give you a random number, like:

      0.521563085335405
    
    Example 2. A bigger range of numbers

    Quite often you don't want a number between 0 and 1, but you want a bigger range of numbers. If you pass rand() a maximum, it will return a decimal number between 0 and that number. Our example below generates a random number between 0 and 100.

      #!/usr/bin/perl
      use strict;
      use warnings;
    
      my $range = 100;
    
      my $random_number = rand($range);
    
      print $random_number . "\n";
    

    The program will produce something like:

      34.0500569277541
    
    Example 3. A random integer

    To generate a random integer, convert the output from rand to an integer, as follows:

      #!/usr/bin/perl
      use strict;
      use warnings;
    
      my $range = 100;
    
      my $random_number = int(rand($range));
    
      print $random_number . "\n";
    

    This program gives you an integer from 0 to 99 inclusive:

      68
    
    Example 4. With an offset

    To generate a random integer from, for example, 100 up to (but not including) 150, simply work out the range and add the minimum value to your random number.

      #!/usr/bin/perl
      use strict;
      use warnings;
    
      my $range = 50;
      my $minimum = 100;
    
      my $random_number = int(rand($range)) + $minimum;
    
      print $random_number . "\n";
    

    This program gives you something like:

      129
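
    If both ends of the range should be possible results, the range passed to rand needs to be one larger. A small sketch, where rand_between is a helper name made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: random integer in [$min, $max], both ends included.
sub rand_between {
    my ($min, $max) = @_;
    return $min + int(rand($max - $min + 1));
}

my @samples   = map { rand_between(100, 150) } 1 .. 1000;
my ($lowest)  = sort { $a <=> $b } @samples;
my ($highest) = sort { $b <=> $a } @samples;

print "lowest seen: $lowest, highest seen: $highest\n";
```

    The exact numbers printed vary from run to run, but they always stay within 100 and 150.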
     
    For srand:
    • srand EXPR
    • srand

      Sets the random number seed for the rand operator.

      The point of the function is to "seed" the rand function so that rand can produce a different sequence each time you run your program.

      If srand() is not called explicitly, it is called implicitly at the first use of the rand operator. However, this was not the case in versions of Perl before 5.004, so if your script will run under older Perl versions, it should call srand.

      Most programs won't even call srand() at all, except those that need a cryptographically-strong starting point rather than the generally acceptable default, which is based on time of day, process ID, and memory allocation, or the /dev/urandom device, if available.

      You can call srand($seed) with the same $seed to reproduce the same sequence from rand(), but this is usually reserved for generating predictable results for testing or debugging. Otherwise, don't call srand() more than once in your program.

      Do not call srand() (i.e. without an argument) more than once in a script. The internal state of the random number generator should contain more entropy than can be provided by any seed, so calling srand() again actually loses randomness.

      Most implementations of srand take an integer and will silently truncate decimal numbers. This means srand(42) will usually produce the same results as srand(42.1). To be safe, always pass srand an integer.

      In versions of Perl prior to 5.004 the default seed was just the current time. This isn't a particularly good seed, so many old programs supply their own seed value (often time ^ $$ or time ^ ($$ + ($$ << 15)) ), but that isn't necessary any more.

      For cryptographic purposes, however, you need something much more random than the default seed. Checksumming the compressed output of one or more rapidly changing operating system status programs is the usual method. For example:

          srand (time ^ $$ ^ unpack "%L*", `ps axww | gzip -f`);

      If you're particularly concerned with this, see the Math::TrulyRandom module in CPAN.

      Frequently called programs (like CGI scripts) that simply use

          time ^ $$

      for a seed can fall prey to the mathematical property that

          a^b == (a+1)^(b+1)

      one-third of the time. So don't do that.
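
      The point above about calling srand($seed) with the same seed to reproduce a sequence can be sketched as follows. The seed value 42 is arbitrary and chosen only for the illustration.

```perl
use strict;
use warnings;

my $seed = 42;    # arbitrary value chosen for this illustration

srand($seed);
my @first = map { int(rand(100)) } 1 .. 5;

srand($seed);     # same seed again: rand restarts the same sequence
my @second = map { int(rand(100)) } 1 .. 5;

print "first:  @first\n";
print "second: @second\n";
print "identical\n" if "@first" eq "@second";
```

      This is handy for tests and debugging; for normal use, leaving the seeding to Perl is the simpler choice.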

    June 26

    How to manipulate data of the array (Perl) from http://perldoc.perl.org

    • shift ARRAY

    • shift

      Shifts the first value of the array off and returns it, shortening the array by 1 and moving everything down. If there are no elements in the array, returns the undefined value. If ARRAY is omitted, shifts the @_ array within the lexical scope of subroutines and formats, and the @ARGV array outside of a subroutine and also within the lexical scopes established by the eval STRING , BEGIN {} , INIT {} , CHECK {} , UNITCHECK {} and END {} constructs.

    • unshift ARRAY,LIST

      Does the opposite of a shift. Or the opposite of a push, depending on how you look at it. Prepends list to the front of the array, and returns the new number of elements in the array.

          unshift(@ARGV, '-e') unless $ARGV[0] =~ /^-/;

      Note the LIST is prepended whole, not one element at a time, so the prepended elements stay in the same order. Use reverse to do the reverse.

    • push ARRAY,LIST

      Treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. Has the same effect as

          for $value (LIST) {
              $ARRAY[++$#ARRAY] = $value;
          }

      but is more efficient. Returns the number of elements in the array following the completed push.

    • pop ARRAY

    • pop

      Pops and returns the last value of the array, shortening the array by one element.

      If there are no elements in the array, returns the undefined value (although this may happen at other times as well). If ARRAY is omitted, pops the @ARGV array in the main program, and the @_ array in subroutines, just like shift.

    reverse LIST

    In list context, returns a list value consisting of the elements of LIST in the opposite order. In scalar context, concatenates the elements of LIST and returns a string value with all characters in the opposite order.

        print reverse <>;		# line tac, last line first
    
        undef $/;			# for efficiency of <>
        print scalar reverse <>;	# character tac, last line tsrif

    Used without arguments in scalar context, reverse() reverses $_ .

    This operator is also handy for inverting a hash, although there are some caveats. If a value is duplicated in the original hash, only one of those can be represented as a key in the inverted hash. Also, this has to unwind one hash and build a whole new one, which may take some time on a large hash, such as from a DBM file.

        %by_name = reverse %by_address;	# Invert the hash
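
    The five operators described above can be tried together in a few lines. This is only a sketch; the array contents and the addresses in the hash are made-up sample data.

```perl
use strict;
use warnings;

my @queue = (2, 3);

unshift @queue, 0, 1;       # prepend as a block: (0, 1, 2, 3)
push    @queue, 4, 5;       # append:             (0, 1, 2, 3, 4, 5)

my $front = shift @queue;   # removes and returns 0
my $back  = pop   @queue;   # removes and returns 5

my @backwards = reverse @queue;    # (4, 3, 2, 1)

# Inverting a hash; the addresses here are made-up sample data.
my %by_address = ('10 Elm St' => 'alice', '22 Oak Ave' => 'bob');
my %by_name    = reverse %by_address;

print "front=$front back=$back rest=@queue\n";
print "backwards=@backwards\n";
print "bob lives at $by_name{bob}\n";
```

    Note that unshift keeps 0 and 1 in the order given, as the text says, rather than inserting them one at a time.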

     

    effective perl programming from http://www.usenix.org/publications/login/2000-7/features/effective.html

    effective perl programming


    by Joseph N. Hall
    <joseph@5sigma.com>

    Joseph N. Hall is the author of Effective Perl Programming (Addison-Wesley, 1998). He teaches Perl classes, consults, and plays a lot of golf in his spare time.

    Manual SQL (It Rhymes)

    You Too Can Write an SQL Client

    Lately I've found myself spending an increasing amount of time working with Perl and SQL databases, and sometimes with more than one type of server at a time. One minor aggravation in dealing with different kinds of servers at the same time is that their command-line clients work differently.

    For example, MySQL's command-line client, mysql, has the GNU readline library built into it, which means that you can use the up and down arrows (or Control-N, Control-P) to access the command-line history, and various emacs-like commands to edit the current line. Oracle's SQL*PLUS client, on the other hand, has a lot of nifty features, but no readline library. Ugh.

    Well, I guess if my SQL client(s) don't suit me, I should consider writing my own. And that's exactly what I've done for this article. In years past I probably wouldn't have considered writing my own SQL client, and certainly not as an afternoon "quickie," but as you'll see below, nowadays with Perl it's just a matter of slapping together some modules.

    Starting Up

    First, make sure you have the DBI, Term::ReadLine, and Term::ReadLine::Gnu modules installed, as well as the DBD module(s) for your favorite server(s).

    Our shiny new server-independent SQL client will be called perlsql. Let's start it off like this:

    #!/usr/local/bin/perl -w
    use strict;
    use DBI;
    use Term::ReadLine;
    use File::MkTemp;

    The use DBI directive gives me the DBI module, and use Term::ReadLine gives me an interface to GNU readline-like functionality. File::MkTemp will also come in handy in a bit.

    From the UNIX command line, we'll invoke perlsql something like this:

    perlsql 'DBI:Oracle:host=localhost;sid=main' scott/tiger

    The first argument is a DBI DSN string. It will, of course, vary (considerably) depending on what server you're connecting to, how it's set up, and what environment you are executing in. The second argument is an Oracle-style username/password identifier. Here's how we process the command line:

    my $dsn = shift;
    die "usage: perlsql dsn [user/password]" unless $dsn;
    my ($user, $passwd) = split(/\//, (shift || ''));

    The shift operator works on @ARGV by default if you don't specify an argument. The username and password default to empty string if none are specified. Now we're ready to connect to the database:
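
    The username/password handling can be tried on its own. In this sketch, parse_credentials is a hypothetical helper, and two details are additions of the sketch rather than part of the article's code: the split limit of 2, which keeps any further slashes inside the password, and the explicit fallback to empty strings.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the argument handling above.
sub parse_credentials {
    my ($arg) = @_;
    my ($user, $passwd) = split m{/}, ($arg // ''), 2;
    return ($user // '', $passwd // '');
}

for my $arg ('scott/tiger', 'scott', undef) {
    my ($user, $passwd) = parse_credentials($arg);
    printf "user='%s' password='%s'\n", $user, $passwd;
}
```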

    my $dbh = DBI->connect($dsn, $user, $passwd,
      { PrintError => 0, RaiseError => 1, AutoCommit => 1 }
    ) or die "can't connect:\n$DBI::errstr";

    This connects us to the database and sets some connect attributes. We turn the PrintError attribute off so that error messages aren't automatically printed. Turning RaiseError on causes DBI to generate an exception when an error is encountered. Turning AutoCommit on automatically commits every statement executed by DBI. Next, let's initialize Term::ReadLine:

    my $term = new Term::ReadLine 'PerlSQL';
    my $OUT = $term->OUT || *STDOUT{IO};

    We're now ready to write some command-processing code.

    The Readline Loop

    I'll go ahead and show you the entire command-processing loop, and then explain it a piece at a time.

    my $line = 1;    # command counter shown in the prompt
    while (defined($_ = $term->readline("$line> ")) ) {
     my $cmd = $_;
     $cmd =~ s/^\s+|\s+$//g;    # lop off whitespace
     next unless $cmd;          # skip blank lines
     my $first = (split(/\s+/, $cmd))[0];
     if ($is_sql_cmd{lc $first}) {
      do_sql($cmd);
     } else {
      if (lc($cmd) eq 'quit') {
       $term->remove_history($term->where_history);
       last;
      } elsif ($cmd =~ /^!/) {
       system substr $cmd, 1;
      } else {
       eval qq(
        package perlsql; no strict; \$save = select(STDOUT);
        \$res = do {$cmd}; select \$save; \$res
       );
       print $@ if $@;
       print "\n";
      }
     }
     $line++;
    }

    The command-processing loop is, overall, a while loop that reads a line at a time from our ReadLine terminal. If the user types the end-of-file character, $term->readline returns undef and drops us out of the loop. Inside the loop we first strip leading/trailing white space from the command line and make sure it's not blank. If it's not, we process the line in one of several possible ways.

    First, we check to see if the line is an SQL command. The hash %is_sql_cmd contains a list of SQL commands. This is obviously server-dependent, but simple enough to come up with. I define it like this:

    my %is_sql_cmd = map { $_ => 1 } qw(
     alter analyze associate audit call comment commit
     create delete dissociate drop explain grant insert
     lock noaudit rename revoke rollback savepoint select
     set truncate update
    );

    If it is an SQL command, I pass it to my do_sql subroutine, which I'll explain below. The next possibility is that the user has typed quit. In that case, I delete the current line (containing quit) from the readline history, and exit the loop. Another possibility is a line beginning with an exclamation mark. Those lines get sent to a shell with Perl's system operator.
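
    The routing decisions described so far can be tried in isolation. This is only a sketch: classify is a hypothetical helper, and the keyword list is shortened for the example.

    ```perl
    use strict;
    use warnings;

    my %is_sql_cmd = map { $_ => 1 } qw(select insert update delete commit);

    # Hypothetical helper: report how a command line would be routed.
    sub classify {
        my ($cmd) = @_;
        $cmd =~ s/^\s+|\s+$//g;                 # trim whitespace
        return 'blank' unless length $cmd;
        my $first = (split /\s+/, $cmd)[0];
        return 'sql'   if $is_sql_cmd{lc $first};
        return 'quit'  if lc($cmd) eq 'quit';
        return 'shell' if $cmd =~ /^!/;
        return 'perl';                          # everything else is Perl code
    }

    print classify($_), "\n" for 'SELECT 1', '!ls', 'quit', 'print 2+2';
    ```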

    If the line doesn't fit one of those descriptions, it's treated as a Perl command. If you supply the eval operator a string, Perl takes the string and executes it as Perl code in the current context. Note that I'm using the generalized qq form of double quote — it just looks better to me than ordinary double quotes if I'm quoting several lines.

    I don't want the Perl code executed in my current package (otherwise, commands typed in by the user might inadvertently mess up the running perlsql program!), so I change to a different package in the eval — perlsql in this case. I turn off strict and make sure the default filehandle is set to STDOUT, then execute the command line in a do block. Then I restore the default filehandle and return the result of executing the command. (Note that this code doesn't actually do anything with the result, $res, but that's a feature that could be added.) If the eval produced an error, the message will be in the $@ variable, so I check that and print it if necessary.
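
    Here is a small sketch of that isolation, with a hypothetical helper run_user_code. Unqualified variable names in the evaluated code resolve to the perlsql package, so a global of the same name in main is untouched. Lexical (my) variables in scope at the eval are still visible to the evaluated code, so the isolation covers package variables only.

    ```perl
    use strict;
    use warnings;
    no warnings 'once';

    $main::counter = 10;    # a global belonging to the "real" program

    # Hypothetical helper: evaluate user code in the scratch package perlsql.
    sub run_user_code {
        my ($code) = @_;
        my $res = eval "package perlsql; no strict; $code";
        return ($res, $@);  # $@ is empty when the eval succeeded
    }

    my ($res, $err) = run_user_code('$counter = 99; $counter + 1');
    print "result=$res main=$main::counter scratch=$perlsql::counter\n";
    ```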

    At the bottom of the loop, I increment the line number counter.

    Processing SQL

    The do_sql subroutine takes a single SQL command as its argument, double-quote interpolates it, executes it, and then displays the result.

    sub do_sql {
     my $sql = shift;
     $sql = eval "package perlsql; no strict; qq\0$sql\0";
     print($@), return if $@;
     print "=> $sql\n";
     my $sth;
     eval {
      $sth = $dbh->prepare($sql);
      my $rv = $sth->execute;
      if ($sql =~ /^\s*select\b/i) {   # match SELECT in any case
       my ($prec, $names) = @{$sth}{qw(PRECISION NAME_lc)};
       display_result($prec, $names, $sth->fetchall_arrayref);
       print "\n";
      } else {
       print defined($rv) ? ($rv + 0, " rows affected.\n\n") :
             "ok\n\n";
      }
     };
     print $@ if $@;
    }

    Double-quote interpolating SQL command lines is useful because it lets us use Perl variables inside our commands — something like:

    select count(*) from club where state = '$state'

    To double-quote interpolate $sql, I eval it in the perlsql package. Note that the argument to eval is a double-quoted string, and that within that I have another double-quoted string. I use a NUL (\0) as the delimiter for the embedded double-quoted string. After that, I echo the interpolated command to the terminal and then prepare and execute the command.
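
    The interpolation trick can be tried without a database. In this sketch, interpolate is a hypothetical helper and $perlsql::state stands in for a variable the user set earlier in the session. Keep in mind that anything Perl would interpolate in a double-quoted string gets interpolated, so this is only suitable for commands typed by a trusted user.

    ```perl
    use strict;
    use warnings;
    no warnings 'once';

    $perlsql::state = 'GA';   # a variable set earlier in the session

    # Hypothetical helper: double-quote interpolate $text in package perlsql.
    sub interpolate {
        my ($text) = @_;
        # "\0" in this double-quoted string is a real NUL byte, used as the
        # qq delimiter so ordinary quote characters in $text stay literal.
        my $out = eval "package perlsql; no strict; qq\0$text\0";
        die $@ if $@;
        return $out;
    }

    print interpolate(q{select count(*) from club where state = '$state'}), "\n";
    ```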

    I display the results of select statements with the display_result subroutine:

    sub display_result {
     my ($prec, $name, $ary) = @_;
     my $row_format = join(' ', map "%-${_}s", @$prec) . "\n";
     printf $row_format, @$name;
     print join(' ', map { '=' x $_ } @$prec), "\n";
     printf $row_format, @$_ for @$ary;
    }

    The arguments to display_result are array references containing the "precision" of the result columns (how many characters are required to print the contents), the names of the result columns, and the results themselves. The results are a two-dimensional "array of arrays." I use the precision to create a suitable printf format (just printing the data as strings) and then print each row of the result. The whole thing winds up looking like this:

    112> select distinct postal_code from club where name like 'Augusta%'
    => select distinct postal_code from club where name like 'Augusta%'
    postal_code
    ===============
    30904
    04351
    67010
    46701

    For non-select statements, I print the number of rows affected by the command (if available).
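
    The formatting step in display_result can also be exercised without a database. The sketch below uses a hypothetical variant, format_result, that returns the table as a string instead of printing it; the column widths and rows are made up for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical variant of display_result that returns the table as a string.
    sub format_result {
        my ($prec, $names, $rows) = @_;
        # One left-justified %-Ns field per column, N being its precision.
        my $fmt = join(' ', map("%-${_}s", @$prec)) . "\n";
        my $out = sprintf $fmt, @$names;
        $out .= join(' ', map { '=' x $_ } @$prec) . "\n";
        $out .= sprintf $fmt, @$_ for @$rows;
        return $out;
    }

    print format_result([5, 3], ['name', 'age'], [['Ann', 31], ['Bob', 7]]);
    ```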

    More Powerful Command-line Editing

    One of the annoying things about using SQL command-line clients is that you often need to enter rather long commands. Perhaps you'd like to be able to edit them using a separate editor? No problem! We'll just bind the Control-V key to a subroutine that lets you edit the current line in your favorite editor:

    $term->add_defun('visual', sub {
     my $fn = mktemp("perlsql$$.XXXXXXXX", "/tmp");
     $fn = "/tmp/$fn";
     open F, ">$fn" or die "can't open $fn: $!";
     print F $term->copy_text;
     close F;
     system +($ENV{EDITOR} || 'vi'), $fn;
     if (-r $fn) {
      local $/;
      open F, $fn;
      my $text = <F>;
      close F;
      $text =~ s/[\r\n]+$//;      # no trailing newline
      $term->begin_undo_group;
      $term->delete_text;
      $term->Attribs->{point} = 0;
      $term->insert_text($text);
      $term->Attribs->{point} = length $term->Attribs->{line_buffer};
      $term->end_undo_group;
      unlink $fn;
     }
     $term->forced_update_display;
    } );
    $term->bind_key(ord("\cv"), 'visual');

    The Term::ReadLine::Gnu method add_defun registers a subroutine with the readline library. In this case, I've defined it as an anonymous subroutine (with the sub {} operator). I use the mktemp subroutine from File::MkTemp to create a temporary filename, then create a temporary file with that name, write the contents of the current command line into it (from copy_text), and fire up an editor on that file with system. If the editor leaves a readable file, I read the contents back in as a single blob of text (clearing the $/ special variable makes Perl ignore line endings when reading from the file) and use that to set the current command. I found that it was necessary to manipulate the insertion point with Attribs->{point} manually to avoid some weird problems. A call to forced_update_display after everything's done forces the readline library to update the display.

    The bind_key method binds the subroutine that I've registered with the name visual to the Control-V key.
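
    The write, edit, read-back cycle can be sketched independently of readline. Note the substitutions: this uses the core File::Temp module rather than File::MkTemp, and a callback stands in for launching the editor with system. The helper name round_trip is hypothetical.

    ```perl
    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical helper: write $text to a temp file, let $edit change the
    # file, then read the whole file back in a single read.
    sub round_trip {
        my ($text, $edit) = @_;
        my ($fh, $fn) = tempfile('perlsqlXXXXXXXX', TMPDIR => 1, UNLINK => 1);
        print {$fh} $text;
        close $fh or die "can't close $fn: $!";

        $edit->($fn);                      # stand-in for system($editor, $fn)

        open my $in, '<', $fn or die "can't open $fn: $!";
        my $new = do { local $/; <$in> };  # undefined $/ reads the whole file
        close $in;
        $new = '' unless defined $new;
        $new =~ s/[\r\n]+$//;              # no trailing newline
        return $new;
    }

    my $result = round_trip("select 1", sub {
        my ($fn) = @_;
        open my $out, '>>', $fn or die "can't append to $fn: $!";
        print {$out} " from dual\n";
        close $out;
    });
    print "$result\n";
    ```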

    Cleaning Up

    An END block handles disconnect from the database and cleanup of any temporary files that might have been left behind:

    END {
     print "\n";
     $dbh->disconnect if $dbh;
     unlink </tmp/perlsql$$.*>;
    }

    Features Gone Begging

    I've written a slightly longer version of this program that has a few more frills. It saves the history to a file and restores it on startup, and also reads in a ~/.perlsqlrc file written in Perl on startup. You can see it in its entirety at <http://www.perlfaq.com/examples>.

    This short program (the version on my Web site above is only 140 lines long as of this writing) is, I think, an excellent demonstration of how you can quickly create surprisingly powerful and useful things in Perl. It took me only a few hours to write perlsql. Yet, even after that small amount of work, it's a useful database-independent SQL client, and one that knows Perl in addition to everything else!

    The idea behind perlsql isn't a new one — there have been previous attempts at writing DBI/ReadLine clients. The notion hit me all on my own but it was of course not original. The first such well-known DBI-based client was Andreas Koenig's pmsql. A later program was dbimon. dbimon is apparently out of date, but I have also seen a few other more recent Perl-based SQL clients.

    June 24

    Predefined Names (perl) from http://www.cs.cmu.edu/afs/cs/user/rgs/mosaic/pl-predef.html#$|

    Predefined Names

    The following names have special meaning to perl. I could have used alphabetic symbols for some of these, but I didn't want to take the chance that someone would say reset "a-zA-Z" and wipe them all out. You'll just have to suffer along with these silly symbols. Most of them have reasonable mnemonics, or analogues in one of the shells.

    $_
    The default input and pattern-searching space. The following pairs are equivalent:
    	while (<>) {...	# only equivalent in while!
    	while ($_ = <>) {...
    
    	/^Subject:/
    	$_ =~ /^Subject:/
    
    	y/a-z/A-Z/
    	$_ =~ y/a-z/A-Z/
    
    	chop
    	chop($_)
    
    (Mnemonic: underline is understood in certain operations.)

    $.
    The current input line number of the last filehandle that was read. Readonly. Remember that only an explicit close on the filehandle resets the line number. Since <> never does an explicit close, line numbers increase across ARGV files (but see examples under eof). (Mnemonic: many programs use . to mean the current line number.)

    $/
    The input record separator, newline by default. Works like awk's RS variable, including treating blank lines as delimiters if set to the null string. You may set it to a multicharacter string to match a multi-character delimiter. Note that setting it to "\n\n" means something slightly different than setting it to "", if the file contains consecutive blank lines. Setting it to "" will treat two or more consecutive blank lines as a single blank line. Setting it to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / is used to delimit line boundaries when quoting poetry.)
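
    The difference between the two settings is easiest to see by counting records. A small sketch, reading from an in-memory filehandle; the helper name records and the sample data are made up for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read all records from a string using separator $sep.
    sub records {
        my ($data, $sep) = @_;
        local $/ = $sep;
        open my $fh, '<', \$data or die "can't open in-memory file: $!";
        my @recs = <$fh>;
        close $fh;
        return @recs;
    }

    # Two paragraphs separated by three blank lines.
    my $data = "a\nb\n\n\n\nc\n";
    printf "paragraph mode: %d records\n", scalar records($data, "");
    printf "literal \\n\\n:   %d records\n", scalar records($data, "\n\n");
    ```

    In paragraph mode the run of blank lines counts as one separator; with "\n\n" the leftover newlines form a record of their own.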

    $,
    The output field separator for the print operator. Ordinarily the print operator simply prints out the comma separated fields you specify. In order to get behavior more like awk, set this variable as you would set awk's OFS variable to specify what is printed between fields. (Mnemonic: what is printed when there is a , in your print statement.)

    $"
    This is like $, except that it applies to array values interpolated into a double-quoted string (or similar interpreted string). Default is a space. (Mnemonic: obvious, I think.)

    $\
    The output record separator for the print operator. Ordinarily the print operator simply prints out the comma separated fields you specify, with no trailing newline or record separator assumed. In order to get behavior more like awk, set this variable as you would set awk's ORS variable to specify what is printed at the end of the print. (Mnemonic: you set $\ instead of adding \n at the end of the print. Also, it's just like /, but it's what you get "back" from perl.)
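
    A short sketch showing the three separators together. The output is captured in an in-memory filehandle so the effect is easy to inspect; the values chosen are arbitrary.

    ```perl
    use strict;
    use warnings;

    my @fields = ('a', 'b', 'c');
    my $buf = '';
    open my $fh, '>', \$buf or die "can't open in-memory file: $!";
    {
        local $, = '-';     # printed between the arguments of print
        local $\ = "!\n";   # printed after the last argument
        local $" = '+';     # joins array elements interpolated into a string
        print {$fh} @fields, "@fields";
    }
    close $fh;
    print $buf;             # prints a-b-c-a+b+c!
    ```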

    $#
    The output format for printed numbers. This variable is a half-hearted attempt to emulate awk's OFMT variable. There are times, however, when awk and perl have differing notions of what is in fact numeric. Also, the initial value is %.20g rather than %.6g, so you need to set $# explicitly to get awk's value. (Mnemonic: # is the number sign.)

    $%
    The current page number of the currently selected output channel. (Mnemonic: % is page number in nroff.)

    $=
    The current page length (printable lines) of the currently selected output channel. Default is 60. (Mnemonic: = has horizontal lines.)

    $-
    The number of lines left on the page of the currently selected output channel. (Mnemonic: lines_on_page - lines_printed.)

    $~
    The name of the current report format for the currently selected output channel. Default is name of the filehandle. (Mnemonic: brother to $^.)

    $^
    The name of the current top-of-page format for the currently selected output channel. Default is name of the filehandle with "_TOP" appended. (Mnemonic: points to top of page.)

    $|
    If set to nonzero, forces a flush after every write or print on the currently selected output channel. Default is 0. Note that STDOUT will typically be line buffered if output is to the terminal and block buffered otherwise. Setting this variable is useful primarily when you are outputting to a pipe, such as when you are running a perl script under rsh and want to see the output as it's happening. (Mnemonic: when you want your pipes to be piping hot.)

    $$
    The process number of the perl running this script. (Mnemonic: same as shells.)

    $?
    The status returned by the last pipe close, backtick (``) command or system operator. Note that this is the status word returned by the wait() system call, so the exit value of the subprocess is actually ($? >> 8). $? & 255 gives which signal, if any, the process died from, and whether there was a core dump. (Mnemonic: similar to sh and ksh.)

    $&
    The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: like & in some editors.)

    $`
    The string preceding whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: ` often precedes a quoted string.)

    $'
    The string following whatever was matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval enclosed by the current BLOCK). (Mnemonic: ' often follows a quoted string.) Example:
    	$_ = 'abcdefghi';
    	/def/;
    	print "$`:$&:$'\n";  	# prints abc:def:ghi
    

    $+
    The last bracket matched by the last search pattern. This is useful if you don't know which of a set of alternative patterns matched. For example:
        /Version: (.*)|Revision: (.*)/ && ($rev = $+);
    
    (Mnemonic: be positive and forward looking.)

    $*
    Set to 1 to do multiline matching within a string, 0 to tell perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when $* is 0. Default is 0. (Mnemonic: * matches multiple things.) Note that this variable only influences the interpretation of ^ and $. A literal newline can be searched for even when $* == 0.

    $0
    Contains the name of the file containing the perl script being executed. Assigning to $0 modifies the argument area that the ps(1) program sees. (Mnemonic: same as sh and ksh.)

    $<digit>
    Contains the subpattern from the corresponding set of parentheses in the last pattern matched, not counting patterns matched in nested blocks that have been exited already. (Mnemonic: like \digit.)

    $[
    The index of the first element in an array, and of the first character in a substring. Default is 0, but you could set it to 1 to make perl behave more like awk (or Fortran) when subscripting and when evaluating the index() and substr() functions. (Mnemonic: [ begins subscripts.)

    $]
    The string printed out when you say "perl -v". It can be used to determine at the beginning of a script whether the perl interpreter executing the script is in the right range of versions. If used in a numeric context, returns the version + patchlevel / 1000. Example:
    	# see if getc is available
            ($version,$patchlevel) =
    		 $] =~ /(\d+\.\d+).*\nPatch level: (\d+)/;
            print STDERR "(No filename completion available.)\n"
    		 if $version * 1000 + $patchlevel < 2016;
    
    or, used numerically,
    	warn "No checksumming!\n" if $] < 3.019;
    
    (Mnemonic: Is this version of perl in the right bracket?)

    $;
    The subscript separator for multi-dimensional array emulation. If you refer to an associative array element as
    	$foo{$a,$b,$c}
    
    it really means
    	$foo{join($;, $a, $b, $c)}
    
    But don't put
    	@foo{$a,$b,$c}		# a slice--note the @
    
    which means
    	($foo{$a},$foo{$b},$foo{$c})
    
    Default is "\034", the same as SUBSEP in awk. Note that if your keys contain binary data there might not be any safe value for $;. (Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon. Yeah, I know, it's pretty lame, but $, is already taken for something more important.)
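
    A brief sketch of the emulation at work; the hash name %grid is made up for the example.

    ```perl
    use strict;
    use warnings;

    my %grid;
    $grid{1, 2} = 'x';                    # stored under join($;, 1, 2)

    my $sep   = $;;                       # copy the separator, "\034" by default
    my ($key) = keys %grid;
    my @parts = split /\Q$sep\E/, $key;   # recover the original subscripts
    print "parts: @parts\n";
    print "found\n" if exists $grid{ join($sep, 1, 2) };
    ```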

    $!
    If used in a numeric context, yields the current value of errno, with all the usual caveats. (This means that you shouldn't depend on the value of $! to be anything in particular unless you've gotten a specific error return indicating a system error.) If used in a string context, yields the corresponding system error string. You can assign to $! in order to set errno if, for instance, you want $! to return the string for error n, or you want to set the exit value for the die operator. (Mnemonic: What just went bang?)

    $@
    The perl syntax error message from the last eval command. If null, the last eval parsed and executed correctly (although the operations you invoked may have failed in the normal fashion). (Mnemonic: Where was the syntax error "at"?)

    $<
    The real uid of this process. (Mnemonic: it's the uid you came FROM, if you're running setuid.)

    $>
    The effective uid of this process. Example:
    	$< = $>;	# set real uid to the effective uid
    	($<,$>) = ($>,$<);	# swap real and effective uid
    
    (Mnemonic: it's the uid you went TO, if you're running setuid.) Note: $< and $> can only be swapped on machines supporting setreuid().

    $(
    The real gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getgid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. (Mnemonic: parentheses are used to GROUP things. The real gid is the group you LEFT, if you're running setgid.)

    $)
    The effective gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getegid(), and the subsequent ones by getgroups(), one of which may be the same as the first number. (Mnemonic: parentheses are used to GROUP things. The effective gid is the group that's RIGHT for you, if you're running setgid.)

    Note: $<, $>, $( and $) can only be set on machines that support the corresponding set[re][ug]id() routine. $( and $) can only be swapped on machines supporting setregid().

    $:
    The current set of characters after which a string may be broken to fill continuation fields (starting with ^) in a format. Default is "\ \n-", to break on whitespace or hyphens. (Mnemonic: a "colon" in poetry is a part of a line.)

    $^D
    The current value of the debugging flags. (Mnemonic: value of -D switch.)

    $^F
    The maximum system file descriptor, ordinarily 2. System file descriptors are passed to subprocesses, while higher file descriptors are not. During an open, system file descriptors are preserved even if the open fails. Ordinary file descriptors are closed before the open is attempted.

    $^I
    The current value of the inplace-edit extension. Use undef to disable inplace editing. (Mnemonic: value of -i switch.)

    $^L
    What formats output to perform a formfeed. Default is \f.

    $^P
    The internal flag that the debugger clears so that it doesn't debug itself. You could conceivably disable debugging yourself by clearing it.

    $^T
    The time at which the script began running, in seconds since the epoch. The values returned by the -M , -A and -C filetests are based on this value.

    $^W
    The current value of the warning switch. (Mnemonic: related to the -w switch.)

    $^X
    The name that Perl itself was executed as, from argv[0].

    $ARGV
    contains the name of the current file when reading from <>.

    @ARGV
    The array ARGV contains the command line arguments intended for the script. Note that $#ARGV is generally the number of arguments minus one, since $ARGV[0] is the first argument, NOT the command name. See $0 for the command name.

    @INC
    The array INC contains the list of places to look for perl scripts to be evaluated by the "do EXPR" command or the "require" command. It initially consists of the arguments to any -I command line switches, followed by the default perl library, probably "/usr/local/lib/perl", followed by ".", to represent the current directory.

    %INC
    The associative array INC contains entries for each filename that has been included via "do" or "require". The key is the filename you specified, and the value is the location of the file actually found. The "require" command uses this array to determine whether a given file has already been included.

    $ENV{expr}
    The associative array ENV contains your current environment. Setting a value in ENV changes the environment for child processes.

    $SIG{expr}
    The associative array SIG is used to set signal handlers for various signals. Example:
    	sub handler {	# 1st argument is signal name
    		local($sig) = @_;
    		print "Caught a SIG$sig--shutting down\n";
    		close(LOG);
    		exit(0);
    	}
    
    	$SIG{'INT'} = 'handler';
    	$SIG{'QUIT'} = 'handler';
    	...
    	$SIG{'INT'} = 'DEFAULT';	# restore default action
    	$SIG{'QUIT'} = 'IGNORE';	# ignore SIGQUIT
    
    The SIG array only contains values for the signals actually set within the perl script.

    system vs exec perl from perldoc.perl.org

    • system LIST
    • system PROGRAM LIST

      Does exactly the same thing as exec LIST , except that a fork is done first, and the parent process waits for the child process to complete. Note that argument processing varies depending on the number of arguments. If there is more than one argument in LIST, or if LIST is an array with more than one value, starts the program given by the first element of the list with arguments given by the rest of the list. If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp , which is more efficient.

      Beginning with v5.6.0, Perl will attempt to flush all files opened for output before any operation that may do a fork, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle on any open handles.

      The return value is the exit status of the program as returned by the wait call. To get the actual exit value, shift right by eight (see below). See also "exec". This is not what you want to use to capture the output from a command; for that, simply use backticks or qx//, as described in "`STRING`" in perlop. Return value of -1 indicates a failure to start the program or an error of the wait(2) system call (inspect $! for the reason).

      Like exec, system allows you to lie to a program about its name if you use the system PROGRAM LIST syntax. Again, see "exec".

      Since SIGINT and SIGQUIT are ignored during the execution of system, if you expect your program to terminate on receipt of these signals you will need to arrange to do so yourself based on the return value.

          @args = ("command", "arg1", "arg2");
          system(@args) == 0
      	 or die "system @args failed: $?"

      You can check all the failure possibilities by inspecting $? like this:

          if ($? == -1) {
      	print "failed to execute: $!\n";
          }
          elsif ($? & 127) {
      	printf "child died with signal %d, %s coredump\n",
      	    ($? & 127),  ($? & 128) ? 'with' : 'without';
          }
          else {
      	printf "child exited with value %d\n", $? >> 8;
          }

      Alternatively you might inspect the value of ${^CHILD_ERROR_NATIVE} with the W*() calls of the POSIX extension.

      When the arguments get executed via the system shell, results and return codes will be subject to its quirks and capabilities. See "`STRING`" in perlop and "exec" for details.

    • exec LIST
    • exec PROGRAM LIST

      The exec function executes a system command and never returns-- use system instead of exec if you want it to return. It fails and returns false only if the command does not exist and it is executed directly instead of via your system's command shell (see below).

      Since it's a common mistake to use exec instead of system, Perl warns you if there is a following statement which isn't die, warn, or exit (if -w is set - but you always do that). If you really want to follow an exec with some other statement, you can use one of these styles to avoid the warning:

          exec ('foo')   or print STDERR "couldn't exec foo: $!";
          { exec ('foo') }; print STDERR "couldn't exec foo: $!";

      If there is more than one argument in LIST, or if LIST is an array with more than one value, calls execvp(3) with the arguments in LIST. If there is only one scalar argument or an array with one element in it, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp , which is more efficient. Examples:

          exec '/bin/echo', 'Your arguments are: ', @ARGV;
          exec "sort $outfile | uniq";

      If you don't really want to execute the first argument, but want to lie to the program you are executing about its own name, you can specify the program you actually want to run as an "indirect object" (without a comma) in front of the LIST. (This always forces interpretation of the LIST as a multivalued list, even if there is only a single scalar in the list.) Example:

          $shell = '/bin/csh';
          exec $shell '-sh';		# pretend it's a login shell

      or, more directly,

          exec {'/bin/csh'} '-sh';	# pretend it's a login shell

      When the arguments get executed via the system shell, results will be subject to its quirks and capabilities. See "`STRING`" in perlop for details.

      Using an indirect object with exec or system is also more secure. This usage (which also works fine with system()) forces interpretation of the arguments as a multivalued list, even if the list had just one argument. That way you're safe from the shell expanding wildcards or splitting up words with whitespace in them.

          @args = ( "echo surprise" );
      
          exec @args;               # subject to shell escapes
                                      # if @args == 1
          exec { $args[0] } @args;  # safe even with one-arg list

      The first version, the one without the indirect object, ran the echo program, passing it "surprise" as an argument. The second version didn't--it tried to run a program literally called "echo surprise", didn't find it, and set $? to a non-zero value indicating failure.

      Beginning with v5.6.0, Perl will attempt to flush all files opened for output before the exec, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the autoflush() method of IO::Handle on any open handles in order to avoid lost output.

      Note that exec will not call your END blocks, nor will it call any DESTROY methods in your objects.

      Conclusion:

    Both Perl's exec() function and system() function run an external command. The big difference is that system() forks a child process, waits for the command to finish, and returns its exit status, whereas exec() replaces the current process and never returns if it succeeds. Neither of these should be used to capture the output of a command. If your goal is to capture output, you should use the backtick operator:

    $result = `PROGRAM`;
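
    A sketch of both approaches side by side. To stay self-contained it runs the current Perl interpreter ($^X) as a stand-in for any external program, and it uses a list-form pipe open rather than backticks so that no shell quoting is needed; the helper name capture is hypothetical. It assumes a Unix-like system.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: run a command, return its output and exit value.
    sub capture {
        my @cmd = @_;
        open my $ph, '-|', @cmd or die "can't run @cmd: $!";
        my $out = do { local $/; <$ph> };   # read everything the child printed
        close $ph;                          # sets $? to the child's wait status
        return ($out, $? >> 8);
    }

    my ($out, $exit) = capture($^X, '-e', 'print "hello"; exit 3');
    print "output=$out exit=$exit\n";

    # system only reports the status; the child's output is not captured.
    my $status = system($^X, '-e', 'exit 2');
    print "system exit value: ", $status >> 8, "\n";
    ```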

    Pattern matching inside the array

    #!/usr/bin/perl

    @a = 'adad';
    foreach(@a){
    if (/adad/){
    print "dadadadda";
    }
    }

    output:

    dadadadda

    but if the code is like below

    #!/usr/bin/perl

    $a = 'adad';

    if (/adad/){
    print "dadadadda";
    }

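    it produces no output.

    The difference is that a bare /adad/ always matches against $_, not $a. In the first script, foreach sets $_ to each element of @a in turn, so the match succeeds as soon as an element contains adad. In the second script nothing ever assigns to $_, so the match fails; write $a =~ /adad/ to test $a itself.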
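
    A short sketch of the ways to apply the pattern; the variable names are made up for the example. A bare match tests $_, the =~ operator tests a named variable, and grep tests each element of a list.

    ```perl
    use strict;
    use warnings;

    my $str  = 'adad';
    my @list = ('xx', 'adad', 'yy');

    # Explicit binding: test $str rather than $_.
    my $explicit = ($str =~ /adad/) ? 1 : 0;

    # Bare match: tests $_, which is undefined here, so it fails.
    my $bare = do {
        local $_;
        no warnings 'uninitialized';
        /adad/ ? 1 : 0;
    };

    # grep sets $_ to each element in turn, like foreach does.
    my @hits = grep { /adad/ } @list;

    print "explicit=$explicit bare=$bare hits=@hits\n";
    ```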

    umask -Perl

    • umask EXPR
    • umask

      Sets the umask for the process to EXPR and returns the previous value. If EXPR is omitted, merely returns the current umask.

      The Unix permission rwxr-x--- is represented as three sets of three bits, or three octal digits: 0750 (the leading 0 indicates octal and isn't one of the digits). The umask value is such a number representing disabled permissions bits. The permission (or "mode") values you pass mkdir or sysopen are modified by your umask, so even if you tell sysopen to create a file with permissions 0777 , if your umask is 0022 then the file will actually be created with permissions 0755 . If your umask were 0027 (group can't write; others can't read, write, or execute), then passing sysopen 0666 would create a file with mode 0640 (0666 &~ 027 is 0640 ).

      Here's some advice: supply a creation mode of 0666 for regular files (in sysopen) and one of 0777 for directories (in mkdir) and executable files. This gives users the freedom of choice: if they want protected files, they might choose process umasks of 022 , 027 , or even the particularly antisocial mask of 077 . Programs should rarely if ever make policy decisions better left to the user. The exception to this is when writing files that should be kept private: mail files, web browser cookies, .rhosts files, and so on.

      If umask(2) is not implemented on your system and you are trying to restrict access for yourself (i.e., (EXPR & 0700) > 0), produces a fatal error at run time. If umask(2) is not implemented and you are not trying to restrict access for yourself, returns undef.

      Remember that a umask is a number, usually given in octal; it is not a string of octal digits. See also "oct", if all you have is a string.
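
    The arithmetic above can be checked directly. In this sketch, effective_mode is a hypothetical helper that applies a umask to a requested mode; the last lines show umask returning the previous value so it can be restored.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: the mode a new file ends up with for a given umask.
    sub effective_mode {
        my ($requested, $umask) = @_;
        return $requested & ~$umask;    # clear every bit set in the umask
    }

    printf "%04o\n", effective_mode(0666, 0027);        # prints 0640
    printf "%04o\n", effective_mode(0777, oct('022'));  # prints 0755

    my $old = umask 0077;   # set a new umask; the previous one is returned
    umask $old;             # put the original back
    ```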

    • Note:

    The permission bits, written as decimal values:

    0 = absolutely nothing
    1 = x other
    2 = w other
    4 = r other
    8 = x group
    16 = w group
    32 = r group
    64 = x user
    128 = w user
    256 = r user
    512 = sticky
    1024 = setgid
    2048 = setuid

    add 'em together to get the mode you want! :) A umask uses the same bit values, but it lists the bits to turn off rather than the bits to turn on.

    e.g. 384 (octal 0600) = rw-------
    448 (octal 0700) = rwx------
    457 (octal 0711) = rwx--x--x
    420 (octal 0644) = rw-r--r--
    4095 (octal 07777) = rwsrwsrwt, everything
    4096 (octal 010000) = ---------, since it lies beyond the twelve permission bits

    June 19

    perltoot - Tom's object-oriented tutorial for perl(2) from http://perl.about.com/gi/dynamic/offsite.htm?zi=1/XJ/Ya&sdn=perl&cdn=compute&tm=20&gps=247_142_1020_541&f=22&tt=14&bt=0&bts=0&st=31&zu=http%3A//www.xav.com/perl/lib/Pod/perltoot.html

    Alternate Object Representations

    Nothing requires objects to be implemented as hash references. An object can be any sort of reference so long as its referent has been suitably blessed. That means scalar, array, and code references are also fair game.

    A scalar would work if the object has only one datum to hold. An array would work for most cases, but makes inheritance a bit dodgy because you have to invent new indices for the derived classes.
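    To make the scalar case concrete, here is a minimal sketch of an object whose only datum is a single number. The Counter class and its method names are invented for this illustration; they are not part of the tutorial.

```perl
package Counter;    # hypothetical class, for illustration only
use strict;
use warnings;

sub new {
    my ($class, $start) = @_;
    my $count = defined $start ? $start : 0;
    return bless \$count, $class;    # the object is a reference to one scalar
}

sub increment { my $self = shift; return ++$$self }
sub value     { my $self = shift; return $$self }

package main;

my $c = Counter->new(5);
$c->increment for 1 .. 3;
print $c->value, "\n";    # prints 8
```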

    Arrays as Objects

    If the user of your class honors the contract and sticks to the advertised interface, then you can change its underlying implementation if you feel like it. Here's another implementation that conforms to the same interface specification. This time we'll use an array reference instead of a hash reference to represent the object.

        package Person;
        use strict;
        my($NAME, $AGE, $PEERS) = ( 0 .. 2 );
        ############################################
        ## the object constructor (array version) ##
        ############################################
        sub new {
            my $self = [];
            $self->[$NAME]   = undef;  # this is unnecessary
            $self->[$AGE]    = undef;  # as is this
            $self->[$PEERS]  = [];     # but this isn't, really
            bless($self);
            return $self;
        }
        sub name {
            my $self = shift;
            if (@_) { $self->[$NAME] = shift }
            return $self->[$NAME];
        }
        sub age {
            my $self = shift;
            if (@_) { $self->[$AGE] = shift }
            return $self->[$AGE];
        }
        sub peers {
            my $self = shift;
            if (@_) { @{ $self->[$PEERS] } = @_ }
            return @{ $self->[$PEERS] };
        }
        1;  # so the require or use succeeds

    You might guess that the array access would be a lot faster than the hash access, but they're actually comparable. The array is a little bit faster, but not more than ten or fifteen percent, even when you replace the variables above like $AGE with literal numbers, like 1. A bigger difference between the two approaches can be found in memory use. A hash representation takes up more memory than an array representation because you have to allocate memory for the keys as well as for the values. However, it really isn't that bad, especially since as of version 5.004, memory is only allocated once for a given hash key, no matter how many hashes have that key. It's expected that sometime in the future, even these differences will fade into obscurity as more efficient underlying representations are devised.

    Still, the tiny edge in speed (and somewhat larger one in memory) is enough to make some programmers choose an array representation for simple classes. There's still a little problem with scalability, though, because later in life when you feel like creating subclasses, you'll find that hashes just work out better.

    Closures as Objects

    Using a code reference to represent an object offers some fascinating possibilities. We can create a new anonymous function (closure) who alone in all the world can see the object's data. This is because we put the data into an anonymous hash that's lexically visible only to the closure we create, bless, and return as the object. This object's methods turn around and call the closure as a regular subroutine call, passing it the field we want to affect. (Yes, the double-function call is slow, but if you wanted fast, you wouldn't be using objects at all, eh? :-)

    Use would be similar to before:

        use Person;
        $him = Person->new();
        $him->name("Jason");
        $him->age(23);
        $him->peers( [ "Norbert", "Rhys", "Phineas" ] );
        printf "%s is %d years old.\n", $him->name, $him->age;
        print "His peers are: ", join(", ", @{$him->peers}), "\n";

    but the implementation would be radically, perhaps even sublimely different:

        package Person;
        sub new {
             my $that  = shift;
             my $class = ref($that) || $that;
             my $self = {
                NAME  => undef,
                AGE   => undef,
                PEERS => [],
             };
             my $closure = sub {
                my $field = shift;
                if (@_) { $self->{$field} = shift }
                return    $self->{$field};
            };
            bless($closure, $class);
            return $closure;
        }
        sub name   { &{ $_[0] }("NAME",  @_[ 1 .. $#_ ] ) }
        sub age    { &{ $_[0] }("AGE",   @_[ 1 .. $#_ ] ) }
        sub peers  { &{ $_[0] }("PEERS", @_[ 1 .. $#_ ] ) }
        1;

    Because this object is hidden behind a code reference, it's probably a bit mysterious to those whose background is more firmly rooted in standard procedural or object-based programming languages than in functional programming languages whence closures derive. The object created and returned by the new() method is itself not a data reference as we've seen before. It's an anonymous code reference that has within it access to a specific version (lexical binding and instantiation) of the object's data, which are stored in the private variable $self. Although this is the same function each time, it contains a different version of $self.

    When a method like $him->name("Jason") is called, its implicit zeroth argument is the invoking object--just as it is with all method calls. But in this case, it's our code reference (something like a function pointer in C++, but with deep binding of lexical variables). There's not a lot to be done with a code reference beyond calling it, so that's just what we do when we say &{$_[0]}. This is just a regular function call, not a method call. The initial argument is the string ``NAME'', and any remaining arguments are whatever had been passed to the method itself.

    Once we're executing inside the closure that had been created in new(), the $self hash reference suddenly becomes visible. The closure grabs its first argument (``NAME'' in this case because that's what the name() method passed it), and uses that string to subscript into the private hash hidden in its unique version of $self.

    Nothing under the sun will allow anyone outside the executing method to be able to get at this hidden data. Well, nearly nothing. You could single step through the program using the debugger and find out the pieces while you're in the method, but everyone else is out of luck.

    There, if that doesn't excite the Scheme folks, then I just don't know what will. Translation of this technique into C++, Java, or any other braindead-static language is left as a futile exercise for aficionados of those camps.

    You could even add a bit of nosiness via the caller() function and make the closure refuse to operate unless called via its own package. This would no doubt satisfy certain fastidious concerns of programming police and related puritans.
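    One way the caller() check might look is sketched below. It is an assumption about how to do it, not code from the tutorial: the closure compares the calling package with its own and refuses anything else. Note that this simple test would also turn away methods defined in a subclass.

```perl
package Person;
use strict;
use warnings;
use Carp;

sub new {
    my $that  = shift;
    my $class = ref($that) || $that;
    my $self  = { NAME => undef, AGE => undef };
    my $closure = sub {
        # caller() reports the package of whoever invoked this closure
        my ($calling_package) = caller;
        croak "private data: use the methods of " . __PACKAGE__
            unless $calling_package eq __PACKAGE__;
        my $field = shift;
        if (@_) { $self->{$field} = shift }
        return $self->{$field};
    };
    return bless $closure, $class;
}

sub name { my $obj = shift; return $obj->("NAME", @_) }
sub age  { my $obj = shift; return $obj->("AGE",  @_) }

package main;

my $him = Person->new;
$him->name("Jason");
print $him->name, "\n";                  # prints Jason

my $ok = eval { $him->("NAME"); 1 };     # direct call from package main
print "direct call refused\n" unless $ok;
```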

    If you were wondering when Hubris, the third principal virtue of a programmer, would come into play, here you have it. (More seriously, Hubris is just the pride in craftsmanship that comes from having written a sound bit of well-designed code.)


    AUTOLOAD: Proxy Methods

    Autoloading is a way to intercept calls to undefined methods. An autoload routine may choose to create a new function on the fly, either loaded from disk or perhaps just eval()ed right there. This define-on-the-fly strategy is why it's called autoloading.

    But that's only one possible approach. Another one is to just have the autoloaded method itself directly provide the requested service. When used in this way, you may think of autoloaded methods as ``proxy'' methods.

    When Perl tries to call a function that is not defined in a particular package, it looks for a function in that same package called AUTOLOAD. If one exists, it's called with the same arguments as the original function would have had. The fully-qualified name of the function is stored in that package's global variable $AUTOLOAD. Once called, the function can do anything it would like, including defining a new function by the right name, and then doing a really fancy kind of goto right to it, erasing itself from the call stack.

    What does this have to do with objects? After all, we keep talking about functions, not methods. Well, since a method is just a function with an extra argument and some fancier semantics about where it's found, we can use autoloading for methods, too. Perl doesn't start looking for an AUTOLOAD method until it has exhausted the recursive hunt up through @ISA, though. Some programmers have even been known to define a UNIVERSAL::AUTOLOAD method to trap unresolved method calls to any kind of object.
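    The define-on-the-fly strategy with the fancy goto can be sketched as follows. The Shape class and its width and height fields are hypothetical, chosen only to keep the example small. The first call to an accessor goes through AUTOLOAD, which installs a real subroutine; later calls never reach AUTOLOAD at all.

```perl
package Shape;    # hypothetical class, for illustration only
use strict;
use warnings;
use Carp;

our $AUTOLOAD;
my %is_field = map { $_ => 1 } qw(width height);

sub new {
    my $class = shift;
    return bless { width => 0, height => 0, @_ }, $class;
}

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                # strip the package qualifier
    return if $name eq 'DESTROY';     # don't trap object destruction
    croak "No such method: $AUTOLOAD" unless $is_field{$name};

    # Build the accessor, install it in the symbol table, then jump to it.
    my $accessor = sub {
        my $self = shift;
        $self->{$name} = shift if @_;
        return $self->{$name};
    };
    {
        no strict 'refs';
        *{$AUTOLOAD} = $accessor;
    }
    goto &$accessor;                  # @_ is passed along unchanged
}

package main;

my $s = Shape->new(width => 3);
print $s->width, "\n";                # prints 3, resolved through AUTOLOAD
print defined &Shape::width ? "installed\n" : "missing\n";
$s->height(4);
print $s->height, "\n";               # prints 4
```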

    Autoloaded Data Methods

    You probably began to get a little suspicious about the duplicated code way back earlier when we first showed you the Person class, and then later the Employee class. Each method used to access the hash fields looked virtually identical. This should have tickled that great programming virtue, Impatience, but for the time, we let Laziness win out, and so did nothing. Proxy methods can cure this.

    Instead of writing a new function every time we want a new data field, we'll use the autoload mechanism to generate (actually, mimic) methods on the fly. To verify that we're accessing a valid member, we will check against an _permitted (pronounced ``under-permitted'') field, which is a reference to a file-scoped lexical (like a C file static) hash of permitted fields in this record called %fields. Why the underscore? For the same reason as the _CENSUS field we once used: as a marker that means ``for internal use only''.

    Here's what the module initialization code and class constructor will look like when taking this approach:

        package Person;
        use Carp;
        our $AUTOLOAD;  # it's a package global
        my %fields = (
            name        => undef,
            age         => undef,
            peers       => undef,
        );
        sub new {
            my $that  = shift;
            my $class = ref($that) || $that;
            my $self  = {
                _permitted => \%fields,
                %fields,
            };
            bless $self, $class;
            return $self;
        }

    If we wanted our record to have default values, we could fill those in where we currently have undef in the %fields hash.

    Notice how we saved a reference to our class data on the object itself? Remember that it's important to access class data through the object itself instead of having any method reference %fields directly, or else you won't have a decent inheritance.

    The real magic, though, is going to reside in our proxy method, which will handle all calls to undefined methods for objects of class Person (or subclasses of Person). It has to be called AUTOLOAD. Again, it's all caps because it's called for us implicitly by Perl itself, not by a user directly.

        sub AUTOLOAD {
            my $self = shift;
            my $type = ref($self)
                        or croak "$self is not an object";
            my $name = $AUTOLOAD;
            $name =~ s/.*://;   # strip fully-qualified portion
            return if $name eq 'DESTROY';   # object destruction is not a field access
            unless (exists $self->{_permitted}->{$name} ) {
                croak "Can't access `$name' field in class $type";
            }
            if (@_) {
                return $self->{$name} = shift;
            } else {
                return $self->{$name};
            }
        }

    Pretty nifty, eh? All we have to do to add new data fields is modify %fields. No new functions need be written.

    I could have avoided the _permitted field entirely, but I wanted to demonstrate how to store a reference to class data on the object so you wouldn't have to access that class data directly from an object method.

    Inherited Autoloaded Data Methods

    But what about inheritance? Can we define our Employee class similarly? Yes, so long as we're careful enough.

    Here's how to be careful:

        package Employee;
        use Person;
        use strict;
        our @ISA = qw(Person);
        my %fields = (
            id          => undef,
            salary      => undef,
        );
        sub new {
            my $that  = shift;
            my $class = ref($that) || $that;
            my $self = bless $that->SUPER::new(), $class;
            # copy the table so that Person's own %fields is left unmodified
            $self->{_permitted} = { %{ $self->{_permitted} }, %fields };
            @{$self}{keys %fields} = values %fields;
            return $self;
        }

    Once we've done this, we don't even need to have an AUTOLOAD function in the Employee package, because we'll grab Person's version of that via inheritance, and it will all work out just fine.
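    A condensed, single-file sketch of the two classes working together follows. It rests on a few assumptions of its own: the AUTOLOAD here ignores DESTROY, the subclass copies the permitted-field table instead of writing into the one it inherited, and the subclass hash is named %extra to keep the two tables apart.

```perl
package Person;
use strict;
use warnings;
use Carp;

our $AUTOLOAD;
my %fields = ( name => undef, age => undef, peers => undef );

sub new {
    my $that  = shift;
    my $class = ref($that) || $that;
    return bless { _permitted => \%fields, %fields }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    my $type = ref($self) or croak "$self is not an object";
    (my $name = $AUTOLOAD) =~ s/.*://;
    return if $name eq 'DESTROY';
    croak "Can't access `$name' field in class $type"
        unless exists $self->{_permitted}{$name};
    $self->{$name} = shift if @_;
    return $self->{$name};
}

package Employee;
use strict;
use warnings;

our @ISA = ('Person');
my %extra = ( id => undef, salary => undef );

sub new {
    my $that  = shift;
    my $class = ref($that) || $that;
    my $self  = bless $that->SUPER::new(), $class;
    # Copy the table so Person's own field list stays untouched.
    $self->{_permitted} = { %{ $self->{_permitted} }, %extra };
    @{$self}{ keys %extra } = values %extra;
    return $self;
}

package main;

my $boss = Employee->new;
$boss->name("Fred");        # field inherited from Person
$boss->salary(40_000);      # field added by Employee
printf "%s earns %d\n", $boss->name, $boss->salary;

my $ok = eval { Person->new->salary(1); 1 };
print "plain Person has no salary field\n" unless $ok;
```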


    Metaclassical Tools

    Even though proxy methods can provide a more convenient approach to making more struct-like classes than tediously coding up data methods as functions, it still leaves a bit to be desired. For one thing, it means you have to handle bogus calls that you don't mean to trap via your proxy. It also means you have to be quite careful when dealing with inheritance, as detailed above.

    Perl programmers have responded to this by creating several different class construction classes. These metaclasses are classes that create other classes. A couple worth looking at are Class::Struct and Alias. These and other related metaclasses can be found in the modules directory on CPAN.

    Class::Struct

    One of the older ones is Class::Struct. In fact, its syntax and interface were sketched out long before perl5 even solidified into a real thing. What it does is provide you a way to ``declare'' a class as having objects whose fields are of a specific type. The function that does this is called, not surprisingly enough, struct(). Because structures or records are not base types in Perl, each time you want to create a class to provide a record-like data object, you yourself have to define a new() method, plus separate data-access methods for each of that record's fields. You'll quickly become bored with this process. The Class::Struct::struct() function alleviates this tedium.

    Here's a simple example of using it:

        use Class::Struct qw(struct);
        use Jobbie;  # user-defined; see below
        struct 'Fred' => {
            one        => '$',
            many       => '@',
            profession => Jobbie,  # calls Jobbie->new()
        };
        $ob = Fred->new;
        $ob->one("hmmmm");
        $ob->many(0, "here");
        $ob->many(1, "you");
        $ob->many(2, "go");
        print "Just set: ", $ob->many(2), "\n";
        $ob->profession->salary(10_000);

    You can declare types in the struct to be basic Perl types, or user-defined types (classes). User types will be initialized by calling that class's new() method.
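    Since the Jobbie class is not shown, here is a self-contained sketch with a stand-in. The Jobbie fields (title, salary) are invented for this example. Recent versions of Class::Struct document that class-typed elements are not initialized by default, so this sketch passes a Jobbie object to new() explicitly rather than relying on automatic construction.

```perl
use strict;
use warnings;
use Class::Struct qw(struct);

# Jobbie is a stand-in class invented for this sketch.
struct( Jobbie => { title => '$', salary => '$' } );

struct( Fred => {
    one        => '$',
    many       => '@',
    profession => 'Jobbie',
} );

my $ob = Fred->new(
    one        => "hmmmm",
    many       => [ "here", "you", "go" ],
    profession => Jobbie->new( title => "tester" ),
);
$ob->profession->salary(10_000);

print "one: ", $ob->one, "\n";
print "many(2): ", $ob->many(2), "\n";
printf "%s earns %d\n", $ob->profession->title, $ob->profession->salary;
```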

    Here's a real-world example of using struct generation. Let's say you wanted to override Perl's idea of gethostbyname() and gethostbyaddr() so that they would return objects that acted like C structures. We don't care about high-falutin' OO gunk. All we want is for these objects to act like structs in the C sense.

        use Socket;
        use Net::hostent;
        $h = gethostbyname("perl.com");  # object return
        printf "perl.com's real name is %s, address %s\n",
            $h->name, inet_ntoa($h->addr);

    Here's how to do this using the Class::Struct module. The crux is going to be this call:

        struct 'Net::hostent' => [          # note bracket
            name       => '$',
            aliases    => '@',
            addrtype   => '$',
            'length'   => '$',
            addr_list  => '@',
         ];

    Which creates object methods of those names and types. It even creates a new() method for us.

    We could also have implemented our object this way:

        struct 'Net::hostent' => {          # note brace
            name       => '$',
            aliases    => '@',
            addrtype   => '$',
            'length'   => '$',
            addr_list  => '@',
         };

    and then Class::Struct would have used an anonymous hash as the object type, instead of an anonymous array. The array is faster and smaller, but the hash works out better if you eventually want to do inheritance. Since for this struct-like object we aren't planning on inheritance, this time we'll opt for better speed and size over better flexibility.

    Here's the whole implementation:

        package Net::hostent;
        use strict;
        BEGIN {
            use Exporter   ();
            our @EXPORT      = qw(gethostbyname gethostbyaddr gethost);
            our @EXPORT_OK   = qw(
                                   $h_name         @h_aliases
                                   $h_addrtype     $h_length
                                   @h_addr_list    $h_addr
                               );
            our %EXPORT_TAGS = ( FIELDS => [ @EXPORT_OK, @EXPORT ] );
        }
        our @EXPORT_OK;
        # Class::Struct forbids use of @ISA
        sub import { goto &Exporter::import }
        use Class::Struct qw(struct);
        struct 'Net::hostent' => [
           name        => '$',
           aliases     => '@',
           addrtype    => '$',
           'length'    => '$',
           addr_list   => '@',
        ];
        sub addr { shift->addr_list->[0] }
        sub populate (@) {
            return unless @_;
            my $hob = new();  # Class::Struct made this!
            $h_name     =    $hob->[0]              = $_[0];
            @h_aliases  = @{ $hob->[1] } = split ' ', $_[1];
            $h_addrtype =    $hob->[2]              = $_[2];
            $h_length   =    $hob->[3]              = $_[3];
            $h_addr     =                             $_[4];
            @h_addr_list = @{ $hob->[4] } =         @_[ (4 .. $#_) ];
            return $hob;
        }
        sub gethostbyname ($)  { populate(CORE::gethostbyname(shift)) }
        sub gethostbyaddr ($;$) {
            my ($addr, $addrtype);
            $addr = shift;
            require Socket unless @_;
            $addrtype = @_ ? shift : Socket::AF_INET();
            populate(CORE::gethostbyaddr($addr, $addrtype))
        }
        sub gethost($) {
            if ($_[0] =~ /^\d+(?:\.\d+(?:\.\d+(?:\.\d+)?)?)?$/) {
               require Socket;
               &gethostbyaddr(Socket::inet_aton(shift));
            } else {
               &gethostbyname;
            }
        }
        1;

    We've snuck in quite a fair bit of other concepts besides just dynamic class creation, like overriding core functions, import/export bits, function prototyping, short-cut function call via &whatever, and function replacement with goto &whatever. These all mostly make sense from the perspective of a traditional module, but as you can see, we can also use them in an object module.

    You can look at other object-based, struct-like overrides of core functions in the 5.004 release of Perl in File::stat, Net::hostent, Net::netent, Net::protoent, Net::servent, Time::gmtime, Time::localtime, User::grent, and User::pwent. These modules have a final component that's all lowercase, by convention reserved for compiler pragmas, because they affect the compilation and change a builtin function. They also have the type names that a C programmer would most expect.

    Data Members as Variables

    If you're used to C++ objects, then you're accustomed to being able to get at an object's data members as simple variables from within a method. The Alias module provides for this, as well as a good bit more, such as the possibility of private methods that the object can call but folks outside the class cannot.

    Here's an example of creating a Person using the Alias module. When you update these magical instance variables, you automatically update value fields in the hash. Convenient, eh?

        package Person;
        # this is the same as before...
        sub new {
             my $that  = shift;
             my $class = ref($that) || $that;
             my $self = {
                NAME  => undef,
                AGE   => undef,
                PEERS => [],
            };
            bless($self, $class);
            return $self;
        }
        use Alias qw(attr);
        our ($NAME, $AGE, @PEERS);  # PEERS holds an array ref, so it is used as @PEERS
        sub name {
            my $self = attr shift;
            if (@_) { $NAME = shift; }
            return    $NAME;
        }
        sub age {
            my $self = attr shift;
            if (@_) { $AGE = shift; }
            return    $AGE;
        }
        sub peers {
            my $self = attr shift;
            if (@_) { @PEERS = @_; }
            return    @PEERS;
        }
        sub exclaim {
            my $self = attr shift;
            return sprintf "Hi, I'm %s, age %d, working with %s",
                $NAME, $AGE, join(", ", @PEERS);
        }
        sub happy_birthday {
            my $self = attr shift;
            return ++$AGE;
        }

    The our declaration is needed because Alias works by manipulating package globals that have the same names as the fields. To use globals while use strict is in effect, you have to predeclare them. These package variables are localized to the block enclosing the attr() call just as if you'd used a local() on them. However, that means that they're still considered global variables with temporary values, just as with any other local().
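    The local() behaviour referred to here can be seen without the Alias module at all. The following sketch uses only core Perl; the subroutine names are invented for the example.

```perl
use strict;
use warnings;

our $AGE = 0;               # a package global, like the ones Alias uses

sub show_age { return "AGE is $AGE" }

sub with_temporary_age {
    local $AGE = 23;        # temporary value for this dynamic scope
    return show_age();      # the callee sees 23: $AGE is still a global
}

print with_temporary_age(), "\n";    # prints AGE is 23
print show_age(), "\n";              # prints AGE is 0 (old value restored)
```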

    It would be nice to combine Alias with something like Class::Struct or Class::MethodMaker.

    NOTES

    Object Terminology

    In the various OO literature, it seems that a lot of different words are used to describe only a few different concepts. If you're not already an object programmer, then you don't need to worry about all these fancy words. But if you are, then you might like to know how to get at the same concepts in Perl.

    For example, it's common to call an object an instance of a class and to call those objects' methods instance methods. Data fields peculiar to each object are often called instance data or object attributes, and data fields common to all members of that class are class data, class attributes, or static data members.

    Also, base class, generic class, and superclass all describe the same notion, whereas derived class, specific class, and subclass describe the other related one.

    C++ programmers have static methods and virtual methods, but Perl only has class methods and object methods. Actually, Perl only has methods. Whether a method gets used as a class or object method is by usage only. You could accidentally call a class method (one expecting a string argument) on an object (one expecting a reference), or vice versa.
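    A method that cares how it is called can check for itself. The sketch below is one possible guard, not something Perl requires; the species() method is hypothetical.

```perl
package Person;
use strict;
use warnings;
use Carp;

sub new {
    my ($class, $name) = @_;
    return bless { NAME => $name }, $class;
}

# Intended as an object method: refuse a plain class name.
sub name {
    my $self = shift;
    croak "name() is an object method" unless ref $self;
    return $self->{NAME};
}

# Intended as a class method: refuse an object.
sub species {
    my $class = shift;
    croak "species() is a class method" if ref $class;
    return "human";
}

package main;

my $p = Person->new("Jason");
print $p->name, "\n";             # prints Jason
print Person->species, "\n";      # prints human

my $ok = eval { Person->name; 1 };    # object method called on the class
print "caught misuse\n" unless $ok;
```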

    From the C++ perspective, all methods in Perl are virtual. This, by the way, is why they are never checked for function prototypes in the argument list as regular builtin and user-defined functions can be.

    Because a class is itself something of an object, Perl's classes can be taken as describing both a ``class as meta-object'' (also called object factory) philosophy and the ``class as type definition'' (declaring behaviour, not defining mechanism) idea. C++ supports the latter notion, but not the former.


    SEE ALSO

    The following manpages will doubtless provide more background for this one: the perlmod manpage, the perlref manpage, the perlobj manpage, the perlbot manpage, the perltie manpage, and the overload manpage.


    AUTHOR AND COPYRIGHT

    Copyright (c) 1997, 1998 Tom Christiansen. All rights reserved.

    When included as part of the Standard Version of Perl, or as part of its complete documentation whether printed or otherwise, this work may be distributed only under the terms of Perl's Artistic License. Any distribution of this file or derivatives thereof outside of that package require that special arrangements be made with copyright holder.

    Irrespective of its distribution, all code examples in this file are hereby placed into the public domain. You are permitted and encouraged to use this code in your own programs for fun or for profit as you see fit. A simple comment in the code giving credit would be courteous but is not required.


    COPYRIGHT

    Acknowledgments

    Thanks to Larry Wall, Roderick Schertler, Gurusamy Sarathy, Dean Roehrich, Raphael Manfredi, Brent Halsey, Greg Bacon, Brad Appleton, and many others for their helpful comments.


    Talking about perltoot - Tom's object-oriented tutorial for perl (1) from http://perl.about.com/gi/dynamic/offsite.htm?zi=1/XJ/Ya&sdn=perl&cdn=compute&tm=20&gps=247_142_1020_541&f=22&tt=14&bt=0&bts=0&st=31&zu=http%3A//www.xav.com/perl/lib/Pod/perltoot.html

    NAME

    perltoot - Tom's object-oriented tutorial for perl


    DESCRIPTION

    Object-oriented programming is a big seller these days. Some managers would rather have objects than sliced bread. Why is that? What's so special about an object? Just what is an object anyway?

    An object is nothing but a way of tucking away complex behaviours into a neat little easy-to-use bundle. (This is what professors call abstraction.) Smart people who have nothing to do but sit around for weeks on end figuring out really hard problems make these nifty objects that even regular people can use. (This is what professors call software reuse.) Users (well, programmers) can play with this little bundle all they want, but they aren't to open it up and mess with the insides. Just like an expensive piece of hardware, the contract says that you void the warranty if you muck with the cover. So don't do that.

    The heart of objects is the class, a protected little private namespace full of data and functions. A class is a set of related routines that addresses some problem area. You can think of it as a user-defined type. The Perl package mechanism, also used for more traditional modules, is used for class modules as well. Objects ``live'' in a class, meaning that they belong to some package.

    More often than not, the class provides the user with little bundles. These bundles are objects. They know whose class they belong to, and how to behave. Users ask the class to do something, like ``give me an object.'' Or they can ask one of these objects to do something. Asking a class to do something for you is calling a class method. Asking an object to do something for you is calling an object method. Asking either a class (usually) or an object (sometimes) to give you back an object is calling a constructor, which is just a kind of method.

    That's all well and good, but how is an object different from any other Perl data type? Just what is an object really; that is, what's its fundamental type? The answer to the first question is easy. An object is different from any other data type in Perl in one and only one way: you may dereference it using not merely string or numeric subscripts as with simple arrays and hashes, but with named subroutine calls. In a word, with methods.

    The answer to the second question is that it's a reference, and not just any reference, mind you, but one whose referent has been bless()ed into a particular class (read: package). What kind of reference? Well, the answer to that one is a bit less concrete. That's because in Perl the designer of the class can employ any sort of reference they'd like as the underlying intrinsic data type. It could be a scalar, an array, or a hash reference. It could even be a code reference. But because of its inherent flexibility, an object is usually a hash reference.
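    The difference between a plain reference and a blessed one is easy to inspect. This sketch uses the core Scalar::Util module; the class name Person is just a label here and needs no code behind it for bless() to work.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

my $plain  = { NAME => "Jason" };
my $object = bless { NAME => "Jason" }, 'Person';

print ref($plain), "\n";         # prints HASH
print ref($object), "\n";        # prints Person
print reftype($object), "\n";    # prints HASH, the underlying type

# blessed() returns the class name for an object, undef otherwise.
print( (defined blessed($plain) ? "object" : "plain reference"), "\n" );
```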


    Creating a Class

    Before you create a class, you need to decide what to name it. That's because the class (package) name governs the name of the file used to house it, just as with regular modules. Then, that class (package) should provide one or more ways to generate objects. Finally, it should provide mechanisms to allow users of its objects to indirectly manipulate these objects from a distance.

    For example, let's make a simple Person class module. It gets stored in the file Person.pm. If it were called a Happy::Person class, it would be stored in the file Happy/Person.pm, and its package would become Happy::Person instead of just Person. (On a personal computer not running Unix or Plan 9, but something like MacOS or VMS, the directory separator may be different, but the principle is the same.) Do not assume any formal relationship between modules based on their directory names. This is merely a grouping convenience, and has no effect on inheritance, variable accessibility, or anything else.
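    The mapping from package name to file name is mechanical. The helper below, module_path(), is hypothetical and only imitates the translation that a statement such as require Happy::Person performs before searching @INC.

```perl
use strict;
use warnings;

# module_path() is a hypothetical helper mirroring how a bareword
# module name is turned into a relative file name.
sub module_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

print module_path("Person"), "\n";           # prints Person.pm
print module_path("Happy::Person"), "\n";    # prints Happy/Person.pm
```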

    For this module we aren't going to use Exporter, because we're a well-behaved class module that doesn't export anything at all. In order to manufacture objects, a class needs to have a constructor method. A constructor gives you back not just a regular data type, but a brand-new object in that class. This magic is taken care of by the bless() function, whose sole purpose is to enable its referent to be used as an object. Remember: being an object really means nothing more than that methods may now be called against it.

    While a constructor may be named anything you'd like, most Perl programmers seem to like to call theirs new(). However, new() is not a reserved word, and a class is under no obligation to supply such. Some programmers have also been known to use a function with the same name as the class as the constructor.

    Object Representation

    By far the most common mechanism used in Perl to represent a Pascal record, a C struct, or a C++ class is an anonymous hash. That's because a hash has an arbitrary number of data fields, each conveniently accessed by an arbitrary name of your own devising.

    If you were just doing a simple struct-like emulation, you would likely go about it something like this:

        $rec = {
            name  => "Jason",
            age   => 23,
            peers => [ "Norbert", "Rhys", "Phineas"],
        };

    If you felt like it, you could add a bit of visual distinction by up-casing the hash keys:

        $rec = {
            NAME  => "Jason",
            AGE   => 23,
            PEERS => [ "Norbert", "Rhys", "Phineas"],
        };

    And so you could get at $rec->{NAME} to find ``Jason'', or @{ $rec->{PEERS} } to get at ``Norbert'', ``Rhys'', and ``Phineas''. (Have you ever noticed how many 23-year-old programmers seem to be named ``Jason'' these days? :-)

    This same model is often used for classes, although it is not considered the pinnacle of programming propriety for folks from outside the class to come waltzing into an object, brazenly accessing its data members directly. Generally speaking, an object should be considered an opaque cookie that you use object methods to access. Visually, methods look like you're dereffing a reference using a function name instead of brackets or braces.

    Class Interface

    Some languages provide a formal syntactic interface to a class's methods, but Perl does not. It relies on you to read the documentation of each class. If you try to call an undefined method on an object, Perl won't complain, but the program will trigger an exception while it's running. Likewise, if you call a method expecting a prime number as its argument with a non-prime one instead, you can't expect the compiler to catch this. (Well, you can expect it all you like, but it's not going to happen.)
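    A minimal sketch of what that run-time exception looks like, and of asking politely first with can(). The tiny Person class here is a stub for the example, and salary() is deliberately left undefined.

```perl
use strict;
use warnings;

package Person;    # stub class for this example

sub new  { my $class = shift; return bless {}, $class }
sub name { return "Jason" }

package main;

my $him = Person->new;

# This compiles without complaint; the failure only appears when it runs.
my $ok = eval { $him->salary; 1 };
print "run-time error: $@" unless $ok;

# can() tells you in advance whether a method is available.
print "has name()\n"  if     $him->can('name');
print "no salary()\n" unless $him->can('salary');
```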

    Let's suppose you have a well-educated user of your Person class, someone who has read the docs that explain the prescribed interface. Here's how they might use the Person class:

        use Person;
        $him = Person->new();
        $him->name("Jason");
        $him->age(23);
        $him->peers( "Norbert", "Rhys", "Phineas" );
        push @All_Recs, $him;  # save object in array for later
        printf "%s is %d years old.\n", $him->name, $him->age;
        print "His peers are: ", join(", ", $him->peers), "\n";
        printf "Last rec's name is %s\n", $All_Recs[-1]->name;

    As you can see, the user of the class doesn't know (or at least, has no business paying attention to the fact) that the object has one particular implementation or another. The interface to the class and its objects is exclusively via methods, and that's all the user of the class should ever play with.

    Constructors and Instance Methods

    Still, someone has to know what's in the object. And that someone is the class. It implements methods that the programmer uses to access the object. Here's how to implement the Person class using the standard hash-ref-as-an-object idiom. We'll make a class method called new() to act as the constructor, and three object methods called name(), age(), and peers() to get at per-object data hidden away in our anonymous hash.

        package Person;
        use strict;
        ##################################################
        ## the object constructor (simplistic version)  ##
        ##################################################
        sub new {
            my $self  = {};
            $self->{NAME}   = undef;
            $self->{AGE}    = undef;
            $self->{PEERS}  = [];
            bless($self);           # but see below
            return $self;
        }
        ##############################################
        ## methods to access per-object data        ##
        ##                                          ##
        ## With args, they set the value.  Without  ##
        ## any, they only retrieve it/them.         ##
        ##############################################
        sub name {
            my $self = shift;
            if (@_) { $self->{NAME} = shift }
            return $self->{NAME};
        }
        sub age {
            my $self = shift;
            if (@_) { $self->{AGE} = shift }
            return $self->{AGE};
        }
        sub peers {
            my $self = shift;
            if (@_) { @{ $self->{PEERS} } = @_ }
            return @{ $self->{PEERS} };
        }
        1;  # so the require or use succeeds

    We've created three methods to access an object's data, name(), age(), and peers(). These are all substantially similar. If called with an argument, they set the appropriate field; otherwise they return the value held by that field, meaning the value of that hash key.

    Planning for the Future: Better Constructors

    Even though at this point you may not even know what it means, someday you're going to worry about inheritance. (You can safely ignore this for now and worry about it later if you'd like.) To ensure that this all works out smoothly, you must use the double-argument form of bless(). The second argument is the class into which the referent will be blessed. By not assuming our own class as the default second argument and instead using the class passed into us, we make our constructor inheritable.

    While we're at it, let's make our constructor a bit more flexible. Rather than being uniquely a class method, we'll set it up so that it can be called as either a class method or an object method. That way you can say:

        $me  = Person->new();
        $him = $me->new();

    To do this, all we have to do is check whether what was passed in was a reference or not. If so, we were invoked as an object method, and we need to extract the package (class) using the ref() function. If not, we just use the string passed in as the package name for blessing our referent.

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {};
            $self->{NAME}   = undef;
            $self->{AGE}    = undef;
            $self->{PEERS}  = [];
            bless ($self, $class);
            return $self;
        }

    That's about all there is for constructors. These methods bring objects to life, returning neat little opaque bundles to the user to be used in subsequent method calls.
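
    If the difference between the two forms of bless() still seems academic, this small sketch may help. The four class names here are made up for the comparison and are not part of the Person example.

```perl
use strict;
use warnings;

package OneArg;
sub new {
    my $self = {};
    bless($self);                 # always blesses into OneArg
    return $self;
}

package TwoArg;
sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    my $self  = {};
    bless($self, $class);         # blesses into whatever class asked
    return $self;
}

package OneArgKid;
our @ISA = ("OneArg");

package TwoArgKid;
our @ISA = ("TwoArg");

package main;

my $one   = ref(OneArgKid->new());          # "OneArg": the subclass got lost
my $two   = ref(TwoArgKid->new());          # "TwoArgKid"
my $again = ref(TwoArgKid->new()->new());   # "TwoArgKid", via an object
print "$one $two $again\n";
```

    The single-argument constructor hands every subclass an object of the base class, which is exactly what the two-argument form avoids.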

    Destructors

    Every story has a beginning and an end. The beginning of the object's story is its constructor, explicitly called when the object comes into existence. But the ending of its story is the destructor, a method implicitly called when an object leaves this life. Any per-object clean-up code is placed in the destructor, which must (in Perl) be called DESTROY.

    If constructors can have arbitrary names, then why not destructors? Because while a constructor is explicitly called, a destructor is not. Destruction happens automatically via Perl's garbage collection (GC) system, which is a quick but somewhat lazy reference-based GC system. To know what to call, Perl insists that the destructor be named DESTROY. Perl's notion of the right time to call a destructor is not well-defined currently, which is why your destructors should not rely on when they are called.

    Why is DESTROY in all caps? Perl on occasion uses purely uppercase function names as a convention to indicate that the function will be automatically called by Perl in some way. Others that are called implicitly include BEGIN, END, AUTOLOAD, plus all methods used by tied objects, described in the perltie manpage.

    In really good object-oriented programming languages, the user doesn't care when the destructor is called. It just happens when it's supposed to. In low-level languages without any GC at all, there's no way to depend on this happening at the right time, so the programmer must explicitly call the destructor to clean up memory and state, crossing their fingers that it's the right time to do so. Unlike C++, an object destructor is nearly never needed in Perl, and even when it is, explicit invocation is uncalled for. In the case of our Person class, we don't need a destructor because Perl takes care of simple matters like memory deallocation.

    The only situation where Perl's reference-based GC won't work is when there's a circularity in the data structure, such as:

        $this->{WHATEVER} = $this;

    In that case, you must delete the self-reference manually if you expect your program not to leak memory. While admittedly error-prone, this is the best we can do right now. Nonetheless, rest assured that when your program is finished, its objects' destructors are all duly called. So you are guaranteed that an object eventually gets properly destroyed, except in the unique case of a program that never exits. (If you're running Perl embedded in another application, this full GC pass happens a bit more frequently--whenever a thread shuts down.)
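
    Here is a small sketch of that circularity problem. The Node class and its destroyed() counter are hypothetical, invented only so that we can watch when DESTROY actually runs.

```perl
use strict;
use warnings;

package Node;
my $Destroyed = 0;                # how many Nodes have been destroyed
sub new       { my $class = shift; return bless({}, $class) }
sub destroyed { return $Destroyed }
sub DESTROY   { $Destroyed++ }

package main;

{
    my $plain = Node->new();      # no cycle: destroyed at end of block
}
my $after_plain = Node->destroyed;

{
    my $loop = Node->new();
    $loop->{WHATEVER} = $loop;    # the object now holds itself alive
}
my $after_cycle = Node->destroyed;

{
    my $loop = Node->new();
    $loop->{WHATEVER} = $loop;
    delete $loop->{WHATEVER};     # break the cycle by hand
}
my $after_broken = Node->destroyed;

print "$after_plain $after_cycle $after_broken\n";
```

    The object from the middle block is not reclaimed until the program exits, which is the leak the text warns about.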

    Other Object Methods

    The methods we've talked about so far have either been constructors or else simple ``data methods'', interfaces to data stored in the object. These are a bit like an object's data members in the C++ world, except that strangers don't access them as data. Instead, they should only access the object's data indirectly via its methods. This is an important rule: in Perl, access to an object's data should only be made through methods.

    Perl doesn't impose restrictions on who gets to use which methods. The public-versus-private distinction is by convention, not syntax. (Well, unless you use the Alias module described below in Data Members as Variables.) Occasionally you'll see method names beginning or ending with an underscore or two. This marking is a convention indicating that the methods are private to that class alone and sometimes to its closest acquaintances, its immediate subclasses. But this distinction is not enforced by Perl itself. It's up to the programmer to behave.

    There's no reason to limit methods to those that simply access data. Methods can do anything at all. The key point is that they're invoked against an object or a class. Let's say we'd like object methods that do more than fetch or set one particular field.

        sub exclaim {
            my $self = shift;
            return sprintf "Hi, I'm %s, age %d, working with %s",
                $self->{NAME}, $self->{AGE}, join(", ", @{$self->{PEERS}});
        }

    Or maybe even one like this:

        sub happy_birthday {
            my $self = shift;
            return ++$self->{AGE};
        }

    Some might argue that one should go at these this way:

        sub exclaim {
            my $self = shift;
            return sprintf "Hi, I'm %s, age %d, working with %s",
                $self->name, $self->age, join(", ", $self->peers);
        }
        sub happy_birthday {
            my $self = shift;
            return $self->age( $self->age() + 1 );
        }

    But since these methods are all executing in the class itself, this may not be critical. There are tradeoffs to be made. Using direct hash access is faster (about an order of magnitude faster, in fact), and it's more convenient when you want to interpolate in strings. But using methods (the external interface) internally shields not just the users of your class but even you yourself from changes in your data representation.


    Class Data

    What about ``class data'', data items common to each object in a class? What would you want that for? Well, in your Person class, you might like to keep track of the total people alive. How do you implement that?

    You could make it a global variable called $Person::Census. But about the only reason you'd do that would be if you wanted people to be able to get at your class data directly. They could just say $Person::Census and play around with it. Maybe this is ok in your design scheme. You might even conceivably want to make it an exported variable. To be exportable, a variable must be a (package) global. If this were a traditional module rather than an object-oriented one, you might do that.

    While this approach is expected in most traditional modules, it's generally considered rather poor form in most object modules. In an object module, you should set up a protective veil to separate interface from implementation. So provide a class method to access class data just as you provide object methods to access object data.

    So, you could still keep $Census as a package global and rely upon others to honor the contract of the module and therefore not play around with its implementation. You could even be supertricky and make $Census a tied object as described in the perltie manpage, thereby intercepting all accesses.

    But more often than not, you just want to make your class data a file-scoped lexical. To do so, simply put this at the top of the file:

        my $Census = 0;

    Even though the scope of a my() normally expires when the block in which it was declared is done (in this case the whole file being required or used), Perl's deep binding of lexical variables guarantees that the variable will not be deallocated, remaining accessible to functions declared within that scope. This doesn't work with global variables given temporary values via local(), though.
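
    A minimal sketch of that behaviour follows. The Counter package and its register() method are invented for the illustration, and a bare block stands in for the file scope so that everything fits in one file.

```perl
use strict;
use warnings;

package Counter;
{
    my $Census = 0;                       # lexical, private to this block
    sub register   { return ++$Census }
    sub population { return $Census }
}

package main;

Counter->register for 1 .. 3;
my $count = Counter->population;

# There is no package variable named Census in Counter's symbol table,
# so outsiders have nothing to play around with.
my $is_global = exists $Counter::{Census} ? 1 : 0;

print "population: $count, package global: $is_global\n";
```

    The block has long since finished by the time register() is called, yet both subs still see the same $Census.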

    Irrespective of whether you leave $Census a package global or make it instead a file-scoped lexical, you should make these changes to your Person::new() constructor:

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {};
            $Census++;
            $self->{NAME}   = undef;
            $self->{AGE}    = undef;
            $self->{PEERS}  = [];
            bless ($self, $class);
            return $self;
        }
        sub population {
            return $Census;
        }

    Now that we've done this, we certainly do need a destructor so that when a Person is destroyed, the $Census goes down. Here's how this could be done:

        sub DESTROY { --$Census }

    Notice how there's no memory to deallocate in the destructor? That's something that Perl takes care of for you all by itself.

    Accessing Class Data

    It turns out that this is not really a good way to go about handling class data. A good scalable rule is that you must never reference class data directly from an object method. Otherwise you aren't building a scalable, inheritable class. The object must be the rendezvous point for all operations, especially from an object method. The globals (class data) would in some sense be in the ``wrong'' package in your derived classes. In Perl, methods execute in the context of the class they were defined in, not that of the object that triggered them. Therefore, namespace visibility of package globals in methods is unrelated to inheritance.

    Got that? Maybe not. Ok, let's say that some other class ``borrowed'' (well, inherited) the DESTROY method as it was defined above. When those objects are destroyed, the original $Census variable will be altered, not the one in the new class's package namespace. Perhaps this is what you want, but probably it isn't.

    Here's how to fix this. We'll store a reference to the data in the value accessed by the hash key ``_CENSUS''. Why the underscore? Well, mostly because an initial underscore already conveys strong feelings of magicalness to a C programmer. It's really just a mnemonic device to remind ourselves that this field is special and not to be used as a public data member in the same way that NAME, AGE, and PEERS are. (Because we've been developing this code under the strict pragma, prior to perl version 5.004 we'll have to quote the field name.)

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {};
            $self->{NAME}     = undef;
            $self->{AGE}      = undef;
            $self->{PEERS}    = [];
            # "private" data
            $self->{"_CENSUS"} = \$Census;
            bless ($self, $class);
            ++ ${ $self->{"_CENSUS"} };
            return $self;
        }
        sub population {
            my $self = shift;
            if (ref $self) {
                return ${ $self->{"_CENSUS"} };
            } else {
                return $Census;
            }
        }
        sub DESTROY {
            my $self = shift;
            -- ${ $self->{"_CENSUS"} };
        }
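
    To see this version at work with a derived class, here is a self-contained sketch. It trims the constructor down to the census field, and the empty Employee subclass is there only to show inherited methods going through the reference stored in the object.

```perl
use strict;
use warnings;

package Person;
my $Census = 0;
sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    my $self  = { "_CENSUS" => \$Census };
    bless($self, $class);
    ++ ${ $self->{"_CENSUS"} };
    return $self;
}
sub population {
    my $self = shift;
    return ref($self) ? ${ $self->{"_CENSUS"} } : $Census;
}
sub DESTROY {
    my $self = shift;
    -- ${ $self->{"_CENSUS"} };
}

package Employee;
our @ISA = ("Person");            # inherits everything, adds nothing

package main;

my $person   = Person->new();
my $employee = Employee->new();
my $both     = Person->population;      # class method
my $via_obj  = $employee->population;   # object method, via the reference
undef $employee;                        # runs the inherited DESTROY
my $after    = Person->population;
print "$both $via_obj $after\n";
```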

    Debugging Methods

    It's common for a class to have a debugging mechanism. For example, you might want to see when objects are created or destroyed. To do that, add a debugging variable as a file-scoped lexical. For this, we'll pull in the standard Carp module to emit our warnings and fatal messages. That way messages will come out with the caller's filename and line number instead of our own; if we wanted them to be from our own perspective, we'd just use die() and warn() directly instead of croak() and carp() respectively.

        use Carp;
        my $Debugging = 0;

    Now add a new class method to access the variable.

        sub debug {
            my $class = shift;
            if (ref $class)  { confess "Class method called as object method" }
            unless (@_ == 1) { confess "usage: CLASSNAME->debug(level)" }
            $Debugging = shift;
        }

    Now fix up DESTROY to murmur a bit as the moribund object expires:

        sub DESTROY {
            my $self = shift;
            if ($Debugging) { carp "Destroying $self " . $self->name }
            -- ${ $self->{"_CENSUS"} };
        }

    One could conceivably make a per-object debug state. That way you could call both of these:

        Person->debug(1);   # entire class
        $him->debug(1);     # just this object

    To do so, we need our debugging method to be a ``bimodal'' one, one that works on both classes and objects. Therefore, adjust the debug() and DESTROY methods as follows:

        sub debug {
            my $self = shift;
            confess "usage: thing->debug(level)"    unless @_ == 1;
            my $level = shift;
            if (ref($self))  {
                $self->{"_DEBUG"} = $level;         # just myself
            } else {
                $Debugging        = $level;         # whole class
            }
        }
        sub DESTROY {
            my $self = shift;
            if ($Debugging || $self->{"_DEBUG"}) {
                carp "Destroying $self " . $self->name;
            }
            -- ${ $self->{"_CENSUS"} };
        }
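
    Here is a sketch that exercises the bimodal version. The __WARN__ handler is an addition made only so the messages from carp() can be collected and inspected, and the constructor is a cut-down one that takes the name directly.

```perl
use strict;
use warnings;

package Person;
use Carp;
my $Debugging = 0;
sub new {
    my ($class, $name) = @_;
    return bless({ NAME => $name }, $class);
}
sub name { my $self = shift; return $self->{NAME} }
sub debug {
    my $self = shift;
    confess "usage: thing->debug(level)" unless @_ == 1;
    my $level = shift;
    if (ref($self)) { $self->{"_DEBUG"} = $level }   # just myself
    else            { $Debugging        = $level }   # whole class
}
sub DESTROY {
    my $self = shift;
    if ($Debugging || $self->{"_DEBUG"}) {
        carp "Destroying $self " . $self->name;
    }
}

package main;

my @messages;
local $SIG{__WARN__} = sub { push @messages, $_[0] };

my $quiet = Person->new("Norbert");
my $loud  = Person->new("Jason");
$loud->debug(1);                  # per-object debugging only
undef $quiet;                     # silent
undef $loud;                      # murmurs on the way out
print @messages;
```

    Calling Person->debug(1) instead would make both objects announce their departure.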

    What happens if a derived class (which we'll call Employee) inherits methods from this Person base class? Then Employee->debug(), when called as a class method, manipulates the $Debugging variable belonging to Person, not a separate one belonging to Employee.

    Class Destructors

    The object destructor handles the death of each distinct object. But sometimes you want a bit of cleanup when the entire class is shut down, which currently only happens when the program exits. To make such a class destructor, create a function in that class's package named END. This works just like the END function in traditional modules, meaning that it gets called whenever your program exits unless it execs or dies of an uncaught signal. For example,

        sub END {
            if ($Debugging) {
                print "All persons are going away now.\n";
            }
        }

    When the program exits, all the class destructors (END functions) are called in the opposite order to that in which they were loaded (LIFO order).

    Documenting the Interface

    And there you have it: we've just shown you the implementation of this Person class. Its interface would be its documentation. Usually this means putting it in pod (``plain old documentation'') format right there in the same file. In our Person example, we would place the following docs anywhere in the Person.pm file. Even though it looks mostly like code, it's not. It's embedded documentation such as would be used by the pod2man, pod2html, or pod2text programs. The Perl compiler ignores pods entirely, just as the translators ignore code. Here's an example of some pods describing the informal interface:

        =head1 NAME

        Person - class to implement people

        =head1 SYNOPSIS

         use Person;
         #################
         # class methods #
         #################
         $ob    = Person->new;
         $count = Person->population;
         #######################
         # object data methods #
         #######################
         ### get versions ###
             $who   = $ob->name;
             $years = $ob->age;
             @pals  = $ob->peers;
         ### set versions ###
             $ob->name("Jason");
             $ob->age(23);
             $ob->peers( "Norbert", "Rhys", "Phineas" );
         ########################
         # other object methods #
         ########################
         $phrase = $ob->exclaim;
         $ob->happy_birthday;

        =head1 DESCRIPTION

        The Person class implements dah dee dah dee dah....

    That's all there is to the matter of interface versus implementation. A programmer who opens up the module and plays around with all the private little shiny bits that were safely locked up behind the interface contract has voided the warranty, and you shouldn't worry about their fate.


    Aggregation

    Suppose you later want to change the class to implement better names. Perhaps you'd like to support both given names (called Christian names, irrespective of one's religion) and family names (called surnames), plus nicknames and titles. If users of your Person class have been properly accessing it through its documented interface, then you can easily change the underlying implementation. If they haven't, then they lose and it's their fault for breaking the contract and voiding their warranty.

    To do this, we'll make another class, this one called Fullname. What's the Fullname class look like? To answer that question, you have to first figure out how you want to use it. How about we use it this way:

        $him = Person->new();
        $him->fullname->title("St");
        $him->fullname->christian("Thomas");
        $him->fullname->surname("Aquinas");
        $him->fullname->nickname("Tommy");
        printf "His normal name is %s\n", $him->name;
        printf "But his real name is %s\n", $him->fullname->as_string;

    Ok. To do this, we'll change Person::new() so that it supports a full name field this way:

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {};
            $self->{FULLNAME} = Fullname->new();
            $self->{AGE}      = undef;
            $self->{PEERS}    = [];
            $self->{"_CENSUS"} = \$Census;
            bless ($self, $class);
            ++ ${ $self->{"_CENSUS"} };
            return $self;
        }
        sub fullname {
            my $self = shift;
            return $self->{FULLNAME};
        }

    Then to support old code, define Person::name() this way:

        sub name {
            my $self = shift;
            return $self->{FULLNAME}->nickname(@_)
              ||   $self->{FULLNAME}->christian(@_);
        }

    Here's the Fullname class. We'll use the same technique of using a hash reference to hold data fields, and methods by the appropriate name to access them:

        package Fullname;
        use strict;
        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = {
                TITLE       => undef,
                CHRISTIAN   => undef,
                SURNAME     => undef,
                NICK        => undef,
            };
            bless ($self, $class);
            return $self;
        }
        sub christian {
            my $self = shift;
            if (@_) { $self->{CHRISTIAN} = shift }
            return $self->{CHRISTIAN};
        }
        sub surname {
            my $self = shift;
            if (@_) { $self->{SURNAME} = shift }
            return $self->{SURNAME};
        }
        sub nickname {
            my $self = shift;
            if (@_) { $self->{NICK} = shift }
            return $self->{NICK};
        }
        sub title {
            my $self = shift;
            if (@_) { $self->{TITLE} = shift }
            return $self->{TITLE};
        }
        sub as_string {
            my $self = shift;
            my $name = join(" ", @$self{'CHRISTIAN', 'SURNAME'});
            if ($self->{TITLE}) {
                $name = $self->{TITLE} . " " . $name;
            }
            return $name;
        }
        1;

    Finally, here's the test program:

        #!/usr/bin/perl -w
        use strict;
        use Person;
        sub END { show_census() }
        sub show_census ()  {
            printf "Current population: %d\n", Person->population;
        }
        Person->debug(1);
        show_census();
        my $him = Person->new();
        $him->fullname->christian("Thomas");
        $him->fullname->surname("Aquinas");
        $him->fullname->nickname("Tommy");
        $him->fullname->title("St");
        $him->age(1);
        printf "%s is really %s.\n", $him->name, $him->fullname->as_string;
        printf "%s's age: %d.\n", $him->name, $him->age;
        $him->happy_birthday;
        printf "%s's age: %d.\n", $him->name, $him->age;
        show_census();


    Inheritance

    Object-oriented programming systems all support some notion of inheritance. Inheritance means allowing one class to piggy-back on top of another one so you don't have to write the same code again and again. It's about software reuse, and therefore related to Laziness, the principal virtue of a programmer. (The import/export mechanisms in traditional modules are also a form of code reuse, but a simpler one than the true inheritance that you find in object modules.)

    Sometimes the syntax of inheritance is built into the core of the language, and sometimes it's not. Perl has no special syntax for specifying the class (or classes) to inherit from. Instead, it's all strictly in the semantics. Each package can have a variable called @ISA, which governs (method) inheritance. If you try to call a method on an object or class, and that method is not found in that object's package, Perl then looks to @ISA for other packages to go looking through in search of the missing method.

    Like the special per-package variables recognized by Exporter (such as @EXPORT, @EXPORT_OK, @EXPORT_FAIL, %EXPORT_TAGS, and $VERSION), the @ISA array must be a package-scoped global and not a file-scoped lexical created via my(). Most classes have just one item in their @ISA array. In this case, we have what's called ``single inheritance'', or SI for short.
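
    The following sketch shows why the kind of variable matters. Base, GoodKid, and BadKid are hypothetical classes, each wrapped in its own block so the two @ISA declarations don't collide.

```perl
use strict;
use warnings;

{
    package Base;
    sub new   { my $class = shift; return bless({}, $class) }
    sub hello { return "hello from Base" }
}
{
    package GoodKid;
    our @ISA = ("Base");      # package global: method lookup consults it
}
{
    package BadKid;
    my @ISA = ("Base");       # lexical: invisible to method lookup
}

my $good      = eval { GoodKid->hello() };
my $bad       = eval { BadKid->hello() };
my $bad_error = $@;

print "GoodKid says: $good\n";
print "BadKid fails: $bad_error";
```

    Perl never looks at the lexical @ISA, so BadKid inherits nothing at all.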

    Consider this class:

        package Employee;
        use Person;
        @ISA = ("Person");
        1;

    Not a lot to it, eh? All it's doing so far is loading in another class and stating that this one will inherit methods from that other class if need be. We have given it none of its own methods. We rely upon an Employee to behave just like a Person.

    Setting up an empty class like this is called the ``empty subclass test''; that is, making a derived class that does nothing but inherit from a base class. If the original base class has been designed properly, then the new derived class can be used as a drop-in replacement for the old one. This means you should be able to write a program like this:

        use Employee;
        my $empl = Employee->new();
        $empl->name("Jason");
        $empl->age(23);
        printf "%s is age %d.\n", $empl->name, $empl->age;

    By proper design, we mean always using the two-argument form of bless(), avoiding direct access of global data, and not exporting anything. If you look back at the Person::new() function we defined above, we were careful to do that. There's a bit of package data used in the constructor, but the reference to this is stored on the object itself and all other methods access package data via that reference, so we should be ok.

    What do we mean by the Person::new() function -- isn't that actually a method? Well, in principle, yes. A method is just a function that expects as its first argument a class name (package) or object (blessed reference). Person::new() is the function that both the Person->new() method and the Employee->new() method end up calling. Understand that while a method call looks a lot like a function call, they aren't really quite the same, and if you treat them as the same, you'll very soon be left with nothing but broken programs. First, the actual underlying calling conventions are different: method calls get an extra argument. Second, function calls don't do inheritance, but methods do.

            Method Call             Resulting Function Call
            -----------             ------------------------
            Person->new()           Person::new("Person")
            Employee->new()         Person::new("Employee")

    So don't use function calls when you mean to call a method.
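
    The table above can be checked with a short sketch. The classes are pared down to a constructor and an empty subclass; nothing here goes beyond what the table already says.

```perl
use strict;
use warnings;

package Person;
sub new {
    my $proto = shift;
    my $class = ref($proto) || $proto;
    return bless({}, $class);
}

package Employee;
our @ISA = ("Person");        # inherits new(), defines nothing itself

package main;

# Method call: Perl searches @ISA and passes "Employee" as first argument.
my $by_method = ref(Employee->new());

# The function call that the method call above ends up making.
my $by_function = ref(Person::new("Employee"));

# A plain function call into Employee does no inheritance at all.
my $found = eval { Employee::new("Employee"); 1 } ? "found" : "missing";
my $error = $@;

print "$by_method $by_function $found\n";
```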

    If an employee is just a Person, that's not all too very interesting. So let's add some other methods. We'll give our employee data fields to access their salary, their employee ID, and their start date.

    If you're getting a little tired of creating all these nearly identical methods just to get at the object's data, do not despair. Later, we'll describe several different convenience mechanisms for shortening this up. Meanwhile, here's the straightforward way:

        sub salary {
            my $self = shift;
            if (@_) { $self->{SALARY} = shift }
            return $self->{SALARY};
        }
        sub id_number {
            my $self = shift;
            if (@_) { $self->{ID} = shift }
            return $self->{ID};
        }
        sub start_date {
            my $self = shift;
            if (@_) { $self->{START_DATE} = shift }
            return $self->{START_DATE};
        }

    Overridden Methods

    What happens when both a derived class and its base class have the same method defined? Well, then you get the derived class's version of that method. For example, let's say that we want the peers() method called on an employee to act a bit differently. Instead of just returning the list of peer names, let's return slightly different strings. So doing this:

        $empl->peers("Peter", "Paul", "Mary");
        printf "His peers are: %s\n", join(", ", $empl->peers);

    will produce:

        His peers are: PEON=PETER, PEON=PAUL, PEON=MARY

    To do this, merely add this definition into the Employee.pm file:

        sub peers {
            my $self = shift;
            if (@_) { @{ $self->{PEERS} } = @_ }
            return map { "PEON=\U$_" } @{ $self->{PEERS} };
        }

    There, we've just demonstrated the high-falutin' concept known in certain circles as polymorphism. We've taken on the form and behaviour of an existing object, and then we've altered it to suit our own purposes. This is a form of Laziness. (Getting polymorphed is also what happens when the wizard decides you'd look better as a frog.)

    Every now and then you'll want to have a method call trigger both its derived class (also known as ``subclass'') version as well as its base class (also known as ``superclass'') version. In practice, constructors and destructors are likely to want to do this, and it probably also makes sense in the debug() method we showed previously.

    To do this, add this to Employee.pm:

        use Carp;
        my $Debugging = 0;
        sub debug {
            my $self = shift;
            confess "usage: thing->debug(level)"    unless @_ == 1;
            my $level = shift;
            if (ref($self))  {
                $self->{"_DEBUG"} = $level;
            } else {
                $Debugging = $level;            # whole class
            }
            Person::debug($self, $Debugging);   # don't really do this
        }

    As you see, we turn around and call the Person package's debug() function. But this is far too fragile for good design. What if Person doesn't have a debug() function, but is inheriting its debug() method from elsewhere? It would have been slightly better to say

        Person->debug($Debugging);

    But even that's got too much hard-coded. It's somewhat better to say

        $self->Person::debug($Debugging);

    This is a funny way of saying: start looking for a debug() method up in Person. This strategy is more often seen on overridden object methods than on overridden class methods.

    There is still something a bit off here. We've hard-coded our superclass's name. This in particular is bad if you change which classes you inherit from, or add others. Fortunately, the pseudoclass SUPER comes to the rescue here.

        $self->SUPER::debug($Debugging);

    This way it starts looking in my class's @ISA. This only makes sense from within a method call, though. Don't try to access anything in SUPER:: from anywhere else, because it doesn't exist outside an overridden method call.
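
    A small sketch of SUPER in action follows. The describe() method is invented for the illustration; the three classes mirror the Person, Employee, and Boss chain used in this section.

```perl
use strict;
use warnings;

package Person;
sub new      { my $class = shift; return bless({}, $class) }
sub describe { return "Person" }

package Employee;
our @ISA = ("Person");
sub describe {
    my $self = shift;
    # SUPER starts the search in the @ISA of the package where this
    # method was compiled (Employee), whatever class $self belongs to.
    return "Employee, then " . $self->SUPER::describe();
}

package Boss;
our @ISA = ("Employee");      # empty subclass

package main;

my $text = Boss->new()->describe();
print "$text\n";
```

    Even though the object is a Boss, the SUPER:: call inside Employee's method goes to Person rather than looping back to Employee.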

    Things are getting a bit complicated here. Have we done anything we shouldn't? As before, one way to test whether we're designing a decent class is via the empty subclass test. Since we already have an Employee class that we're trying to check, we'd better get a new empty subclass that can derive from Employee. Here's one:

        package Boss;
        use Employee;        # :-)
        @ISA = qw(Employee);

    And here's the test program:

        #!/usr/bin/perl -w
        use strict;
        use Boss;
        Boss->debug(1);
        my $boss = Boss->new();
        $boss->fullname->title("Don");
        $boss->fullname->surname("Pichon Alvarez");
        $boss->fullname->christian("Federico Jesus");
        $boss->fullname->nickname("Fred");
        $boss->age(47);
        $boss->peers("Frank", "Felipe", "Faust");
        printf "%s is age %d.\n", $boss->fullname->as_string, $boss->age;
        printf "His peers are: %s\n", join(", ", $boss->peers);

    Running it, we see that we're still ok. If you'd like to dump out your object in a nice format, somewhat like the way the 'x' command works in the debugger, you could use the Data::Dumper module (available from CPAN, and bundled with Perl since version 5.005) this way:

        use Data::Dumper;
        print "Here's the boss:\n";
        print Dumper($boss);

    Which shows us something like this:

        Here's the boss:
        $VAR1 = bless( {
             _CENSUS => \1,
             FULLNAME => bless( {
                                  TITLE => 'Don',
                                  SURNAME => 'Pichon Alvarez',
                                  NICK => 'Fred',
                                  CHRISTIAN => 'Federico Jesus'
                                }, 'Fullname' ),
             AGE => 47,
             PEERS => [
                        'Frank',
                        'Felipe',
                        'Faust'
                      ]
           }, 'Boss' );

    Hm.... something's missing there. What about the salary, start date, and ID fields? Well, we never set them to anything, even undef, so they don't show up in the hash's keys. The Employee class has no new() method of its own, and the new() method in Person doesn't know about Employees. (Nor should it: proper OO design dictates that a subclass be allowed to know about its immediate superclass, but never vice-versa.) So let's fix up Employee::new() this way:

        sub new {
            my $proto = shift;
            my $class = ref($proto) || $proto;
            my $self  = $class->SUPER::new();
            $self->{SALARY}        = undef;
            $self->{ID}            = undef;
            $self->{START_DATE}    = undef;
            bless ($self, $class);          # reconsecrate
            return $self;
        }

    If you now dump out an Employee or Boss object, you'll find that the new fields show up.
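
    To see the effect without the full classes, here is a self-contained sketch. The Person constructor below is a cut-down assumption (only NAME and AGE), and Boss is the empty subclass:

```perl
use strict;
use warnings;

{
    package Person;
    sub new {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = { NAME => undef, AGE => undef };
        return bless $self, $class;
    }
}

{
    package Employee;
    our @ISA = ('Person');
    sub new {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = $class->SUPER::new();   # let Person set up its fields
        $self->{SALARY}     = undef;
        $self->{ID}         = undef;
        $self->{START_DATE} = undef;
        return $self;
    }
}

{
    package Boss;
    our @ISA = ('Employee');    # empty subclass
}

my $boss   = Boss->new();
my @fields = sort keys %$boss;
print ref($boss), " has fields: @fields\n";
```

    Because SUPER::new() is invoked on $class, the object already comes back blessed into the right class in this sketch, so the second bless is omitted here.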

    Multiple Inheritance

    Ok, at the risk of confusing beginners and annoying OO gurus, it's time to confess that Perl's object system includes that controversial notion known as multiple inheritance, or MI for short. All this means is that rather than having just one parent class, which in turn might itself have a parent class, and so on, you can directly inherit from two or more parents. It's true that some uses of MI can get you into trouble, although hopefully not quite so much trouble with Perl as with dubiously-OO languages like C++.

    The way it works is actually pretty simple: just put more than one package name in your @ISA array. When it comes time for Perl to go finding methods for your object, it looks at each of these packages in order. Well, kinda. It's actually a fully recursive, depth-first order. Consider a bunch of @ISA arrays like this:

        @First::ISA    = qw( Alpha );
        @Second::ISA   = qw( Beta );
        @Third::ISA    = qw( First Second );

    If you have an object of class Third:

        my $ob = Third->new();
        $ob->spin();

    How do we find a spin() method (or a new() method for that matter)? Because the search is depth-first, classes will be looked up in the following order: Third, First, Alpha, Second, and Beta.
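
    The search order can be checked directly. This sketch assumes a Perl new enough to ship the core mro module (5.10 or later); the new() and spin() bodies are made up for illustration:

```perl
use strict;
use warnings;
use mro;    # core module; provides mro::get_linear_isa

@First::ISA  = qw( Alpha );
@Second::ISA = qw( Beta );
@Third::ISA  = qw( First Second );

# Hypothetical methods: spin() exists in both Alpha and Second.
sub Alpha::new   { return bless {}, shift }
sub Alpha::spin  { return 'Alpha::spin' }
sub Second::spin { return 'Second::spin' }

my $ob     = Third->new();
my $winner = $ob->spin();    # Alpha is searched before Second
my @order  = @{ mro::get_linear_isa('Third') };

print "spin() resolved to $winner\n";
print "search order: @order\n";
```

    Even though Second is a direct parent of Third, its spin() loses to the one in Alpha, because the whole First branch is exhausted before Second is consulted.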

    In practice, few class modules have been seen that actually make use of MI. One nearly always chooses simple containership of one class within another over MI. That's why our Person object contained a Fullname object. That doesn't mean it was one.

    However, there is one particular area where MI in Perl is rampant: borrowing another class's class methods. This is rather common, especially with some bundled ``objectless'' classes, like Exporter, DynaLoader, AutoLoader, and SelfLoader. These classes do not provide constructors; they exist only so you may inherit their class methods. (It's not entirely clear why inheritance was done here rather than traditional module importation.)

    For example, here is the POSIX module's @ISA:

        package POSIX;
        @ISA = qw(Exporter DynaLoader);

    The POSIX module isn't really an object module, but then, neither are Exporter or DynaLoader. They're just lending their classes' behaviours to POSIX.

    Why don't people use MI for object methods much? One reason is that it can have complicated side-effects. For one thing, your inheritance graph (no longer a tree) might converge back to the same base class. Although Perl guards against recursive inheritance, merely having parents who are related to each other via a common ancestor, incestuous though it sounds, is not forbidden. What if in our Third class shown above we wanted its new() method to also call both overridden constructors in its two parent classes? The SUPER notation would only find the first one. Also, what about if the Alpha and Beta classes both had a common ancestor, like Nought? If you kept climbing up the inheritance tree calling overridden methods, you'd end up calling Nought::new() twice, which might well be a bad idea.
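
    The double call is easy to reproduce. In this sketch a hypothetical setup() method stands in for the chained constructors, and every class calls each of its parents by name:

```perl
use strict;
use warnings;

my %calls;    # how many times each class's setup() ran

# Hypothetical setup() methods; each one chains to every parent by name.
sub Nought::setup { $calls{Nought}++ }
sub Alpha::setup  { $calls{Alpha}++;  $_[0]->Nought::setup() }
sub Beta::setup   { $calls{Beta}++;   $_[0]->Nought::setup() }
sub First::setup  { $calls{First}++;  $_[0]->Alpha::setup() }
sub Second::setup { $calls{Second}++; $_[0]->Beta::setup() }
sub Third::setup  {
    $calls{Third}++;
    $_[0]->First::setup();
    $_[0]->Second::setup();
}

@Alpha::ISA  = qw( Nought );
@Beta::ISA   = qw( Nought );
@First::ISA  = qw( Alpha );
@Second::ISA = qw( Beta );
@Third::ISA  = qw( First Second );

my $ob = bless {}, 'Third';
$ob->setup();
print "$_ => $calls{$_}\n" for sort keys %calls;
```

    Every class runs once except Nought, which is reached through both branches and therefore runs twice.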

    UNIVERSAL: The Root of All Objects

    Wouldn't it be convenient if all objects were rooted at some ultimate base class? That way you could give every object common methods without having to go and add it to each and every @ISA. Well, it turns out that you can. You don't see it, but Perl tacitly and irrevocably assumes that there's an extra element at the end of @ISA: the class UNIVERSAL. In version 5.003, there were no predefined methods there, but you could put whatever you felt like into it.

    However, as of version 5.004 (or some subversive releases, like 5.003_08), UNIVERSAL has some methods in it already. These are built into your Perl binary, so they don't take any extra time to load. Predefined methods include isa(), can(), and VERSION(). isa() tells you whether an object or class ``is'' another one without having to traverse the hierarchy yourself:

       $has_io = $fd->isa("IO::Handle");
       $itza_handle = IO::Socket->isa("IO::Handle");

    The can() method, called against that object or class, reports back whether its string argument is a callable method name in that class. In fact, it gives you back a function reference to that method:

       $his_print_method = $obj->can('as_string');

    Finally, the VERSION method checks whether the class (or the object's class) has a package global called $VERSION that's high enough, as in:

        Some_Module->VERSION(3.0);
        $his_vers = $ob->VERSION();

    However, we don't usually call VERSION ourselves. (Remember that an all uppercase function name is a Perl convention that indicates that the function will be automatically used by Perl in some way.) In this case, it happens when you say

        use Some_Module 3.0;

    If you wanted to add version checking to your Person class explained above, just add this to Person.pm:

        our $VERSION = '1.1';

    and then in Employee.pm you can say

        use Person 1.1;

    And it would make sure that you have at least that version number or higher available. This is not the same as loading in that exact version number. No mechanism currently exists for concurrent installation of multiple versions of a module. Lamentably.
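
    The three UNIVERSAL methods can be tried on toy classes. The Person and Employee packages below are stripped-down assumptions, defined inline so that the sketch runs on its own:

```perl
use strict;
use warnings;

{
    package Person;
    our $VERSION = '1.1';
    sub new       { return bless {}, shift }
    sub as_string { return 'a person' }
}

{
    package Employee;
    our @ISA = ('Person');
}

my $emp       = Employee->new();
my $is_person = $emp->isa('Person');      # true, found via @ISA
my $method    = $emp->can('as_string');   # code reference, or undef
my $text      = $emp->$method();          # call through the reference
my $version   = Person->VERSION;          # reads $Person::VERSION

# Asking for a newer version than is installed raises an exception.
my $ok    = eval { Person->VERSION(2.0); 1 };
my $error = $@;

print "isa: $is_person, can: $text, version: $version\n";
print "version check failed: $error" unless $ok;
```

    The failed check is the same one that a use statement with a version number triggers at compile time.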


    June 17

    Package - Perl

    Chapter 10. Packages

    In this chapter, we get to start having fun, because we get to start talking about software design. If we're going to talk about good software design, we have to talk about Laziness, Impatience, and Hubris, the basis of good software design.

    We've all fallen into the trap of using cut-and-paste when we should have defined a higher-level abstraction, if only just a loop or subroutine.[1] To be sure, some folks have gone to the opposite extreme of defining ever-growing mounds of higher-level abstractions when they should have used cut-and-paste.[2] Generally, though, most of us need to think about using more abstraction rather than less.

    [1] This is a form of False Laziness.

    [2] This is a form of False Hubris.

    Caught somewhere in the middle are the people who have a balanced view of how much abstraction is good, but who jump the gun on writing their own abstractions when they should be reusing existing code.[3]

    [3] You guessed it--this is False Impatience. But if you're determined to reinvent the wheel, at least try to invent a better one.

    Whenever you're tempted to do any of these things, you need to sit back and think about what will do the most good for you and your neighbor over the long haul. If you're going to pour your creative energies into a lump of code, why not make the world a better place while you're at it? (Even if you're only aiming for the program to succeed, you need to make sure it fits the right ecological niche.)

    The first step toward ecologically sustainable programming is simply this: don't litter in the park. When you write a chunk of code, think about giving the code its own namespace, so that your variables and functions don't clobber anyone else's, or vice versa. A namespace is a bit like your home, where you're allowed to be as messy as you like, as long as you keep your external interface to other citizens moderately civil. In Perl, a namespace is called a package. Packages provide the fundamental building block upon which the higher-level concepts of modules and classes are constructed.

    Like the notion of "home", the notion of "package" is a bit nebulous. Packages are independent of files. You can have many packages in a single file, or a single package that spans several files, just as your home could be one small garret in a larger building (if you're a starving artist), or it could comprise several buildings (if your name happens to be Queen Elizabeth). But the usual size of a home is one building, and the usual size of a package is one file. Perl provides some special help for people who want to put one package in one file, as long as you're willing to give the file the same name as the package and use an extension of .pm, which is short for "perl module". The module is the fundamental unit of reusability in Perl. Indeed, the way you use a module is with the use command, which is a compiler directive that controls the importation of subroutines and variables from a module. Every example of use you've seen until now has been an example of module reuse.

    The Comprehensive Perl Archive Network, or CPAN, is where you should put your modules if other people might find them useful. Perl has thrived because of the willingness of programmers to share the fruits of their labor with the community. Naturally, CPAN is also where you can find modules that others have thoughtfully uploaded for everyone to use. See Chapter 22, "CPAN", and www.cpan.org for details.

    The trend over the last 25 years or so has been to design computer languages that enforce a state of paranoia. You're expected to program every module as if it were in a state of siege. Certainly there are some feudal cultures where this is appropriate, but not all cultures are like this. In Perl culture, for instance, you're expected to stay out of someone's home because you weren't invited in, not because there are bars on the windows.[4]

    [4] But Perl provides some bars if you want them, too. See "Handling Insecure Code" in Chapter 23, "Security".

    This is not a book about object-oriented methodology, and we're not here to convert you into a raving object-oriented zealot, even if you want to be converted. There are already plenty of books out there for that. Perl's philosophy of object-oriented design fits right in with Perl's philosophy of everything else: use object-oriented design where it makes sense, and avoid it where it doesn't. Your call.

    In OO-speak, every object belongs to a grouping called a class. In Perl, classes and packages and modules are all so closely related that novices can often think of them as being interchangeable. The typical class is implemented by a module that defines a package with the same name as the class. We'll explain all of this in the next few chapters.

    When you use a module, you benefit from direct software reuse. With classes, you benefit from indirect software reuse when one class uses another through inheritance. And with classes, you get something more: a clean interface to another namespace. Everything in a class is accessed indirectly, insulating the class from the outside world.

    As we mentioned in Chapter 8, "References", object-oriented programming in Perl is accomplished through references whose referents know which class they belong to. In fact, now that you know about references, you know almost everything difficult about objects. The rest of it just "lays under the fingers", as a pianist would say. You will need to practice a little, though.

    One of your basic finger exercises consists of learning how to protect different chunks of code from inadvertently tampering with each other's variables. Every chunk of code belongs to a particular package, which determines what variables and subroutines are available to it. As Perl encounters a chunk of code, it is compiled into what we call the current package. The initial current package is called "main", but you can switch the current package to another one at any time with the package declaration. The current package determines which symbol table is used to find your variables, subroutines, I/O handles, and formats.

    Any variable not declared with my is associated with a package--even seemingly omnipresent variables like $_ and %SIG. In fact, there's really no such thing as a global variable in Perl, just package variables. (Special identifiers like _ and SIG merely seem global because they default to the main package instead of the current one.)

    The scope of a package declaration is from the declaration itself through the end of the enclosing scope (block, file, or eval--whichever comes first) or until another package declaration at the same level, which supersedes the earlier one. (This is a common practice.)

    All subsequent identifiers (including those declared with our, but not including those declared with my or those qualified with a different package name) will be placed in the symbol table belonging to the current package. (Variables declared with my are independent of packages; they are always visible within, and only within, their enclosing scope, regardless of any package declarations.)

    Typically, a package declaration will be the first statement of a file meant to be included by require or use. But again, that's by convention. You can put a package declaration anywhere you can put a statement. You could even put it at the end of a block, in which case it would have no effect whatsoever. You can switch into a package in more than one place; a package declaration merely selects the symbol table to be used by the compiler for the rest of that block. (This is how a given package can span more than one file.)

    You can refer to identifiers[5] in other packages by prefixing ("qualifying") the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.[6]

    [5] By identifiers, we mean the names used as symbol table keys for accessing scalar variables, array variables, hash variables, subroutines, file or directory handles, and formats. Syntactically speaking, labels are also identifiers, but they aren't put into a particular symbol table; rather, they are attached directly to the statements in your program. Labels cannot be package qualified.

    [6] To clear up another bit of potential confusion, in a variable name like $main::sail, we use the term "identifier" to talk about main and sail, but not main::sail. We call that a variable name instead, because identifiers cannot contain colons.

    The old package delimiter was a single quote, so in old Perl programs you'll see variables like $main'sail and $somepack'horse. But the double colon is now the preferred delimiter, in part because it's more readable to humans, and in part because it's more readable to emacs macros. It also makes C++ programmers feel like they know what's going on--as opposed to using the single quote as the separator, which was there to make Ada programmers feel like they knew what's going on. Because the old-fashioned syntax is still supported for backward compatibility, if you try to use a string like "This is $owner's house", you'll be accessing $owner::s; that is, the $s variable in package owner, which is probably not what you meant. Use braces to disambiguate, as in "This is ${owner}'s house".
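
    A short sketch of these rules, using made-up names (Boat, $sail, $crew, $owner): a block-scoped package declaration, qualified access from outside, and braces to keep an apostrophe out of a variable name.

```perl
use strict;
use warnings;

our $sail = 'mainsail';        # lives in package main

{
    package Boat;              # in effect until the end of this block
    our $sail = 'jib';         # $Boat::sail, unrelated to $main::sail
    my  $crew = 3;             # lexical: belongs to no package at all
}

# Back in package main: $::sail and $main::sail are the same variable.
my @seen  = ($::sail, $main::sail, $Boat::sail);
my $owner = 'Larry';
my $sign  = "This is ${owner}'s boat";   # braces keep 's out of the name

print "@seen\n$sign\n";
```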

    The double colon can be used to chain together identifiers in a package name: $Red::Blue::var. This means the $var belonging to the Red::Blue package. The Red::Blue package has nothing to do with any Red or Blue packages that might happen to exist. That is, a relationship between Red::Blue and Red or Blue may have meaning to the person writing or using the program, but it means nothing to Perl. (Well, other than the fact that, in the current implementation, the symbol table Red::Blue happens to be stored in the symbol table Red. But the Perl language makes no use of that directly.)

    For this reason, every package declaration must declare a complete package name. No package name ever assumes any kind of implied "prefix", even if (seemingly) declared within the scope of some other package declaration.

    Only identifiers (names starting with letters or an underscore) are stored in a package's symbol table. All other symbols are kept in the main package, including all the nonalphabetic variables, like $!, $?, and $_. In addition, when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC, and SIG are forced to be in package main, even when used for other purposes than their built-in ones. Don't name your package m, s, y, tr, q, qq, qr, qw, or qx unless you're looking for a lot of trouble. For instance, you won't be able to use the qualified form of an identifier as a filehandle because it will be interpreted instead as a pattern match, a substitution, or a transliteration.

    Long ago, variables beginning with an underscore were forced into the main package, but we decided it was more useful for package writers to be able to use a leading underscore to indicate semi-private identifiers meant for internal use by that package only. (Truly private variables can be declared as file-scoped lexicals, but that works best when the package and module have a one-to-one relationship, which is common but not required.)

    The %SIG hash (which is for trapping signals; see Chapter 16, "Interprocess Communication") is also special. If you define a signal handler as a string, it's assumed to refer to a subroutine in the main package unless another package name is explicitly used. Use a fully qualified signal handler name if you want to specify a particular package, or avoid strings entirely by assigning a typeglob or a function reference instead:

        $SIG{QUIT} = "Pkg::quit_catcher"; # fully qualified handler name
        $SIG{QUIT} = "quit_catcher";      # implies "main::quit_catcher"
        $SIG{QUIT} = *quit_catcher;       # forces current package's sub
        $SIG{QUIT} = \&quit_catcher;      # forces current package's sub
        $SIG{QUIT} = sub { print "Caught SIGQUIT\n" };   # anonymous sub

    The notion of "current package" is both a compile-time and run-time concept. Most variable name lookups happen at compile time, but run-time lookups happen when symbolic references are dereferenced, and also when new bits of code are parsed under eval. In particular, when you eval a string, Perl knows which package the eval was invoked in and propagates that package inward when evaluating the string. (You can always switch to a different package inside the eval string, of course, since an eval string counts as a block, just like a file loaded in with do, require, or use.)

    Alternatively, if an eval wants to find out what package it's in, the special symbol __PACKAGE__ contains the current package name. Since you can treat it as a string, you could use it in a symbolic reference to access a package variable. But if you were doing that, chances are you should have declared the variable with our instead so it could be accessed as if it were a lexical.
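
    Both points can be observed with a few lines. The package names Garage and Attic are arbitrary:

```perl
use strict;
use warnings;

my ($inside, $switched);

{
    package Garage;
    # The string is compiled in the package the eval was invoked in.
    $inside   = eval q{ __PACKAGE__ };
    # An eval string counts as a block, so it may switch packages itself.
    $switched = eval q{ package Attic; __PACKAGE__ };
}

my $outside = __PACKAGE__;    # the block above did not leak out

print "inside=$inside switched=$switched outside=$outside\n";
```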

    10.1. Symbol Tables

    The contents of a package are collectively called a symbol table. Symbol tables are stored in a hash whose name is the same as the package, but with two colons appended. The main symbol table's name is thus %main::. Since main also happens to be the default package, Perl provides %:: as an abbreviation for %main::.

    Likewise, the symbol table for the Red::Blue package is named %Red::Blue::. As it happens, the main symbol table contains all other top-level symbol tables, including itself, so %Red::Blue:: is also %main::Red::Blue::.

    When we say that a symbol table "contains" another symbol table, we mean that it contains a reference to the other symbol table. Since main is the top-level package, it contains a reference to itself, with the result that %main:: is the same as %main::main::, and %main::main::main::, and so on, ad infinitum. It's important to check for this special case if you write code that traverses all symbol tables.
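
    As a sketch of such a traversal, the hypothetical helper nested_packages() below walks the symbol tables by name and skips the self-reference in main:

```perl
use strict;
use warnings;

{ package Red::Blue; our $var = 1; }    # creates %Red:: and %Red::Blue::

# Return the names of all symbol tables nested under $stash_name.
sub nested_packages {
    my ($stash_name) = @_;               # e.g. "main::" or "Red::"
    my @found;
    no strict 'refs';                    # symbol tables are looked up by name
    for my $key (sort keys %{$stash_name}) {
        next unless $key =~ /::\z/;      # nested tables have keys ending in "::"
        next if $key eq 'main::';        # %main:: contains itself; don't loop
        my $child = $stash_name eq 'main::' ? $key : $stash_name . $key;
        push @found, $child, nested_packages($child);
    }
    return @found;
}

my @packages  = nested_packages('main::');
my $same_hash = do { no strict 'refs'; \%{'main::'} == \%{'main::main::'} };

print "found Red::Blue::\n"     if grep { $_ eq 'Red::Blue::' } @packages;
print "main contains itself\n"  if $same_hash;
```

    Without the check for the main:: key, the recursion would never terminate.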

    Inside a symbol table's hash, each key/value pair matches a variable name to its value. The keys are the symbol identifiers, and the values are the corresponding typeglobs. So when you use the *NAME typeglob notation, you're really just accessing a value in the hash that holds the current package's symbol table. In fact, the following have (nearly) the same effect:

        *sym = *main::variable;
        *sym = $main::{"variable"};

    The first is more efficient because the main symbol table is accessed at compile time. It will also create a new typeglob by that name if none previously exists, whereas the second form will not.

    Since a package is a hash, you can look up the keys of the package and get to all the variables of the package. Since the values of the hash are typeglobs, you can dereference them in several ways. Try this:

        foreach $symname (sort keys %main::) {
            local *sym = $main::{$symname};
            print "\$$symname is defined\n" if defined $sym;
            print "\@$symname is nonnull\n" if         @sym;
            print "\%$symname is nonnull\n" if         %sym;
        }

    Since all packages are accessible (directly or indirectly) through the main package, you can write Perl code to visit every package variable in your program. The Perl debugger does precisely that when you ask it to dump all your variables with the V command. Note that if you do this, you won't see variables declared with my since those are independent of packages, although you will see variables declared with our. See Chapter 20, "The Perl Debugger".

    Earlier we said that only identifiers are stored in packages other than main. That was a bit of a fib: you can use any string you want as the key in a symbol table hash--it's just that it wouldn't be valid Perl if you tried to use a non-identifier directly:

        $!@#$%           = 0;         # WRONG, syntax error.
        ${'!@#$%'}       = 1;         # Ok, though unqualified.

        ${'main::!@#$%'} = 2;         # Can qualify within the string.
        print ${ $main::{'!@#$%'} }   # Ok, prints 2!

    Assignment to a typeglob performs an aliasing operation; that is,

        *dick = *richard;

    causes variables, subroutines, formats, and file and directory handles accessible via the identifier richard to also be accessible via the symbol dick. If you want to alias only a particular variable or subroutine, assign a reference instead:

        *dick = \$richard;

    That makes $richard and $dick the same variable, but leaves @richard and @dick as separate arrays. Tricky, eh?
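
    A runnable version of the scalar-only alias, with the variables declared via our so that it passes under use strict (the values are arbitrary):

```perl
use strict;
use warnings;

our ($richard, @richard, $dick, @dick);
$richard = 'lionheart';
@richard = (1, 2, 3);

*dick = \$richard;            # alias only the scalar slot
$dick = 'coeur de lion';      # writes through to $richard

print "\$richard = $richard\n";
print "\@dick has ", scalar(@dick), " elements\n";
```

    The scalar changes under both names, while @dick stays empty because the array slot was never aliased.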

    This is how the Exporter works when importing symbols from one package to another. For example:

        *SomePack::dick = \&OtherPack::richard;

    imports the &richard function from package OtherPack into SomePack, making it available as the &dick function. (The Exporter module is described in the next chapter.) If you precede the assignment with a local, the aliasing will only last as long as the current dynamic scope.
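
    The following sketch shows the manual import and a temporary, localized replacement. The return strings are placeholders:

```perl
use strict;
use warnings;

sub OtherPack::richard { return 'richard here' }

# What an import boils down to: install a code reference in another
# package's symbol table under the desired name.
*SomePack::dick = \&OtherPack::richard;
my $imported = SomePack::dick();

my $scoped;
{
    # local saves the typeglob and restores it when the block exits.
    local *SomePack::dick = sub { return 'temporary stand-in' };
    $scoped = SomePack::dick();
}
my $restored = SomePack::dick();    # the original alias is back

print "$imported / $scoped / $restored\n";
```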

    This mechanism may be used to retrieve a reference from a subroutine, making the referent available as the appropriate data type:

        *units = populate() ;         # Assign \%newhash to the typeglob
        print $units{kg};             # Prints 70; no dereferencing needed!

        sub populate {
            my %newhash = (km => 10, kg => 70);
            return \%newhash;
        }

    Likewise, you can pass a reference into a subroutine and use it without dereferencing:

        %units = (miles => 6, stones => 11);
        fillerup( \%units );          # Pass in a reference
        print $units{quarts};         # Prints 4

        sub fillerup {
            local *hashsym = shift;   # Assign \%units to the typeglob
            $hashsym{quarts} = 4;     # Affects %units; no dereferencing needed!
        }

    These are tricky ways to pass around references cheaply when you don't want to have to explicitly dereference them. Note that both techniques only work with package variables; they would not have worked had we declared %units with my.

    Another use of symbol tables is for making "constant" scalars:

        *PI = \3.14159265358979;

    Now you cannot alter $PI, which is probably a good thing, all in all. This isn't the same as a constant subroutine, which is optimized at compile time. A constant subroutine is one prototyped to take no arguments and to return a constant expression; see the section "Inlining Constant Functions" in Chapter 6, "Subroutines", for details. The use constant pragma (see Chapter 31, "Pragmatic Modules") is a convenient shorthand:

        use constant PI => 3.14159;

    Under the hood, this uses the subroutine slot of *PI, instead of the scalar slot used earlier. It's equivalent to the more compact (but less readable):

        *PI = sub () { 3.14159 };

    That's a handy idiom to know anyway--assigning a sub {} to a typeglob is the way to give a name to an anonymous subroutine at run time.
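
    Both kinds of constant can be exercised side by side. In this sketch the name E and its value are just an example, and the assignment is wrapped in eval so the failure can be inspected:

```perl
use strict;
use warnings;
use constant E => 2.71828;     # installs a constant sub named E

our $PI;
*PI = \3.14159265358979;       # scalar slot now holds a read-only value

my $changed = eval { $PI = 3; 1 };   # the assignment dies inside eval
my $error   = $@;

print "PI is still $PI\n";
print "E is ", E, "\n";
print "assignment failed: $error" unless $changed;
```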

    Assigning a typeglob reference to another typeglob (*sym = \*oldvar) is the same as assigning the entire typeglob, because Perl automatically dereferences the typeglob reference for you. And when you set a typeglob to a simple string, you get the entire typeglob named by that string, because Perl looks up the string in the current symbol table. The following are all equivalent to one another, though the first two compute the symbol table entry at compile time, while the last two do so at run time:

        *sym =   *oldvar;
        *sym =  \*oldvar;       # autodereference
        *sym = *{"oldvar"};     # explicit symbol table lookup
        *sym =   "oldvar";      # implicit symbol table lookup

    When you perform any of the following assignments, you're replacing just one of the references within the typeglob:

        *sym = \$frodo;
        *sym = \@sam;
        *sym = \%merry;
        *sym = \&pippin;

    If you think about it sideways, the typeglob itself can be viewed as a kind of hash, with entries for the different variable types in it. In this case, the keys are fixed, since a typeglob can contain exactly one scalar, one array, one hash, and so on. But you can pull out the individual references, like this:

        *pkg::sym{SCALAR}      # same as \$pkg::sym
        *pkg::sym{ARRAY}       # same as \@pkg::sym
        *pkg::sym{HASH}        # same as \%pkg::sym
        *pkg::sym{CODE}        # same as \&pkg::sym
        *pkg::sym{GLOB}        # same as \*pkg::sym
        *pkg::sym{IO}          # internal file/dir handle, no direct equivalent
        *pkg::sym{NAME}        # "sym" (not a reference)
        *pkg::sym{PACKAGE}     # "pkg" (not a reference)

    You can say *foo{PACKAGE} and *foo{NAME} to find out what name and package the *foo symbol table entry comes from. This may be useful in a subroutine that is passed typeglobs as arguments:

        sub identify_typeglob {
            my $glob = shift;
            print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n";
        }

        identify_typeglob(*foo);
        identify_typeglob(*bar::glarch);

    This prints:

        You gave me main::foo
        You gave me bar::glarch

    The *foo{THING} notation can be used to obtain references to individual elements of *foo. See the section "Symbol Table References" in Chapter 8, "References", for details.

    This syntax is primarily used to get at the internal filehandle or directory handle reference, because the other internal references are already accessible in other ways. (The old *foo{FILEHANDLE} is still supported to mean *foo{IO}, but don't let its name fool you into thinking it can distinguish filehandles from directory handles.) But we thought we'd generalize it because it looks kind of pretty. Sort of. You probably don't need to remember all this unless you're planning to write another Perl debugger.

    Talking about PERL -- Packages

    Packages

    Perl provides a mechanism for alternate namespaces to protect packages from stomping on each other's variables. By default, a perl script starts compiling into the package known as "main". By use of the package declaration, you can switch namespaces. The scope of the package declaration is from the declaration itself to the end of the enclosing block (the same scope as the local() operator). Typically it would be the first declaration in a file to be included by the "require" operator. You can switch into a package in more than one place; it merely influences which symbol table is used by the compiler for the rest of that block. You can refer to variables and filehandles in other packages by prefixing the identifier with the package name and a single quote. If the package name is null, the "main" package is assumed.

    Only identifiers starting with letters are stored in the package's symbol table. All other symbols are kept in package "main". In addition, the identifiers STDIN, STDOUT, STDERR, ARGV, ARGVOUT, ENV, INC and SIG are forced to be in package "main", even when used for other purposes than their built-in one. Note also that, if you have a package called "m", "s" or "y", then you can't use the qualified form of an identifier since it will be interpreted instead as a pattern match, a substitution or a translation.

    Eval'ed strings are compiled in the package in which the eval was compiled. (Assignments to $SIG{}, however, assume the signal handler specified is in the main package. Qualify the signal handler name if you wish to have a signal handler in a package.) For an example, examine perldb.pl in the perl library. It initially switches to the DB package so that the debugger doesn't interfere with variables in the script you are trying to debug. At various points, however, it temporarily switches back to the main package to evaluate various expressions in the context of the main package.

    The symbol table for a package happens to be stored in the associative array of that name prepended with an underscore. The value in each entry of the associative array is what you are referring to when you use the *name notation. In fact, the following have the same effect (in package main, anyway), though the first is more efficient because it does the symbol table lookups at compile time:

    	local(*foo) = *bar;
    	local($_main{'foo'}) = $_main{'bar'};
    
    You can use this to print out all the variables in a package, for instance. Here is dumpvar.pl from the perl library:
    	package dumpvar;
    
    	sub main'dumpvar {
    	    ($package) = @_;
    	    local(*stab) = eval("*_$package");
    	    while (($key,$val) = each(%stab)) {
    	        {
    	            local(*entry) = $val;
    	            if (defined $entry) {
    	                print "\$$key = '$entry'\n";
    	            }
    	            if (defined @entry) {
    	                print "\@$key = (\n";
    	                foreach $num ($[ .. $#entry) {
    	                    print "  $num\t'",$entry[$num],"'\n";
    	                }
    	                print ")\n";
    	            }
    	            if ($key ne "_$package" && defined %entry) {
    	                print "\%$key = (\n";
    	                foreach $key (sort keys(%entry)) {
    	                    print "  $key\t'",$entry{$key},"'\n";
    	                }
    	                print ")\n";
    	            }
    	        }
    	    }
    	}
    
    Note that, even though the subroutine is compiled in package dumpvar, the name of the subroutine is qualified so that its name is inserted into package "main".
    June 16

    Talking about VHDL & Verilog Compared & Contrasted

     


     

    VHDL & Verilog Compared & Contrasted
    Plus Modeled Example Written in
    VHDL, Verilog and C

    Douglas J. Smith
    VeriBest Incorporated
    e-mail: djsmith@veribest.com

    Abstract

    This tutorial is in two parts. The first part takes an unbiased view of VHDL and Verilog by comparing their similarities and contrasting their differences. The second part contains a worked example of a model that computes the Greatest Common Divisor (GCD) of two numbers. The GCD is modeled at the algorithmic level in VHDL, Verilog and for comparison purposes, C. It is then shown modeled at the RTL in VHDL and Verilog.

    1. Introduction

    There are now two industry standard hardware description languages, VHDL and Verilog. The complexity of ASIC and FPGA designs has meant an increase in the number of specialist design consultants with specific tools and with their own libraries of macro and mega cells written in either VHDL or Verilog. As a result, it is important that designers know both VHDL and Verilog and that EDA tool vendors supply tools that allow both languages to be used in unison. For example, a designer might have a model of a PCI bus interface written in VHDL, but want to use it in a design with macros written in Verilog.

    2. Background

    VHDL (Very high speed integrated circuit Hardware Description Language) became IEEE standard 1076 in 1987. It was updated in 1993 and is known today as "IEEE standard 1076-1993". The Verilog hardware description language has been used far longer than VHDL and has been used extensively since it was launched by Gateway in 1983. Cadence bought Gateway in 1989 and opened Verilog to the public domain in 1990. It became IEEE standard 1364 in December 1995.

    There are two aspects to modeling hardware that any hardware description language facilitates: true abstract behavior and hardware structure. This means modeled hardware behavior is not prejudiced by structural or design aspects of hardware intent and that hardware structure is capable of being modeled irrespective of the design's behavior.

    3. VHDL/Verilog compared & contrasted

    This section compares and contrasts individual aspects of the two languages; they are listed in alphabetical order.

    Capability

    Hardware structure can be modeled equally effectively in both VHDL and Verilog. When modeling abstract hardware, the capability of VHDL can sometimes only be achieved in Verilog when using the PLI. The choice of which to use is not therefore based solely on technical capability but on:

    • personal preferences
    • EDA tool availability
    • commercial, business and marketing issues

    The modeling constructs of VHDL and Verilog cover a slightly different spectrum across the levels of behavioral abstraction; see Figure 1.

    Figure 1. HDL modeling capability

    Compilation

    VHDL. Multiple design-units (entity/architecture pairs) that reside in the same system file may be separately compiled if so desired. However, it is good design practice to keep each design unit in its own system file, in which case separate compilation should not be an issue.

    Verilog. The Verilog language is still rooted in its native interpretative mode. Compilation is a means of speeding up simulation, but has not changed the original nature of the language. As a result, care must be taken with both the compilation order of code written in a single file and the compilation order of multiple files. Simulation results can change by simply changing the order of compilation.

    Data types

    VHDL. A multitude of language or user defined data types can be used. This may mean dedicated conversion functions are needed to convert objects from one type to another. The choice of which data types to use should be considered wisely, especially enumerated (abstract) data types. This will make models easier to write, clearer to read and avoid unnecessary conversion functions that can clutter the code. VHDL may be preferred because it allows a multitude of language or user defined data types to be used.

    Verilog. Compared to VHDL, Verilog data types are very simple, easy to use and very much geared towards modeling hardware structure as opposed to abstract hardware modeling. Unlike VHDL, all data types used in a Verilog model are defined by the Verilog language and not by the user. There are net data types, for example wire, and a register data type called reg. A model with a signal whose type is one of the net data types has a corresponding electrical wire in the implied modeled circuit. Objects, that is signals, of type reg hold their value over simulation delta cycles and should not be confused with the modeling of a hardware register. Verilog may be preferred because of its simplicity.

    Design reusability

    VHDL. Procedures and functions may be placed in a package so that they are available to any design-unit that wishes to use them.

    Verilog. There is no concept of packages in Verilog. Functions and procedures used within a model must be defined in the module. To make functions and procedures generally accessible from different module statements the functions and procedures must be placed in a separate system file and included using the `include compiler directive.

    Easiest to Learn

    Starting with zero knowledge of either language, Verilog is probably the easiest to grasp and understand. This assumes the Verilog compiler directive language for simulation and the PLI language are not included. If these languages are included they can be looked upon as two additional languages that need to be learned. VHDL may seem less intuitive at first for two primary reasons. First, it is very strongly typed; a feature that makes it robust and powerful for the advanced user after a longer learning phase. Second, there are many ways to model the same circuit, especially those with large hierarchical structures.

    Forward and back annotation

    A spin-off from Verilog is the Standard Delay Format (SDF). This is a general purpose format used to define the timing delays in a circuit. The format provides a bidirectional link between chip layout tools and either synthesis or simulation tools, in order to provide more accurate timing representations. The SDF format is now an industry standard in its own right.

    High level constructs

    VHDL. There are more constructs and features for high-level modeling in VHDL than there are in Verilog. Abstract data types can be used along with the following statements:

    * package statements for model reuse,

    * configuration statements for configuring design structure,

    * generate statements for replicating structure,

    * generic statements for generic models that can be individually characterized, for example, bit width.

    All these language statements are useful in synthesizable models.

    Verilog. Except for being able to parameterize models by overloading parameter constants, there is no equivalent to the high-level VHDL modeling statements in Verilog.

    Language Extensions

    The use of language extensions will make a model non standard and most likely not portable across other design tools. However, sometimes they are necessary in order to achieve the desired results.

    VHDL. Has an attribute called 'foreign that allows architectures and subprograms to be modeled in another language.

    Verilog. The Programming Language Interface (PLI) is an interface mechanism between Verilog models and Verilog software tools. For example, a designer, or more likely, a Verilog tool vendor, can specify user defined tasks or functions in the C programming language, and then call them from the Verilog source description. Use of such tasks or functions makes a Verilog model nonstandard and so may not be usable by other Verilog tools. Their use is not recommended.

    Libraries

    VHDL. A library is a store for compiled entities, architectures, packages and configurations. Useful for managing multiple design projects.

    Verilog. There is no concept of a library in Verilog. This is due to its origins as an interpretive language.

    Low Level Constructs

    VHDL. Simple two input logical operators are built into the language, they are: NOT, AND, OR, NAND, NOR, XOR and XNOR. Any timing must be separately specified using the after clause. Separate constructs defined under the VITAL language must be used to define the cell primitives of ASIC and FPGA libraries.

    Verilog. The Verilog language was originally developed with gate level modeling in mind, and so has very good constructs for modeling at this level and for modeling the cell primitives of ASIC and FPGA libraries. Examples include User Defined Primitives (UDP), truth tables and the specify block for specifying timing delays across a module.

    Managing large designs

    VHDL. Configuration, generate, generic and package statements all help manage large design structures.

    Verilog. There are no statements in Verilog that help manage large designs.

    Operators

    The majority of operators are the same between the two languages. Verilog does have very useful unary reduction operators that are not in VHDL. A loop statement can be used in VHDL to perform the same operation as a Verilog unary reduction operator. VHDL has the mod operator that is not found in Verilog.

    Parameterizable models

    VHDL. A specific bit width model can be instantiated from a generic n-bit model using the generic statement. The generic model will not synthesize until it is instantiated and the value of the generic given.

    Verilog. A specific width model can be instantiated from a generic n-bit model using overloaded parameter values. The generic model must have a default parameter value defined. This means two things. In the absence of an overloaded value being specified, it will still synthesize, but will use the specified default parameter value. Also, it does not need to be instantiated with an overloaded parameter value specified, before it will synthesize.

    Procedures and tasks

    VHDL allows concurrent procedure calls; Verilog does not allow concurrent task calls.

    Readability

    This is more a matter of coding style and experience than language feature. VHDL is a concise and verbose language; its roots are based on Ada. Verilog is more like C because its constructs are based approximately 50% on C and 50% on Ada. For this reason an existing C programmer may prefer Verilog over VHDL, although an existing programmer of both C and Ada may find the mix of constructs somewhat confusing at first. Whatever HDL is used, when writing or reading an HDL model to be synthesized it is important to think about hardware intent.

    Structural replication

    VHDL. The generate statement replicates a number of instances of the same design-unit or some sub part of a design, and connects it appropriately.

    Verilog. There is no equivalent to the generate statement in Verilog.

    Test harnesses

    Designers typically spend about 50% of their time writing synthesizable models and the other 50% writing a test harness to verify the synthesizable models. Test harnesses are not restricted to the synthesizable subset and so are free to use the full potential of the language. VHDL has generic and configuration statements that are useful in test harnesses, that are not found in Verilog.

    Verboseness

    VHDL. Because VHDL is a very strongly typed language, models must be coded precisely with defined and matching data types. This may be considered an advantage or disadvantage. However, it does mean models are often more verbose, and the code often longer, than its Verilog equivalent.

    Verilog. Signals representing objects of different bit widths may be assigned to each other. The signal representing the smaller number of bits is automatically padded out to that of the larger number of bits, and is independent of whether it is the assigned signal or not. Unused bits will be automatically optimized away during the synthesis process. This has the advantage of not needing to model quite so explicitly as in VHDL, but does mean unintended modeling errors will not be identified by an analyzer.

    4. Greatest Common Divisor

    Models of a greatest common divisor circuit are posed as a problem and solution exercise. A model written in C is included in addition to VHDL and Verilog for comparison purposes.

    4.1 Problem

    The problem consists of three parts:

    a) Design three algorithmic level models of an algorithm that finds the Greatest Common Divisor (GCD) of two numbers in the software programming language, C, and the two hardware description languages, VHDL and Verilog. Use common test data files to test the algorithm where practically possible. Neither the VHDL nor Verilog models need contain timing. All three models should automatically indicate a pass or fail condition.

    b) Model the GCD algorithm at the RTL level for synthesis in both VHDL and Verilog. The model must be generic so that it can be instantiated with different bit widths. A Load signal should indicate when input data is valid, and a signal called Done should be provided to signify when valid output data is available. The generic model should be verified with 8-bit bus signals.

    c) Write VHDL and Verilog test harnesses for the two models that: 1) use the same test data files used by the algorithmic level models, and 2) instantiate both the RTL and synthesized gate level models so that they are simulated and tested at the same time.

    4.2 Solution

    The solution is broken into three parts corresponding to those of the problem. The solution parts use the following combined test and reference data files.

    file: gcd_test_data.txt                   file: gcd_test_data_hex.txt
      21    49    7           15    31    7      // Decimal     21    49    7
      25    30    5           19    1E    5      // Decimal     25    30    5
      19    27    1           13    1B    1      // Decimal     19    27    1
      40    40   40           28    28   28      // Decimal     40    40   40
     250   190   10           FA    BE    A      // Decimal    250   190   10
       5   250    5            5    FA    5      // Decimal      5   250    5

    4.2.1 Designing algorithmic level models in C, VHDL and Verilog

    The algorithm used to find the greatest common divisor between two numbers is shown in Figure 2.

    Figure 2. GCD Algorithm

    It works by continually subtracting the smaller of the two numbers, A or B, from the larger until the smaller becomes equal to zero. It does this by continually subtracting B from A while A is greater than or equal to B, and then swapping A and B around when A becomes less than B so that the new value of B can once again be continually subtracted from A. This process continues until B becomes zero.

    C algorithmic model

    The C model first declares integer values for the two inputs A and B, the computed output of the algorithm Y, and the reference output Y_Ref. Integer Y_Ref is the expected GCD result and used to compare with the computed result from the algorithm. The integer Swap is also declared and used in the algorithm to swap the two inputs A and B. A final integer, Passed, is used to indicate a pass (1) or fail (0) condition.

    A file pointer (file_pointer) is defined in order to access the test data file "gcd_test_data.txt". It is opened for read mode only. Integer Passed is initially set to 1 and only set to 0 if the algorithm fails.

    Reading test data file. The test data file contains three numbers on each line corresponding to values of A, B and Y_Ref respectively. A while loop is used to: 1) read each line of the test data file, 2) assign the three values to A, B and Y_Ref respectively, 3) use A and B to compute the GCD output Y, and 4) compare Y with Y_Ref. This while loop continues while there is test data in the test data file.

    Algorithm implementation. The initial if statement is an extra check that both A and B are not zero. The algorithm is then modeled using two while statements. The first, outer-most, while statement checks to see if B has reached zero; if it has, the GCD has been found. The second, inner-most, while statement checks to see if A is greater than or equal to B; if it is, it continually subtracts B from A and puts the result back in A. When A becomes less than B the inner-most while loop completes, A and B are swapped using Swap, and the outer-most while statement rechecks B to see if it has reached zero.

    Testing the result. The algorithm is tested using an if statement which tests to see if the computed result Y is the same as the expected result Y_Ref. If they are different an error message is printed to the screen and Passed assigned the value 0. Finally, when all tests have completed and Passed is still equal to 1 a passed message is printed to the screen.

    C algorithmic level model
    #include <stdio.h>
    int main (void)
       {
       int A_in, B_in, A, B, Swap, Y, Y_Ref, Passed;
       FILE *file_pointer;
       file_pointer = fopen("gcd_test_data.txt", "r");
       if (file_pointer == NULL)
          {
           printf ("Error. Cannot open gcd_test_data.txt\n");
           return 1;
          }
       Passed = 1;
       /*------------------------------------*/
       /* Read test data from file; the loop */
       /* ends when no complete line is left */
       /*------------------------------------*/
       while (fscanf (file_pointer, "%d %d %d", &A_in, &B_in, &Y_Ref) == 3)
            {
             /*----------------------------------*/
             /* Model GCD algorithm              */
             /*----------------------------------*/
             A = A_in;
             B = B_in;
             if (A != 0 && B != 0)
               {
                while (B != 0)
                  {
                   /* Subtract B from A until A is less than B */
                   while (A >= B)
                      {
                       A = A - B;
                       }
                   Swap = A;
                   A = B;
                   B = Swap;
                   }
               }
             else
               {
                A = 0;
               }
             Y = A;
             /*------------------------------*/
             /* Test GCD algorithm           */
             /*------------------------------*/
             if (Y != Y_Ref)
               {
                printf ("Error. A=%d B=%d Y=%d Y_Ref= %d\n", A_in, B_in, Y, Y_Ref);
                Passed = 0;
               }
             }
       fclose (file_pointer);
       /* Compare with ==; a single = would assign and always be true */
       if (Passed == 1) printf ("GCD algorithm test passed ok\n");
       return 0;
       }

    VHDL algorithmic level model

    The VHDL model follows exactly the same principle as defined for the C model. When reading the integer values from the test data file they must be read and assigned to a variable; they cannot be read and assigned to a signal. As this is an algorithmic level model defined in a single entity it contains no inputs or outputs, nor does it contain any internal signals or associated timing. All computations use variables; variables are read from the test data file, the algorithm computes the result and variables are written to a results file.
    VHDL algorithmic level model
    library STD;
    use STD.TEXTIO.all;
    entity GCD_ALG is
    end entity GCD_ALG;
    architecture ALGORITHM of GCD_ALG is
    --------------------------------------------
    -- Declare test data file and results file
    --------------------------------------------
    file TestDataFile: text open
        read_mode is "gcd_test_data.txt";
    file ResultsFile: text open write_mode is
        "gcd_alg_test_results.txt";
    begin
       GCD: process
          variable A_in, B_in, A, B, Swap, Y, Y_Ref: integer range 0 to 65535;
          variable TestData: line;
          variable BufLine: line;
          variable Passed: bit := '1';
       begin
          while not endfile(TestDataFile) loop
          -------------------------------------
          -- Read test data from file
          -------------------------------------
          readline(TestDataFile, TestData);
          read(TestData, A_in);
          read(TestData, B_in);
          read(TestData, Y_Ref);
          ------------------------------------
          -- Model GCD algorithm
          ------------------------------------
          A := A_in;
          B := B_in;
          if (A /= 0 and B /= 0) then
             while (B /= 0) loop
                while (A >= B) loop
                    A := A - B;
                end loop;
                Swap:= A;
                A := B;
               B := Swap;
             end loop;
          else
             A := 0;
          end if;
          Y := A;
          ---------------------------------
          -- Test GCD algorithm
          ---------------------------------
          if (Y /= Y_Ref) then -- has failed
             Passed := '0';
             write(Bufline, string'("GCD Error: A="));
             write(Bufline, A_in);
             write(Bufline, string'(" B="));
             write(Bufline, B_in);
             write(Bufline, string'(" Y="));
             write(Bufline, Y);
             write(Bufline, string'(" Y_Ref="));
             write(Bufline, Y_Ref);
             writeline(ResultsFile, Bufline);
          end if;
        end loop;
      if (Passed = '1') then -- has passed
         write(Bufline, string'("GCD algorithm test has passed"));
         writeline(ResultsFile, Bufline);
      end if;
      wait; -- suspend here; without a wait the process would restart and loop forever
     end process;
    end architecture ALGORITHM;

    Verilog algorithmic level model

    The Verilog model also follows the same principle as defined above for the C model. A major difference in this model is that Verilog cannot read decimal integer values from a system file. Data read from a system file must be:

    1) read using one of the two language-defined system tasks, $readmemh or $readmemb, and

    2) stored in a memory, which has specific width and depth. This limits any read data to being in either hexadecimal or binary format. In this case a separate test data file, "gcd_test_data_hex.txt", is used which has the test data specified in hexadecimal format.

    Verilog algorithmic level model
    module GCD_ALG;
    parameter Width = 8;
    parameter GCD_tests = 6;
    parameter TestPeriod = 20; // assumed value; TestPeriod is not defined in the original listing
    reg [Width-1:0] A_in, B_in, A, B, Y, Y_Ref;
    reg [Width-1:0] A_reg,B_reg,Swap;
    integer N, M;
    reg Passed, FailTime;
    integer SimResults;
    // Declare memory array for test data
    // ----------------------------------
    reg [Width-1:0] AB_Y_Ref_Arr[1:GCD_tests*3];
    //----------------------------------
    // Model GCD algorithm
    //----------------------------------
    always @(A_in or B_in)
       begin: GCD
         A = A_in;
         B = B_in;
         if (A != 0 && B != 0)
           while (B != 0) begin
             // Subtract B from A until A is less than B, then swap
             while (A >= B)
                A = A - B;
             Swap = A;
             A = B;
             B = Swap;
           end
         else
           A = 0;
         Y = A;
       end
    //------------------------------
    // Test GCD algorithm
    //-----------------------------
    initial begin
    // Load contents of
    // "gcd_test_data_hex.txt" into array.
    $readmemh("gcd_test_data_hex.txt", AB_Y_Ref_Arr);
    // Open simulation results file
    SimResults = $fopen("gcd.simres");
    Passed = 1; // Set to 0 if fails
    for (N=1; N<=GCD_tests; N=N+1) begin
       // Test N occupies array elements 3N-2, 3N-1 and 3N
       A_in = AB_Y_Ref_Arr[((N-1)*3)+1];
       B_in = AB_Y_Ref_Arr[((N-1)*3)+2];
       Y_Ref=AB_Y_Ref_Arr[((N-1)*3)+3];
       #TestPeriod
       if (Y != Y_Ref) begin      // has failed
           Passed = 0;
           $fdisplay (SimResults, " GCD Error: A=%d B=%d Y=%d. Y should be %d", A_in, B_in, Y, Y_Ref);
       end
    end
    if (Passed == 1) // has passed
        $fdisplay (SimResults, "GCD algorithm test has passed");
    $fclose (SimResults);
    $finish;
    end
    endmodule

    4.2.2 Designing RTL hardware models in VHDL and Verilog

    The models have additional inputs and outputs over and above those of the algorithmic models. They are the inputs Clock, Reset_N and Load, and the output Done. When Load is at logic 1 it signifies that input data is available on inputs A and B; the data is loaded into separate registers whose output signals are called A_hold and B_hold. The extra output signal, Done, switches to logic 1 to signify that the greatest common divisor has been computed. The computation takes a number of clock cycles that depends on the values of A and B.

    The models are broken down into two process (VHDL)/always (Verilog) statements.

    First process/always statement LOAD_SWAP. Infers two registers which operate as follows:

    1) When Reset_N is at a logic 0, A_hold and B_hold are set to zero.

    2) When not 1) and Load is at logic 1, data on A and B is loaded into A_hold and B_hold.

    3) When not 1) or 2) and A_hold is less than B_hold, values on A_hold and B_hold are swapped, that is, A_hold and B_hold are loaded into B_hold and A_hold respectively.

    4) When not 1), 2) or 3), B_hold is reloaded, that is, it keeps the same value. The value of A_hold - B_hold, from the second process/always statement, is loaded into A_hold.

    Second process/always statement SUBTRACT_TEST. The first if statement tests to see if A_hold is greater than or equal to B_hold. If it is, the subtraction, A_hold - B_hold, occurs and the result is assigned to A_New ready to be loaded into A_hold on the next rising edge of the clock signal. If A_hold is less than B_hold, then subtraction cannot occur and A_New is assigned the value A_hold so that a swap occurs after the next rising edge of the clock signal. The second if statement checks to see if the value of B_hold has reached zero. If it has, signal Done is set to logic 1 and the value of A_hold is passed to the output Y through an inferred multiplexer function.

    It is a requirement of the problem to synthesize the generic model with 8-bit bus signals. This is easily achieved in the Verilog model by setting the default parameter value Width to 8. This means it does not need to be separately instantiated before it can be synthesized and have the correct bit width. This is not the case in VHDL, which uses a generic. The value of the generic is only specified when the model is instantiated. Although the VHDL model will be instantiated in the test harness, the test harness is not synthesized. Therefore, in order to synthesize an 8-bit GCD circuit a separate synthesizable model must be used to instantiate the RTL level model which specifies the generic, Width, to be 8. The simulation test harness does not need to use this extra model as it, too, will specify the generic, Width, to be 8.

    VHDL RTL model
    library IEEE;
    use IEEE.STD_Logic_1164.all, IEEE.Numeric_STD.all;
    entity GCD is
    generic (Width: natural);
    port (Clock,Reset,Load: in std_logic;
       A,B:   in unsigned(Width-1 downto 0);
       Done:  out std_logic;
       Y:     out unsigned(Width-1 downto 0));
    end entity GCD;
    architecture RTL of GCD is
       signal A_New,A_Hold,B_Hold: unsigned(Width-1 downto 0);
       signal A_lessthan_B: std_logic;
    begin
    ----------------------------------------------------
    -- Load 2 input registers and ensure B_Hold < A_Hold
    ---------------------------------------------------
    LOAD_SWAP: process (Clock)
    begin
       if rising_edge(Clock) then
         if (Reset = '0') then
           A_Hold <= (others => '0');
           B_Hold <= (others => '0');
         elsif (Load = '1') then
           A_Hold <= A;
           B_Hold <= B;
         elsif (A_lessthan_B = '1') then
           A_Hold <= B_Hold;
           B_Hold <= A_New;
         else
           A_Hold <= A_New;
         end if;
       end if;
    end process LOAD_SWAP;
    SUBTRACT_TEST: process (A_Hold, B_Hold)
    begin
       -------------------------------------------------------
       -- Subtract B_Hold from A_Hold if A_Hold >= B_Hold
       ------------------------------------------------------
       if (A_Hold >= B_Hold) then
          A_lessthan_B <= '0';
          A_New <= A_Hold - B_Hold;
       else
          A_lessthan_B <= '1';
          A_New <= A_Hold;
       end if;
       -------------------------------------------------
       -- Greatest common divisor found if B_Hold = 0
       -------------------------------------------------
       if (B_Hold = 0) then -- Numeric_STD compares unsigned with a natural
          Done <= '1';
          Y <= A_Hold;
       else
          Done <= '0';
          Y <= (others => '0');
       end if;
    end process SUBTRACT_TEST;
    end architecture RTL;
    Verilog RTL model
    module GCD (Clock, Reset, Load, A, B, Done, Y);
    parameter Width = 8;
    input Clock, Reset, Load;
    input [Width-1:0] A, B;
    output Done;
    output [Width-1:0] Y;
    reg A_lessthan_B, Done;
    reg [Width-1:0] A_New, A_Hold, B_Hold, Y;
    //-----------------------------------------------------
    // Load 2 input registers and ensure B_Hold < A_Hold
    //-----------------------------------------------------
    always @(posedge Clock)
        begin: LOAD_SWAP
           if (!Reset) begin   // active-low reset, matching the VHDL model
               A_Hold <= 0;
               B_Hold <= 0;
           end
           else if (Load) begin
               A_Hold <= A;
               B_Hold <= B;
           end
           else if (A_lessthan_B) begin
               A_Hold <= B_Hold;
               B_Hold <= A_New;
           end
           else
               A_Hold <= A_New;
        end
    always @(A_Hold or B_Hold)
       begin: SUBTRACT_TEST
          //--------------------------------------------------
          // Subtract B_Hold from A_Hold if A_Hold >= B_Hold
          //--------------------------------------------------
          if (A_Hold >= B_Hold) begin
             A_lessthan_B = 0;
             A_New = A_Hold - B_Hold;
          end
          else begin
             A_lessthan_B = 1;
             A_New = A_Hold;
          end
          //----------------------------------------------
          // Greatest common divisor found if B_Hold = 0
          //----------------------------------------------
          if (B_Hold == 0) begin
             Done = 1;
             Y = A_Hold;
          end
          else begin
             Done = 0;
             Y = 0;
          end
    end
    endmodule

    5. Conclusions

    The reasons for the importance of being able to model hardware in both VHDL and Verilog have been discussed. VHDL and Verilog have been extensively compared and contrasted in a neutral manner. A tutorial has been posed as a problem and solution to demonstrate some language differences and to indicate that hardware modeled in one language can also be modeled in the other. Room did not allow test harness models to be included in this tutorial paper, but they are shown in the book "HDL Chip Design" [1]. The choice of HDL is shown not to be based on technical capability, but on: personal preferences, EDA tool availability and commercial, business and marketing issues.

    REFERENCES: [1] HDL Chip Design, A Practical Guide for Designing, Synthesizing and Simulating ASICs and FPGAs using VHDL or Verilog by Douglas J Smith, published by Doone Publications.

    String Processing with Regular Expressions - Perl from http://cslibrary.stanford.edu/108/EssentialPerl.html

    String Processing with Regular Expressions

    Perl's most famous strength is in string manipulation with regular expressions. Perl has a million string processing features -- we'll just cover the main ones here. The simple syntax to search for a pattern in a string is...

    ($string =~ /pattern/)  ## true if the pattern is found somewhere in the string

    ("binky" =~ /ink/)  ==> TRUE
    ("binky" =~ /onk/)  ==> FALSE
     

    In the simplest case, the exact characters in the regular expression pattern must occur in the string somewhere. All of the characters in the pattern must be matched, but the pattern does not need to be right at the start or end of the string, and the pattern does not need to use all the characters in the string.

    Character Codes

    The power of regular expressions is that they can specify patterns, not just fixed characters. First, there are special matching characters...
    • a, X, 9 -- ordinary characters just match that character exactly
    • . (a period) -- matches any single character except "\n"
    • \w -- (lowercase w) matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_]
    • \W -- (uppercase W) any non word character
    • \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]
    • \S -- (uppercase S) any non whitespace character
    • \t, \n, \r  -- tab, newline, return
    • \d -- decimal digit [0-9]
    • \   -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a punctuation character, such as '@', has special meaning, you can always put a backslash in front of it \@ to make sure it is treated just as a character.
    "piiig" =~ /p...g/     ==> TRUE   . = any char (except \n)

    "piiig" =~ /.../       ==> TRUE   need not use up the whole string

    "piiig" =~ /p....g/    ==> FALSE  must use up the whole pattern (the g is not matched)

    "piiig" =~ /p\w\w\wg/  ==> TRUE   \w = any letter or digit

    "p123g" =~ /p\d\d\dg/  ==> TRUE   \d = 0..9 digit

    The modifier "i" after the last / means the match should be case insensitive...

    "PiIIg" =~ /pIiig/     ==> FALSE
    "PiIIg" =~ /pIiig/i    ==> TRUE

    String interpolation works in regular expression patterns. The variable values are pasted into the expression once before it is evaluated. Characters like * and + continue to have their special meanings in the pattern after interpolation, unless the pattern is bracketed with a \Q..\E. The following examples test if the pattern in $target occurs within brackets < > in $string...

    $string =~ /<$target>/        ## Look for <$target>, '.' '*' keep their special meanings in $target

    $string =~ /<\Q$target\E>/    ## The \Q..\E puts a backslash in front of every non-word char,
                                  ## so '.' '*' etc. in $target will not have their special meanings
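
    As a small sketch of the difference, the following compares the two forms on made-up data. The values of $target and $string are assumptions chosen so that the '.' in $target matters.

```perl
use strict;
use warnings;

my $target = "a.c";            # hypothetical search text containing a metacharacter
my $string = "<abc> <a.c>";    # hypothetical text to search

# Without \Q..\E the '.' in $target matches any character,
# so the leftmost hit is <abc>.
my ($loose) = $string =~ /(<$target>)/;

# With \Q..\E the '.' is escaped, so only a literal "a.c" can match.
my ($literal) = $string =~ /(<\Q$target\E>)/;

print "loose:   $loose\n";     # <abc>
print "literal: $literal\n";   # <a.c>
```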

    Similar to the \Q..\E form, the quotemeta() function returns a copy of a string with every non-word character backslash-escaped.

    There is an optional "m" (for "match") that comes before the first /. If the "m" is used, then almost any character can be used for the delimiter instead of / -- so you could use " or # to delimit the pattern. This is handy if what you are trying to match has a lot of /'s in it. If the delimiter is the single quote (') then interpolation is suppressed. The following expressions are all equivalent...

    "piiig" =~ m/piiig/
    "piiig" =~ m"piiig"
    "piiig" =~ m#piiig#
     

    Control Codes

    Things get really interesting when you add in control codes to the regular expression pattern...
    • ?   -- match 0 or 1 occurrences of the pattern to its left
    • *   -- 0 or more occurrences of the pattern to its left
    • +  -- 1 or more occurrences of the pattern to its left
    • |   -- (vertical bar)  logical or -- matches the pattern either on its left or right
    • parentheses ( )  -- group sequences of patterns
    • ^   -- matches the start of the string
    • $   -- matches the end of the string

    Leftmost & Largest

    First, Perl tries to find the leftmost match for the pattern, and second it tries to use up as much of the string as possible -- i.e. let + and * use up as many characters as possible.
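
    A minimal sketch of the two rules working together, using a made-up string: the leftmost rule wins first, and only then does + grab as much as it can at that position.

```perl
use strict;
use warnings;

my $text = "xx aab aaaab";     # hypothetical sample string

# a+ could match in two places. Perl reports the leftmost one ("aa"),
# even though a longer run of a's appears further to the right.
my ($match) = $text =~ /(a+)/;

print "$match\n";              # aa
```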

    Regular Expression Examples

    The following series gradually demonstrates each of the above control codes. Study the examples carefully -- small details in regular expressions make a big difference. That's what makes them powerful, but it makes them tricky as well.

    Old joke: What do you call a pig with three eyes? Piiig!

    #### Search for the pattern 'iiig' in the string 'piiig'
    "piiig" =~ m/iiig/ ==> TRUE
    
    #### The pattern may be anywhere inside the string
    "piiig" =~ m/iii/ ==> TRUE
    
    #### All of the pattern must match
    "piiig" =~ m/iiii/ ==> FALSE
    
    #### . = any char but \n
    "piiig" =~ m/...ig/ ==> TRUE
    
    "piiig" =~ m/p.i../ ==> TRUE
    
    #### The last . in the pattern is not matched
    "piiig" =~ m/p.i.../ ==> FALSE
    
    #### \d = digit [0-9]
    "p123g" =~ m/p\d\d\dg/ ==> TRUE
    
    "p123g" =~ m/p\d\d\d\d/ ==> FALSE
    
    #### \w = letter, digit, or underscore
    "p123g" =~ m/\w\w\w\w\w/ ==> TRUE
    
    #### i+ = one or more i's
    "piiig" =~ m/pi+g/ ==> TRUE
    
    #### matches iii
    "piiig" =~ m/i+/ ==> TRUE
    
    "piiig" =~ m/p+i+g+/ ==> TRUE
    
    "piiig" =~ m/p+g+/ ==> FALSE
    
    #### i* = zero or more i's
    "piiig" =~ m/pi*g/ ==> TRUE
    
    "piiig" =~ m/p*i*g*/ ==> TRUE
    
    #### X* can match zero X's
    "piiig" =~ m/pi*X*g/ ==> TRUE
    
    #### ^ = start, $ = end
    "piiig" =~ m/^pi+g$/ ==> TRUE
    
    #### i is not at the start
    "piiig" =~ m/^i+g$/ ==> FALSE
    
    #### i is not at the end
    "piiig" =~ m/^pi+$/ ==> FALSE
    
    "piiig" =~ m/^p.+g$/ ==> TRUE
    
    "piiig" =~ m/^p.+$/ ==> TRUE
    
    "piiig" =~ m/^.+$/ ==> TRUE
    
    #### g is not at the start
    "piiig" =~ m/^g.+$/ ==> FALSE
    
    #### Needs at least one char after the g
    "piiig" =~ m/g.+/ ==> FALSE
    
    #### Needs at least zero chars after the g
    "piiig" =~ m/g.*/ ==> TRUE
    
    #### | = left or right expression
    "cat" =~ m/^(cat|hat)$/ ==> TRUE
    
    "hat" =~ m/^(cat|hat)$/ ==> TRUE
    
    "cathatcatcat" =~ m/^(cat|hat)+$/ ==> TRUE
    
    "cathatcatcat" =~ m/^(c|a|t|h)+$/ ==> TRUE
    
    "cathatcatcat" =~ m/^(c|a|t)+$/ ==> FALSE
    
    #### Matches and stops at first 'cat'; does not get to 'catcat' on the right
    "cathatcatcat" =~ m/(c|a|t)+/ ==> TRUE
    
    #### ? = optional
    "12121x2121x2" =~ m/^(1x?2)+$/ ==> TRUE
    
    "aaaxbbbabaxbb" =~ m/^(a+x?b+)+$/ ==> TRUE
    
    "aaaxxbbb" =~ m/^(a+x?b+)+$/ ==> FALSE
    
    #### Three words separated by spaces
    "Easy      does it" =~ m/^\w+\s+\w+\s+\w+$/ ==> TRUE
    
    #### Just matches "gates@microsoft" -- \w does not match the "."
    "bill.gates@microsoft.com" =~ m/\w+@\w+/ ==> TRUE
    
    #### Add the .'s to get the whole thing
    "bill.gates@microsoft.com" =~ m/^(\w|\.)+@(\w|\.)+$/ ==> TRUE
    
    #### words separated by commas and possibly spaces
    "Klaatu,   barada,nikto" =~ m/^\w+(,\s*\w+)*$/ ==> TRUE

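    The ==> TRUE / ==> FALSE notation in the examples above is shorthand, not Perl. One way to check such claims is a small table-driven script like the sketch below. The names @checks and $mismatches are hypothetical, and only a handful of the examples are repeated here.

```perl
use strict;
use warnings;

# Each row: string, pattern, result claimed in the examples (1 = TRUE, 0 = FALSE).
my @checks = (
  [ "piiig",        qr/^pi+g$/,      1 ],
  [ "piiig",        qr/p+g+/,        0 ],
  [ "piiig",        qr/g.+/,         0 ],
  [ "piiig",        qr/g.*/,         1 ],
  [ "cathatcatcat", qr/^(c|a|t)+$/,  0 ],
  [ "aaaxxbbb",     qr/^(a+x?b+)+$/, 0 ],
);

my $mismatches = 0;
foreach my $check (@checks) {
  my ($string, $pattern, $expected) = @$check;
  my $got = ($string =~ $pattern) ? 1 : 0;   # normalize the match result to 1 or 0
  $mismatches++ if $got != $expected;
  print "$string: ", ($got ? "TRUE" : "FALSE"), "\n";
}
print "mismatches: $mismatches\n";
```

    Adding a row is all it takes to try another pattern, which makes it easy to experiment with the small details that change a result.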
    Character Classes

    Square brackets can be used to represent a set of characters. For example [aeiouAEIOU] is a one character pattern that matches a vowel. Most characters are not special inside a square bracket and so can be used without a leading backslash (\). \w, \s, and \d work inside a character class, and the dash (-) can be used to express a range of characters, so [a-z] matches lowercase "a" through "z". So the \w code is equivalent to [a-zA-Z0-9_]. If the first character in a character class is a caret (^) the set is inverted, and matches all the characters not in the given set. So [^0-9] matches all characters that are not digits.

    The parts of an email address on either side of the "@" are made up of letters, numbers plus dots, underbars, and dashes. As a character class that's just [\w._-].

    "bill.gates_emporer@microsoft.com" =~ m/^[\w._-]+@[\w._-]+$/ ==> TRUE
     

    Match Variables

    If a =~ match expression is true, the special variables $1, $2, ... will be the substrings that matched the parts of the pattern in parentheses -- $1 corresponds to the first left parenthesis, $2 to the second left parenthesis, and so on. The following pattern picks out three words separated by whitespace...

    if ("this      and that" =~ /(\w+)\s+(\w+)\s+(\w+)/) {
      ## if the above matches, $1 eq "this", $2 eq "and", $3 eq "that"
    }

    This is a nice way to parse a string -- write a regular expression for the pattern you expect, putting parentheses around the parts you want to pull out. Only use $1, $2, etc. when the =~ test in the if returns true. Other regular-expression systems use \1 and \2 instead of $1 $2, and Perl supports that syntax as well. There are three other special variables: $& (dollar-ampersand) = the matched string, $` (dollar-back-quote) = the string before what was matched, and $' (dollar-quote) = the string following what was matched.
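
    A short sketch of the three special variables on a made-up string. The variable names $before, $matched, and $after are hypothetical. The values are copied immediately because the next successful match overwrites them.

```perl
use strict;
use warnings;

my ($before, $matched, $after);
if ("hello world" =~ /o w/) {
  # $` is the text before the match, $& the match itself, $' the text after it.
  ($before, $matched, $after) = ($`, $&, $');
}

print "before=[$before] matched=[$matched] after=[$after]\n";
# before=[hell] matched=[o w] after=[orld]
```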

    The following loop rips through a string and pulls out all the email addresses. It demonstrates using a character class, using $1 etc. to pull out parts of the match string, and using $' after the match.

    $str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';

    while ($str =~ /(([\w._-]+)\@([\w._-]+))/) { ## look for an email addr
      print "user:$2 host:$3  all:$1\n";         ## parts of the addr
      $str = $';       ## set the str to be the "rest" of the string
    }

    output:
    user:nick host:cs.stanford.edu  all:nick@cs.stanford.edu
    user:billg host:microsoft.com  all:billg@microsoft.com
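
    The loop above consumes $str as it goes. As an alternative sketch (not the approach used in the text), the "g" modifier in a while condition makes each pass resume where the previous match ended, so the string is left intact. The array @found is a hypothetical name.

```perl
use strict;
use warnings;

my $str = 'blah blah nick@cs.stanford.edu, blah blah balh billg@microsoft.com blah blah';
my @found;

# With /g in scalar context, every iteration continues after the previous match,
# so there is no need to chop $str down with $'.
while ($str =~ /(([\w._-]+)\@([\w._-]+))/g) {
  push @found, "user:$2 host:$3  all:$1";
}

print "$_\n" foreach @found;
```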
     

    Substitution

    A slight variation of the match operator can be used to search and replace. Put an "s" in front of the pattern and follow the match pattern with a replacement pattern.

    ## Change all "is" strings to "is not" -- a sure way to improve any document
    $str =~ s/is/is not/ig;

    The replacement pattern can use $1, $2 to refer to parts of the matched string. The "g" modifier after the last / means do the replacement repeatedly in the target string. The modifier "i" means the match should not be case sensitive. The following example finds instances of the letter "r" or "l" followed by a word character, and replaces that pattern with "w" followed by the same word character. Sounds like Tweety Bird...

    ## Change "r" and "l" followed by a word char to "w" followed
    ## by the same word char
    $x = "This dress exacerbates the genetic betrayal that is my Legacy.\n";
    $x =~ s/(r|l)(\w)/w$2/ig;    ## r or l followed by a word char
    ## $x is now "This dwess exacewbates the genetic betwayal that is my wegacy."
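
    Two more small sketches on made-up strings: one uses both $1 and $2 in the replacement to swap two words, and one shows what the "g" modifier changes. The (my $copy = $original) =~ s/.../.../ idiom is used here so the original string is left alone.

```perl
use strict;
use warnings;

my $name = "Smith, John";                      # hypothetical input
(my $swapped = $name) =~ s/^(\w+),\s*(\w+)$/$2 $1/;
print "$swapped\n";                            # John Smith

my $text = "one fish two fish";
(my $first_only = $text) =~ s/fish/cat/;       # no g: only the first occurrence
(my $every_one  = $text) =~ s/fish/cat/g;      # with g: every occurrence
print "$first_only\n";                         # one cat two fish
print "$every_one\n";                          # one cat two cat
```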
     

    The ? Trick

    One problem with * and +, is that they are "greedy" -- they try to use up as many characters as they can. Suppose you are trying to pick out all of the characters between two curly braces { }. The simplest thing would be to use the pattern...

    m/{(.*)}/  ## pick up all the characters between {}'s

    The problem is that if you match against the string "{group 1} xx {group 2}", the * will aggressively run right over the first } and match the second }. So $1 will be "group 1} xx {group 2" instead of "group 1". Fortunately Perl has a nice solution to the too-aggressive-*/+ problem. If a ? immediately follows the * or +, then it tries to find the shortest repetition which works instead of the longest. You need the ? variant most often when matching with .* or \S* which can easily use up more than you had in mind. Use ".*?" to skip over stuff you don't care about, but have something you do care about immediately to its right. Such as..

    m/{(.*?)}/ ## pick up all the characters between {}'s, but stop
               ## at the first }
     

    The old way to skip everything up until a certain character, say }, uses the [^}] construct like this...

    m/{([^}]*)}/ ## the inner [^}] matches any char except }

    I prefer the (.*?) form. In fact, I suspect it was added to the language precisely as an improvement over the [^}]* form.
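
    The three patterns can be compared side by side on the string from the text. In this sketch the braces are backslash-escaped, which is harmless and avoids depending on how a particular Perl version treats a bare { in a pattern. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "{group 1} xx {group 2}";

my ($greedy)  = $text =~ /\{(.*)\}/;       # * takes as much as possible
my ($minimal) = $text =~ /\{(.*?)\}/;      # *? stops at the first }
my ($negated) = $text =~ /\{([^}]*)\}/;    # [^}]* can never cross a }

print "greedy:  $greedy\n";                # group 1} xx {group 2
print "minimal: $minimal\n";               # group 1
print "negated: $negated\n";               # group 1
```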

    Substring

    The index(string, string-to-look-for, start-index) operator searches the first string starting at the given index for an occurrence of the second string. Returns the 0 based index of the first occurrence, or -1 if not found. The following code uses index() to walk through a string and count the number of times "binky" occurs.

    $count = 0;
    $pos = 0;
    while ( ($pos = index($string, "binky", $pos)) != -1 ) {
     $count++;
     $pos++;
    }
     

    The function substr(string, index, length) pulls a substring out of the given string. Substr() starts at the given index and continues for the given length.
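
    A brief sketch of index() and substr() together on a made-up string. It also shows two standard behaviors not mentioned above: leaving out the length takes everything to the end of the string, and a negative index counts from the end.

```perl
use strict;
use warnings;

my $s = "Hello, sailor";                 # hypothetical sample string

my $where   = index($s, "sailor");       # 7, the 0 based position of "sailor"
my $missing = index($s, "binky");        # -1, not found

my $head = substr($s, 0, 5);             # "Hello": 5 characters from index 0
my $tail = substr($s, $where);           # "sailor": no length means "to the end"
my $part = substr($s, -6, 3);            # "sai": negative index counts from the end

print "$where $missing $head $tail $part\n";
```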
     

    Split

    The split operator takes a regular expression, and a string, and returns an array of all the substrings from the original string which were separated by that regular expression. The following example pulls out words separated by commas possibly with whitespace thrown in...

    split(/\s*,\s*/, "dress ,      betrayal    ,  legacy") ## returns the array
       ("dress", "betrayal", "legacy")
     

    Split is often a useful way to pull an enumeration out of some text for processing. If the number -1 is passed as a third argument to split, then it will interpret an instance of the separator pattern at the end of the string as marking a last, empty element (note the comma after the last word)...

    split(/\s*,\s*/, "dress ,      betrayal    ,  legacy,", -1) ## returns the array
       ("dress", "betrayal", "legacy", "")
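
    To see what the -1 changes, this sketch splits the same string both ways and compares the element counts. The array names are hypothetical.

```perl
use strict;
use warnings;

my $text = "dress ,      betrayal    ,  legacy,";

my @default = split(/\s*,\s*/, $text);        # trailing empty element is dropped
my @keep    = split(/\s*,\s*/, $text, -1);    # trailing empty element is kept

print scalar(@default), " vs ", scalar(@keep), "\n";   # 3 vs 4
```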
     

    Character Translate -- tr

    The tr// operator goes through a string and replaces characters with other characters.

    $string =~ tr/a/b/;      ## change all a's to b's
    $string =~ tr/A-Z/a-z/;  ## change uppercase to lowercase (actually lc() is better for this)
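
    One more property worth a sketch: tr returns the number of characters it matched, and with an empty replacement list it leaves the string unchanged, so it can be used purely for counting. The sample string and variable names are assumptions.

```perl
use strict;
use warnings;

my $string = "banana";                     # hypothetical sample string

my $count = ($string =~ tr/a//);           # counts the a's; $string is not modified
(my $upper = $string) =~ tr/a-z/A-Z/;      # translate a copy, keep the original
my $lower = lc("BaNaNa");                  # lc() is the usual way to lowercase

print "$count $upper $lower $string\n";    # 3 BANANA banana banana
```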
     
     

    June 11

    Perl-Subroutine from http://oreilly.com/catalog/lperl3/chapter/ch04.html

    Chapter 4
    Subroutines

    System and User Functions

    We've already seen and used some of the builtin system functions, such as chomp, reverse, print, and so on. But, as other languages do, Perl has the ability to make subroutines, which are user-defined functions.[1] These let us recycle one chunk of code many times in one program.[2]

    The name of a subroutine is another Perl identifier (letters, digits, and underscores, but can't start with a digit) with a sometimes-optional ampersand (&) in front. There's a rule about when you can omit the ampersand and when you cannot; we'll see that rule by the end of the chapter. For now, we'll just use it every time that it's not forbidden, which is always a safe rule. And we'll tell you every place where it's forbidden, of course.

    That subroutine name comes from a separate namespace, so Perl won't be confused if you have a subroutine called &fred and a scalar called $fred in the same program--although there's no reason to do that under normal circumstances.

    Defining a Subroutine

    To define your own subroutine, use the keyword sub, the name of the subroutine (without the ampersand), then the indented[3] block of code (in curly braces) which makes up the body of the subroutine, something like this:

    sub marine {
      $n += 1;  # Global variable $n
      print "Hello, sailor number $n!\n";
    }
    

    Subroutine definitions can be anywhere in your program text, but programmers who come from a background of languages like C or Pascal like to put them at the start of the file. Others may prefer to put them at the end of the file, so that the main part of the program appears at the beginning. It's up to you. In any case, you don't normally need any kind of forward declaration.[4]

    Subroutine definitions are global; without some powerful trickiness, there are no private subroutines.[5] If you have two subroutine definitions with the same name, the later one overwrites the earlier one.[6] That's generally considered bad form, or the sign of a confused maintenance programmer.

    As you may have noticed in the previous example, you may use any global variables within the subroutine body. In fact, all of the variables we've seen so far are globals; that is, they are accessible from every part of your program. This horrifies linguistic purists, but the Perl development team formed an angry mob with torches and ran them out of town years ago. We'll see how to make private variables in the section "Private Variables in Subroutines" later in this chapter.

    Invoking a Subroutine

    Invoke a subroutine from within any expression by using the subroutine name (with the ampersand):[7]

    &marine;  # says Hello, sailor number 1!
    &marine;  # says Hello, sailor number 2!
    &marine;  # says Hello, sailor number 3!
    &marine;  # says Hello, sailor number 4!
    

    Sometimes, we refer to the invocation as calling the subroutine.

    Return Values

    The subroutine is always invoked as part of an expression, even if the result of the expression isn't being used. When we invoked &marine earlier, we were calculating the value of the expression containing the invocation, but then throwing away the result.

    Many times, we'll call a subroutine and actually do something with the result. This means that we'll be paying attention to the return value of the subroutine. All Perl subroutines have a return value--there's no distinction between those that return values and those that don't. Not all Perl subroutines have a useful return value, however.

    Since all Perl subroutines can be called in a way that needs a return value, it'd be a bit wasteful to have to declare special syntax to "return" a particular value for the majority of the cases. So Larry made it simple. Every subroutine is chugging along, calculating values as part of its series of actions. Whatever calculation is last performed in a subroutine is automatically also the return value.

    For example, let's define this subroutine:

    sub sum_of_fred_and_barney {
      print "Hey, you called the sum_of_fred_and_barney subroutine!\n";
      $fred + $barney;  # That's the return value
    }
    

    The last expression evaluated in the body of this subroutine is the sum of $fred and $barney, so the sum of $fred and $barney will be the return value. Here's that in action:

    $fred = 3;
    $barney = 4;
    $c = &sum_of_fred_and_barney; # $c gets 7
    print "\$c is $c.\n";
    $d = 3 * &sum_of_fred_and_barney; # $d gets 21
    print "\$d is $d.\n";
    

    That code will produce this output:

    Hey, you called the sum_of_fred_and_barney subroutine!
    $c is 7.
    Hey, you called the sum_of_fred_and_barney subroutine!
    $d is 21.
    

    That print statement is just a debugging aid, so that we can see that we called the subroutine. You'd take it out when the program is finished. But suppose you added another line to the end of the code, like this:

    sub sum_of_fred_and_barney {
      print "Hey, you called the sum_of_fred_and_barney subroutine!\n";
      $fred + $barney;  # That's not really the return value!
      print "Hey, I'm returning a value now!\n"; # Oops!
    }
    

    In this example, the last expression evaluated is not the addition; it's the print statement. Its return value will normally be 1, meaning "printing was successful,"[8] but that's not the return value we actually wanted. So be careful when adding additional code to a subroutine to ensure that the last expression evaluated will be the desired return value.

    So, what happened to the sum of $fred and $barney in that subroutine? We didn't put it anywhere, so Perl discarded it. If you had requested warnings, Perl (noticing that there's nothing useful about adding two variables and discarding the result) would likely warn you about something like "a useless use of addition in a void context." The term void context is just a fancy way of saying that the answer isn't being stored in a variable or used by another function.

    "The last expression evaluated" really means the last expression evaluated, rather than the last line of text. For example, this subroutine returns the larger value of $fred or $barney:

    sub larger_of_fred_or_barney {
      if ($fred > $barney) {
        $fred;
      } else {
        $barney;
      }
    }
    

    The last expression evaluated is the single $fred or $barney, which becomes the return value. We won't know whether the return value will be $fred or $barney until we see what those variables hold at runtime.

    A subroutine can also return a list of values when evaluated in a list context.[9] Suppose you wanted to get a range of numbers (as from the range operator, ..), except that you want to be able to count down as well as up. The range operator only counts upwards, but that's easily fixed:

    sub list_from_fred_to_barney {
      if ($fred < $barney) {
        # Count upwards from $fred to $barney
        $fred..$barney;
      } else {
        # Count downwards from $fred to $barney
        reverse $barney..$fred;
      }
    }
    $fred = 11;
    $barney = 6;
    @c = &list_from_fred_to_barney; # @c gets (11, 10, 9, 8, 7, 6)
    

    In this case, the range operator gives us the list from 6 to 11, then reverse reverses the list, so that it goes from $fred (11) to $barney (6), just as we wanted.

    These are all rather trivial examples. It gets better when we can pass values that are different for each invocation into a subroutine instead of relying on global variables. In fact, that's coming right up.

    Arguments

    That subroutine called larger_of_fred_or_barney would be much more useful if it didn't force us to use the global variables $fred and $barney. That's because, if we wanted to get the larger value from $wilma and $betty, we currently have to copy those into $fred and $barney before we can use larger_of_fred_or_barney. And if we had something useful in those variables, we'd have to first copy those to other variables, say $save_fred and $save_barney. And then, when we're done with the subroutine, we'd have to copy those back to $fred and $barney again.

    Luckily, Perl has subroutine arguments. To pass an argument list to the subroutine, simply place the list expression, in parentheses, after the subroutine invocation, like this:

    $n = &max(10, 15);  # This sub call has two parameters
    

    That list is passed to the subroutine; that is, it's made available for the subroutine to use however it needs to. Of course, this list has to be stored into a variable, so the parameter list (another name for the argument list) is automatically assigned to a special array variable named @_ for the duration of the subroutine. The subroutine can access this variable to determine both the number of arguments and the value of those arguments.

    So, that means that the first subroutine parameter is stored in $_[0], the second one is stored in $_[1], and so on. But--and here's an important note--these variables have nothing whatsoever to do with the $_ variable, any more than $dino[3] (an element of the @dino array) has to do with $dino (a completely distinct scalar variable). It's just that the parameter list must be stored into some array variable for the subroutine to use it, and Perl uses the array @_ for this purpose.

    Now, you could write the subroutine &max to look a little like the subroutine &larger_of_fred_or_barney, but instead of using $fred you could use the first subroutine parameter ($_[0]), and instead of using $barney, you could use the second subroutine parameter ($_[1]). And so you could end up with code something like this:

    sub max {
      # Compare this to &larger_of_fred_or_barney
      if ($_[0] > $_[1]) { 
        $_[0];
      } else {
        $_[1];
      }
    }
    

    Well, as we said, you could do that. But it's pretty ugly with all of those subscripts, and hard to read, write, check, and debug, too. We'll see a better way in a moment.

    There's another problem with this subroutine. The name &max is nice and short, but it doesn't remind us that this subroutine works properly only if called with exactly two parameters:

    $n = &max(10, 15, 27);  # Oops!
    

    Excess parameters are ignored--since the subroutine never looks at $_[2], Perl doesn't care whether there's something in there or not. And insufficient parameters are also ignored--you simply get undef if you look beyond the end of the @_ array, as with any other array. We'll see how to make a better &max, which works with any number of parameters, later in this chapter.

    The @_ variable is local to the subroutine;[10] if there's a global value in @_, it is saved away before the subroutine is invoked and restored to its previous value upon return from the subroutine.[11] This also means that a subroutine can pass arguments to another subroutine without fear of losing its own @_ variable--the nested subroutine invocation gets its own @_ in the same way. Even if the subroutine calls itself recursively, each invocation gets a new @_, so @_ is always the parameter list for the current subroutine invocation.
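
    A small sketch of both claims, using hypothetical subroutines (&count_args, &outer, and &total are not from the text). The first pair shows that a nested call gets its own @_ without disturbing the caller's. The second shows recursion, where each level sees only the arguments passed to that level.

```perl
use strict;
use warnings;

sub count_args { scalar(@_) }    # how many arguments this invocation received

sub outer {
  # The nested call gets (1, 2, 3) in its own @_; ours still holds our arguments.
  &count_args(1, 2, 3) . " inside, " . scalar(@_) . " outside";
}

sub total {
  if (@_ == 0) {
    0;                               # nothing left to add
  } else {
    $_[0] + &total(@_[1 .. $#_]);    # first argument plus the total of the rest
  }
}

print &outer("a", "b"), "\n";        # 3 inside, 2 outside
print &total(1, 2, 3, 4), "\n";      # 10
```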

    Private Variables in Subroutines

    But if Perl can give us a new @_ for every invocation, can't it give us variables for our own use as well? Of course it can.

    By default, all variables in Perl are global variables; that is, they are accessible from every part of the program. But you can create private variables called lexical variables at any time with the my operator:

    sub max {
      my($a, $b);       # new, private variables for this block
      ($a, $b) = @_;    # give names to the parameters
      if ($a > $b) { $a } else { $b }
    }
    

    These variables are private (or scoped) to the enclosing block; any other $a or $b is totally unaffected by these two. And that goes the other way, too--no other code can access or modify these private variables, by accident or design.[12] So, we could drop this subroutine into any Perl program in the world and know that we wouldn't mess up that program's $a and $b (if any).[13]

    It's also worth pointing out that, inside the if's blocks, there's no semicolon needed after the return value expression. Although Perl allows for the last semicolon in a block to be omitted, in practice that's omitted only when the code is so simple that the block is written in a single line, like the previous ones.

    The subroutine in the previous example could be made even simpler. Did you notice that the list ($a, $b) was written twice? That my operator can also be applied to a list of variables enclosed in parentheses, so it's more customary to combine those first two statements in the subroutine:

    my($a, $b) = @_;  # Name the subroutine parameters
    

    That one statement creates the private variables and sets their values, so the first parameter now has the easier-to-use name $a and the second has $b. Nearly every subroutine will start with a line much like that one, naming its parameters. When you see that line, you'll know that the subroutine expects two scalar parameters, which we'll call $a and $b inside the subroutine.
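
    The claim that the private $a and $b leave any other $a and $b alone can be checked with a sketch like this one. The global values "outer a" and "outer b" are made up for the demonstration.

```perl
use strict;
use warnings;

our ($a, $b) = ("outer a", "outer b");   # global $a and $b, hypothetical values

sub max {
  my($a, $b) = @_;                       # private names for the two parameters
  if ($a > $b) { $a } else { $b }
}

my $n = &max(10, 15);

# The globals are untouched by the private variables inside &max.
print "$n $a $b\n";                      # 15 outer a outer b
```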

    The local Operator

    You might consider this next section a giant footnote, but then we couldn't have footnotes on footnotes, so we decided to put it up in the main text. Skip over this text on first reading, and pop right on down to "Variable Length Parameter Lists" below. You won't need any of it to do the exercises or write Perl code for a long time. But someone invariably asks us in class something like "What is that local thing I see in some programs?" so we're including what we normally say as an aside in class for your enjoyment and edification.

    Occasionally, mostly in older code or older Perl books, you'll see the local operator used instead of my. It often looks much the same as my:

    sub max {
      local($a, $b) = @_;  # looks a lot like my
      if ($a > $b) { $a } else { $b }
    }
    

    But local is misnamed, or at least misleadingly named. Our friend Chip Salzenberg says that if he ever gets a chance to go back in a time machine to 1986 and give Larry one piece of advice, he'd tell Larry to call local by the name "save" instead.[14] That's because local actually will save the given global variable's value away, so it will later automatically be restored to the global variable. (That's right: these so-called "local" variables are actually globals!) This save-and-restore mechanism is the same one we've already seen twice now, in the control variable of a foreach loop, and in the @_ array of subroutine parameters.

    What local actually does, then, is to save away a copy of the variable's value in a secret place (called the stack). That value can't be accessed, modified, or deleted[15] while it is saved. Then local sets the variable to an empty value (undef for scalars, or empty list for arrays), or to whatever value is being assigned. When Perl returns from the subroutine,[16] the variable is automatically restored to its original value. In effect, the variable was borrowed for a time and given back (hopefully) before anyone noticed that it was borrowed.
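
    The save-and-restore behavior can be seen in a minimal sketch. The subroutine name &borrow and the variables $during and $after are hypothetical.

```perl
use strict;
use warnings;

our $office = "global";            # a global variable

sub borrow {
  local $office = "borrowed";      # save the old value and install a new one
  $office;                         # the value visible while the sub is running
}

my $during = &borrow();
my $after  = $office;              # the old value is restored on return

print "$during / $after\n";        # borrowed / global
```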

    The Difference Between local and my

    But what if the subroutine called another subroutine, one that did notice that the variable was being borrowed by local? For example:

    $office = "global";  # Global $office
    &say();              # says "global", accessing $office directly

    &fred();             # says "fred", dynamic scope,
                         # because fred's local $office hides the global

    &barney();           # says "global", lexical scope;
                         # barney's $office is visible only in that block

    sub say { print "$office\n"; }         # print the currently visible $office
    sub fred { local($office) = "fred"; &say(); }
    sub barney { my($office) = "barney"; &say(); }
    

    First, we call the subroutine &say, which tells us which $office it sees--the global $office. That's normal.

    But then we call Fred's subroutine. Fred has made his own local $office, so he has actually changed the behavior of the &say subroutine; now it tells us what's in Fred's $office. We can't tell whether that's what Fred wanted to do or not without understanding the meaning of his code. But it's a little odd.

    Barney, however, is a little smarter, as well as being shorter, so he uses the shorter (and smarter) operator, my. Barney's variable $office is private, and Barney's private $office can't be accessed from outside his subroutine, so the &say subroutine is back to normal; it can see only the global $office. Barney didn't change the way &say works, which is more like what most programmers would want and expect.

    Now, if you're confused about these two operators at this point, that's to be expected. But any time that you see local, think "save," and that may help. In any new code, just use my, since my variables (lexical variables) are faster than globals--remember, so-called local variables are really globals--and they'll work more like the traditional variables in other modern programming languages. But when you're maintaining someone else's old code, you can't necessarily change every local to my without checking upon whether the programmer was using that save-and-restore functionality.

    Variable-length Parameter Lists

    In real-world Perl code, subroutines are often given parameter lists of arbitrary length. That's because of Perl's "no unnecessary limits" philosophy that we've already seen. Of course, this is unlike many traditional programming languages, which require every subroutine to be strictly typed; that is, to permit only a certain, predefined number of parameters of predefined types. It's nice that Perl is so flexible, but (as we saw with the &max routine earlier) that may cause problems when a subroutine is called with a different number of arguments than the author expected.

    Of course, the subroutine can easily check that it has the right number of arguments by examining the @_ array. For example, we could have written &max to check its argument list like this:[17]

    sub max {
      if (@_ != 2) {
        print "WARNING! &max should get exactly two arguments!\n";
      }
      # continue as before...
      .
      .
      .
    }
    

    That if-test uses the "name" of the array in a scalar context to find out the number of array elements, as we saw in Chapter 3.

    But in real-world Perl programming, this sort of check is hardly ever used; it's better to make the subroutine adapt to the parameters.

    A Better &max Routine

    So let's rewrite &max to allow for any number of arguments:

    $maximum = &max(3, 5, 10, 4, 6);
     
    sub max {
      my($max_so_far) = shift @_;  # the first one is the largest yet seen
      foreach (@_) {               # look at the remaining arguments
        if ($_ > $max_so_far) {    # could this one be bigger yet?
          $max_so_far = $_;
        }
      }
      $max_so_far;
    }
    

    This code uses what has often been called the "high-water mark" algorithm; after a flood, when the waters have surged and receded for the last time, the high-water mark shows where the highest water was seen. In this routine, $max_so_far keeps track of our high-water mark, the largest number yet seen.

    The first line sets $max_so_far to 3 (the first parameter in the example code) by shifting that parameter from the parameter array, @_. So @_ now holds (5, 10, 4, 6), since the 3 has been shifted off. And the largest number yet seen is the only one yet seen: 3, the first parameter.

    Now, the foreach loop will step through the remaining values in the parameter list, from @_. The control variable of the loop is, by default, $_. (But, remember, there's no automatic connection between @_ and $_; it's just a coincidence that they have such similar names.) The first time through the loop, $_ is 5. The if test sees that it is larger than $max_so_far, so $max_so_far is set to 5--the new high-water mark.

    The next time through the loop, $_ is 10. That's a new record high, so it's stored in $max_so_far as well.

    The next time, $_ is 4. The if test fails, since that's no larger than $max_so_far, which is 10, so the body of the if is skipped.

    The next time, $_ is 6, and the body of the if is skipped again. And that was the last time through the loop, so the loop is done.

    Now, $max_so_far becomes the return value. It's the largest number we've seen, and we've seen them all, so it must be the largest from the list: 10.
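
    To watch the high-water mark move, here is a sketch of the same routine with two added lines that record each step. The @trace array and the name &max_traced are ours, added only for illustration.

```perl
my @trace;                           # hypothetical: records the high-water mark

sub max_traced {
  my($max_so_far) = shift @_;
  push @trace, $max_so_far;          # starting mark: the first argument
  foreach (@_) {
    if ($_ > $max_so_far) {
      $max_so_far = $_;
    }
    push @trace, $max_so_far;        # mark after looking at this argument
  }
  $max_so_far;
}

my $maximum = &max_traced(3, 5, 10, 4, 6);
print "marks: @trace\n";             # marks: 3 5 10 10 10
print "maximum: $maximum\n";         # maximum: 10
```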

    Empty Parameter Lists

    That improved &max algorithm works fine now, even if there are more than two parameters. But what happens if there are none?

    At first, it may seem too esoteric to worry about. After all, why would someone call &max without giving it any parameters? But maybe someone wrote a line like this one:

    $maximum = &max(@numbers);
    

    And the array @numbers might sometimes be an empty list; perhaps it was read in from a file that turned out to be empty, for example. So what does &max do in that case?

    The first line of the subroutine sets $max_so_far by using shift on @_, the (now empty) parameter array. That's harmless; the array is left empty, and shift returns undef to $max_so_far.

    Now the foreach loop wants to iterate over @_, but since that's empty, the loop body is executed zero times.

    So in short order, Perl returns the value of $max_so_far--undef--as the return value of the subroutine. In some sense, that's the right answer, because there is no largest value in an empty list.

    Of course, whoever is calling this subroutine should be aware that the return value may be undef--or they could simply ensure that the parameter list is never empty.
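
    As a minimal sketch of the first option, the caller can test the return value with defined before using it. The $report variable and its wording are only for illustration.

```perl
sub max {
  my($max_so_far) = shift @_;
  foreach (@_) {
    if ($_ > $max_so_far) {
      $max_so_far = $_;
    }
  }
  $max_so_far;
}

my @numbers = ();                    # for example, read from an empty file
my $maximum = &max(@numbers);        # undef, since there was nothing to compare

my $report = defined $maximum
  ? "The maximum is $maximum."
  : "No numbers were given.";
print "$report\n";                   # No numbers were given.
```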

    Notes on Lexical (my) Variables

    Those lexical variables can actually be used in any block, not merely in a subroutine's block. For example, they can be used in the block of an if, while, or foreach:

    foreach (1..10) {
      my($square) = $_ * $_;  # private variable in this loop
      print "$_ squared is $square.\n";
    }
    

    The variable $square is private to the enclosing block; in this case, that's the block of the foreach loop. If there's no block, the variable is private to the entire source file. For now, your programs aren't going to use more than one source file, so this isn't an issue. But the important concept is that the scope of a lexical variable's name is limited to the smallest enclosing block or file. The only code that can say $square and mean that variable is the code inside that textual scope. This is a big win for maintainability--if the wrong value is found in $square, the culprit will be found within a limited amount of source code. As experienced programmers have learned (often the hard way), limiting the scope of a variable to a page of code, or even to a few lines of code, really speeds along the development and testing cycle.
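
    One way to see that scoping in action is to declare a second $square outside the loop. In this sketch (the "outer" value and the @lines array are just for illustration), the loop's private $square never touches the outer one.

```perl
use strict;

my $square = "outer";                # one variable named $square
my @lines;

foreach (1..3) {
  my $square = $_ * $_;              # a different $square, private to this block
  push @lines, "$_ squared is $square.";
}

push @lines, "Afterward, \$square is still $square.";
print "$_\n" foreach @lines;
```

    The last line printed reports "outer", because the name $square outside the block refers to the outer variable again.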

    Note also that the my operator doesn't change the context of an assignment:

    my($num) = @_;  # list context, same as ($num) = @_;
    my $num  = @_;  # scalar context, same as $num = @_;
    

    In the first one, $num gets the first parameter, as a list-context assignment; in the second, it gets the number of parameters, in a scalar context. Either line of code could be what the programmer wanted; we can't tell from that one line alone, and so Perl can't warn you if you use the wrong one. (Of course, you wouldn't have both of those lines in the same subroutine, since you can't have two lexical variables with the same name declared in the same scope; this is just an example.) So, when reading code like this, you can always tell the context of the assignment by seeing what the context would be without the word my.
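
    Putting each form in its own subroutine makes the difference visible. The subroutine names &first_arg and &arg_count are hypothetical, chosen for this sketch.

```perl
sub first_arg {
  my($num) = @_;       # list context: $num gets the first parameter
  return $num;
}

sub arg_count {
  my $num = @_;        # scalar context: $num gets the number of parameters
  return $num;
}

my $first = &first_arg(7, 8, 9);
my $count = &arg_count(7, 8, 9);
print "first: $first, count: $count\n";   # first: 7, count: 3
```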

    Of course, you can use my to create new, private arrays as well:[18]

    my @phone_number;
    

    Any new variable will start out empty--undef for scalars, or the empty list for arrays.
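
    A quick check of those starting values, as a sketch (the variable $name and the two status variables are ours):

```perl
my $name;             # scalar: starts out as undef
my @phone_number;     # array: starts out as the empty list

my $scalar_state = defined $name ? "defined" : "undef";
my $array_size   = @phone_number;   # scalar context gives the element count
print "scalar is $scalar_state, array has $array_size elements\n";
# scalar is undef, array has 0 elements
```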

    The use strict Pragma

    Perl tends to be a pretty permissive language. But maybe you want Perl to impose a little discipline; that can be arranged with the use strict pragma.

    A pragma is a hint to a compiler, telling it something about the code. In this case, the use strict pragma tells Perl's internal compiler that it should enforce some good programming rules for the rest of this block or source file.

    Why would this be important? Well, imagine that you're composing your program, and you type a line like this one:

    $bamm_bamm = 3;  # Perl creates that variable automatically
    

    Now, you keep typing for a while. After that line has scrolled off the top of the screen, you type this line to increment the variable:

    $bammbamm += 1;  # Oops!
    

    Since Perl sees a new variable name (the underscore is significant in a variable name), it creates a new variable and increments that one. If you're lucky and smart, you've turned on warnings, and Perl can tell you that you used one or both of those global variable names only once in your program. But if you're merely smart, you used each name more than once, and Perl won't be able to warn you.

    To tell Perl that you're ready to be more restrictive, put the use strict pragma at the top of your program (or in any block or file where you want to enforce these rules):

    use strict;  # Enforce some good programming rules
    

    Now, among other restrictions,[19] Perl will insist that you declare every new variable with my:[20]

    my $bamm_bamm = 3;  # New lexical variable
    

    Now if you try to spell it the other way, Perl can complain that you haven't declared any variable called $bammbamm, so your mistake is automatically caught at compile time.

    $bammbamm += 1;  # No such variable: Compile time error
    

    Of course, this applies only to new variables; Perl's builtin variables, such as $_ and @_, never need to be declared.[21]
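
    Because a compile-time error would stop the whole program, one way to observe the complaint without crashing is to compile the misspelled code at run time with a string eval. This is only a sketch for experimenting; the $snippet, $ok, and $error names are ours, and the exact wording of Perl's message varies between versions.

```perl
my $snippet = q{
  use strict;
  my $bamm_bamm = 3;
  $bammbamm += 1;      # misspelled on purpose
  1;
};

my $ok    = eval $snippet;   # compiles and runs the snippet; false on failure
my $error = $@;              # Perl's complaint, if there was one

print "Compiled and ran fine.\n" if $ok;
print "Perl refused: $error"     unless $ok;
```

    The message names the undeclared variable $bammbamm, so the typo is caught before any of the snippet runs.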

    If you add use strict to an already-written program, you'll generally get a flood of error messages, so it's better to use it from the start, when it's needed.

    Most people recommend use strict for any program longer than a screenful of text. And we agree.

    From here on, most (but not all) of our examples will be written as if use strict is in effect, even where we don't show it. That is, we'll generally declare variables with my where it's appropriate. But, even though we don't always do so here, we encourage you to include use strict in your programs as often as possible.

    The return Operator

    The return operator immediately returns a value from a subroutine:

    my @names = qw/ fred barney betty dino wilma pebbles bamm-bamm /;
    my $result = &which_element_is("dino", @names);
     
    sub which_element_is {
      my($what, @list) = @_;
      foreach (0..$#list) {  # indices of @list's elements
        if ($what eq $list[$_]) {
          return $_;         # return early once found
        }
      }
      -1;                    # element not found (return is optional here)
    }
    

    This subroutine is being used to find the index of "dino" in the array @names. First, the my declaration names the parameters: there's $what, which is what we're searching for, and @list, a list of values to search within. That's a copy of the array @names, in this case. The foreach loop steps through the indices of @list (the first index is 0, and the last one is $#list, as we saw in Chapter 3).

    Each time through the foreach loop, we check to see whether the string in $what is equal[22] to the element from @list at the current index. If it's equal, we return that index at once. This is the most common use of the keyword return in Perl--to return a value immediately, without executing the rest of the subroutine.

    But what if we never found that element? In that case, the author of this subroutine has chosen to return -1 as a "value not found" code. It would be more Perlish, perhaps, to return undef in that case, but this programmer used -1. Saying return -1 on that last line would be correct, but the word return isn't really needed.
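
    Here is a sketch of the undef-returning variant. The name &index_of is hypothetical. One reason undef may be the safer signal is that -1 is itself a valid array index in Perl (it means the last element), so a careless caller could use it without noticing the failure. Note that the caller must test with defined, since a match at index 0 is a false value.

```perl
my @names = qw/ fred barney betty dino wilma pebbles bamm-bamm /;

sub index_of {
  my($what, @list) = @_;
  foreach (0..$#list) {
    return $_ if $what eq $list[$_];   # return early once found
  }
  return undef;                        # "not found" (we call this in scalar context)
}

my $dino  = &index_of("dino",  @names);
my $gazoo = &index_of("gazoo", @names);

my @report = map { defined $_ ? "index $_" : "not found" } ($dino, $gazoo);
print join(", ", @report), "\n";       # index 3, not found
```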

    Some programmers like to use return every time there's a return value, as a means of documenting that it is a return value. For example, you might use return when the return value is not the last line of the subroutine, such as in the subroutine &list_from_fred_to_barney, earlier in this chapter. It's not really needed, but it doesn't hurt anything. However, many Perl programmers believe it's just an extra seven characters of typing. So you'll need to be able to read code written by both kinds of programmers.

    If return is used with no expression, that will return an empty value--undef in a scalar context, or an empty list in a list context. return ( ) does the same, in case you want to be explicit.
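
    A minimal sketch of a bare return seen from both contexts (the subroutine name &nothing is made up for this example):

```perl
sub nothing {
  return;                      # no expression
}

my $scalar = &nothing();       # scalar context: undef
my @list   = &nothing();       # list context: empty list

my $scalar_state = defined $scalar ? "defined" : "undef";
my $list_size    = @list;
print "scalar: $scalar_state, list elements: $list_size\n";
# scalar: undef, list elements: 0
```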

    Omitting the Ampersand

    As promised, now we'll tell you the rule for when a subroutine call can omit the ampersand. If the compiler sees the subroutine definition before invocation, or if Perl can tell from the syntax that it's a subroutine call, the subroutine can be called without an ampersand, just like a builtin function. (But there's a catch hidden in that rule, as we'll see in a moment.)

    This means that if Perl can see that it's a subroutine call without the ampersand, from the syntax alone, that's generally fine. That is, if you've got the parameter list in parentheses, it's got to be a function[23] call:

    my @cards = shuffle(@deck_of_cards);  # No & necessary on &shuffle
    

    Or if Perl's internal compiler has already seen the subroutine definition, that's generally okay, too; in that case, you can even omit the parentheses around the argument list:

    sub division {
      $_[0] / $_[1];                   # Divide first param by second
    }
     
    my $quotient = division 355, 113;  # Uses &division
    

    This works because of the rule that parentheses may always be omitted, except when doing so would change the meaning of the code.

    But don't put that subroutine declaration after the invocation, or the compiler won't know what the attempted invocation of division is all about. The compiler has to see the definition before the invocation in order to use the subroutine call as if it were a builtin.
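
    If you would rather keep the subroutine body lower in the file, Perl also accepts a forward declaration, which gives the compiler the name without the body (see the perlsub documentation). The sketch below reuses the division example under that assumption.

```perl
sub division;                        # forward declaration: name only, no body yet

my $quotient = division 355, 113;    # compiles, since the name is already known

sub division {
  $_[0] / $_[1];                     # Divide first param by second
}

printf "%.4f\n", $quotient;          # 3.1416
```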

    That's not the catch, though. The catch is this: if the subroutine has the same name as a Perl builtin, you must use the ampersand to call it. With an ampersand, you're sure to call the subroutine; without it, you can get the subroutine only if there's no builtin with the same name:

    sub chomp {
      print "Munch, munch!";
    }
     
    &chomp;  # That ampersand is not optional!
    

    Without the ampersand, we'd be calling the builtin chomp, even though we've defined the subroutine &chomp. So, the real rule to use is this one: until you know the names of all of Perl's builtin functions, always use the ampersand on function calls. That means that you will use it for your first hundred programs or so. But when you see someone else has omitted the ampersand in their own code, it's not necessarily a mistake; perhaps they simply know that Perl has no builtin with that name.[24]
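
    To see both calls side by side, here is a sketch in which the subroutine returns its message rather than printing it, so the two results can be compared. The $line variable and its contents are ours.

```perl
sub chomp {
  return "Munch, munch!";
}

my $line = "yabba dabba doo\n";

my $from_builtin = chomp($line);     # builtin: strips the newline, returns 1
my $from_sub     = &chomp($line);    # our subroutine, thanks to the ampersand

print "builtin returned: $from_builtin\n";    # builtin returned: 1
print "subroutine returned: $from_sub\n";     # subroutine returned: Munch, munch!
```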

    When programmers plan to call their subroutines as if they were calling Perl's builtins, typically when writing modules, they often use prototypes to tell Perl about the parameters to expect. Making modules is an advanced topic, though; when you're ready for that, see Perl's documentation (in particular, the perlmod and perlsub documents) for more information about subroutine prototypes and making modules.

    Exercises

    See Appendix A for answers to the following exercises:

    1. [12] Write a subroutine, called &total, which returns the total of a list of numbers. Hint: the subroutine should not perform any I/O; it should simply process its parameters and return a value to its caller. Try it out in this sample program, which merely exercises the subroutine to see that it works. The first group of numbers should add up to 25.

      my @fred = qw{ 1 3 5 7 9 };
      my $fred_total = &total(@fred);
      print "The total of \@fred is $fred_total.\n";
      print "Enter some numbers on separate lines: ";
      my $user_total = &total(<STDIN>);
      print "The total of those numbers is $user_total.\n";

    2. [5] Using the subroutine from the previous problem, make a program to calculate the sum of the numbers from 1 to 1000.

    1. In Perl, we don't generally make the distinction that Pascal programmers are used to, between functions, which return a value, and procedures, which don't. But a subroutine is always user-defined, while a function may or may not be. That is, the word function may be used as a synonym for subroutine, or it may mean one of Perl's builtin functions. That's why this chapter is titled Subroutines, because it's about the ones you can define, not the builtins. Mostly.

    2. The code examples used in this book are recycled from at least 40% post-consumer programming, and are at least 75% recyclable into your programs when properly decomposed.

    3. Okay, purists, we admit it: the curly braces are part of the block, properly speaking. And Perl doesn't require the indentation of the block--but your maintenance programmer will. So please be stylish.

    4. Unless your subroutine is being particularly tricky and declares a "prototype," which dictates how a compiler will parse and interpret its invocation arguments. This is rare--see the perlsub manpage for more information.

    5. If you wish to be powerfully tricky, read the Perl documentation about coderefs stored in private (lexical) variables.

    6. A warnable offense, however.

    7. And frequently a pair of parentheses, even if empty. As written, the subroutine inherits the caller's @_ value, which we'll be discussing shortly. So don't stop reading here, or you'll be writing code with unintended effects!

    8. The return value of print is true for a successful operation and false for a failure. We'll see how to determine the kind of failure later in Chapter 11, Filehandles and File Tests.

    9. You can detect whether a subroutine is being evaluated in a scalar or list context using the wantarray function, which lets you easily write subroutines with specific list or scalar context values.

    10. Unless there's an ampersand in front of the name for the invocation, and no parentheses (or arguments) afterward, in which case the @_ array is inherited from the caller's context. That's generally a bad idea, but is occasionally useful.

    11. You might recognize that this is the same mechanism as used with the control variable of the foreach loop, as seen in the previous chapter. In either case, the variable's value is saved and automatically restored by Perl. We'll see this again with the local operator later in this chapter.

    12. Advanced programmers will realize that a lexical variable may be accessible by reference from outside its scope, but never by name.

    13. Of course, if that program already had a subroutine called &max, we'd mess that up.

    14. We would tell Larry to buy stock in Yahoo!, but Chip is more idealistic than we are.

    15. Or damaged, defiled, read, checked, touched, seen, changed, or printed, for that matter. There's no way from within Perl to get at the saved value.

    16. Or when it finishes execution of the smallest enclosing block or file, to be more precise.

    17. As soon as you learn about warn (in Chapter 11), you'll see that you can use it to turn improper usage like this into a proper warning. Or perhaps you'll decide that this case is severe enough to warrant using die, described in the same chapter.

    18. Or hashes, which we'll see in the next chapter.

    19. To learn about the other restrictions, see the documentation for strict. The documentation for any pragma is filed under that pragma's name, so the command perldoc strict (or your system's native documentation method) should find it for you. In brief, the other restrictions require that strings be quoted in most cases, and that references be true (hard) references. Neither of these restrictions should affect beginners in Perl.

    20. There are some other ways to declare variables, too.

    21. And, at least in some circumstances, $a and $b won't need to be declared, because they're used internally by sort. So if you're testing this feature, use other variable names than those two. The fact that use strict doesn't forbid these two is one of the most frequently reported non-bugs in Perl.

    22. You noticed that we used the string equality test, eq, instead of the numeric equality test, ==, didn't you?

    23. In this case, the function is the subroutine &shuffle. But it may be a built-in function, as we'll see in a moment.

    24. Then again, maybe it is a mistake; you can search the perlfunc and perlop manpages for that name, though, to see whether it's the same as a builtin. And Perl will usually be able to warn you about this, when you have warnings turned on.

    June 10

    Perl – exists EXPR from http://perldoc.perl.org/functions/exists.html

    exists EXPR

    Given an expression that specifies a hash element or array element, returns true if the specified element in the hash or array has ever been initialized, even if the corresponding value is undefined. The element is not autovivified if it doesn't exist.

        print "Exists\n"    if exists $hash{$key};
        print "Defined\n"   if defined $hash{$key};
        print "True\n"      if $hash{$key};
    
        print "Exists\n"    if exists $array[$index];
        print "Defined\n"   if defined $array[$index];
        print "True\n"      if $array[$index];

    A hash or array element can be true only if it's defined, and defined only if it exists, but the reverse doesn't necessarily hold true.
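
    The three tests can be compared on a small hash. This is a sketch with made-up keys and values, chosen so that each key passes a different number of tests.

```perl
my %hash = (
  one   => 1,        # exists, defined, and true
  zero  => 0,        # exists and defined, but false
  empty => undef,    # exists, but not defined
);

my @report;
foreach my $key (qw/ one zero empty missing /) {
  my @facts;
  push @facts, "exists"  if exists  $hash{$key};
  push @facts, "defined" if defined $hash{$key};
  push @facts, "true"    if $hash{$key};
  push @report, "$key: " . (@facts ? "@facts" : "none");
}
print "$_\n" foreach @report;
# one: exists defined true
# zero: exists defined
# empty: exists
# missing: none
```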

    Given an expression that specifies the name of a subroutine, returns true if the specified subroutine has ever been declared, even if it is undefined. Mentioning a subroutine name for exists or defined does not count as declaring it. Note that a subroutine which does not exist may still be callable: its package may have an AUTOLOAD method that makes it spring into existence the first time that it is called -- see perlsub.

        print "Exists\n"    if exists &subroutine;
        print "Defined\n"   if defined &subroutine;
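
    A sketch of the declared-but-undefined case, using a forward declaration with no body. The subroutine names here are hypothetical.

```perl
sub declared_only;                   # declared, but it has no body
sub fully_defined { return 1 }       # declared and defined

my @report;
push @report, "declared_only exists"      if exists  &declared_only;
push @report, "declared_only is defined"  if defined &declared_only;
push @report, "fully_defined exists"      if exists  &fully_defined;
push @report, "fully_defined is defined"  if defined &fully_defined;
push @report, "never_mentioned exists"    if exists  &never_mentioned;
print "$_\n" foreach @report;
# declared_only exists
# fully_defined exists
# fully_defined is defined
```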

    Note that the EXPR can be arbitrarily complicated as long as the final operation is a hash or array key lookup or subroutine name:

        if (exists $ref->{A}->{B}->{$key})  { }
        if (exists $hash{A}{B}{$key})       { }
    
        if (exists $ref->{A}->{B}->[$ix])   { }
        if (exists $hash{A}{B}[$ix])        { }
    
        if (exists &{$ref->{A}{B}{$key}})   { }

    Although the deepest nested array or hash will not spring into existence just because its existence was tested, any intervening ones will. Thus $ref->{"A"} and $ref->{"A"}->{"B"} will spring into existence due to the existence test for the $key element above. This happens anywhere the arrow operator is used, including even:

        undef $ref;
        if (exists $ref->{"Some key"})  { }
        print $ref;                     # prints HASH(0x80d3d5c)

    This surprising autovivification in what does not at first--or even second--glance appear to be an lvalue context may be fixed in a future release.
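
    The following sketch shows the side effect on a nested hash, along with one common way to avoid it: testing each level before descending into it. The hash names and keys are made up for illustration.

```perl
my %hash;

my $found   = exists $hash{A}{B}{key} ? "yes" : "no";  # the test itself is false...
my @created = sort keys %hash;                         # ...but $hash{A} now exists
print "found: $found; top-level keys afterward: @created\n";
# found: no; top-level keys afterward: A

# Testing one level at a time stops at the first missing level.
my %careful;
my $safe = (exists $careful{A}
            && exists $careful{A}{B}
            && exists $careful{A}{B}{key}) ? "yes" : "no";
my $count = keys %careful;
print "found: $safe; top-level key count afterward: $count\n";
# found: no; top-level key count afterward: 0
```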

    Use of a subroutine call, rather than a subroutine name, as an argument to exists() is an error.

        exists &sub;    # OK
        exists &sub();  # Error