Thursday, July 21, 2011

on perl's open()

There's a minor brouhaha in the Perl community these days regarding perlopentut. One bone of contention regards the style of Perl that is shown to readers of the tutorial - ostensibly they're beginners to the language.


Another contentious aspect is that Tom Christiansen, the original author of the tutorial, sees the proposed changes as stomping across his authorship. This latter point is complicated a bit by the license on the tutorial, which allows anyone to make changes to it. (Tom's "I'll take my toys and go home," knee-jerk reaction was greeted with the equally inflammatory, "Neener neener.")




The "Modern Perl" movers-and-shakers would like to see something like this:


open FH, '>some-file.txt' or die "open some-file.txt: $!\n";


replaced with this:


open my $fh, '>', 'some-file.txt' or die "open some-file.txt: $!\n";


I personally prefer the three-arg form of open. I could tell you that it's because I like my filehandles to close themselves when they go out of scope, or that I care about the safety of the three-arg form over the two-arg. But really it's that I find FH makes my code look like it's shouting special magic at me, whereas $fh just tells me to relax: filehandles can be safely hidden inside a scalar. The $, which newcomers to Perl find so perplexing, and which detractors of the language find so jangly to look at, is a soothing reassurance to my eyes.


Seriously. That's why I like the three-arg form of open(). Because it looks nicer to me. If I were interviewing, I'd mumble stuff about open() being safer in the three- than two-arg form. But I look at code all day long, so I choose to look at what I find to be pretty code.
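For the curious, here is a minimal sketch of the two "interview answers": a lexical filehandle closing itself at the end of its scope, and two-arg open reading a mode out of the filename. The scratch directory, the file names, and the hostile-looking $name string are all made up for illustration; none of this comes from perlopentut.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (hypothetical; removed when the script exits).
my $dir    = tempdir(CLEANUP => 1);
my $scoped = "$dir/scoped.txt";
my $victim = "$dir/victim.txt";

# 1. A lexical filehandle is flushed and closed when it goes out of scope.
{
    open my $fh, '>', $scoped or die "open $scoped: $!\n";
    print $fh "written inside the block\n";
}   # $fh is gone here, so the file is closed without an explicit close()

open my $in, '<', $scoped or die "open $scoped: $!\n";
my $read_back = <$in>;
close $in;

# 2. Two-arg open takes the mode from the filename string itself.
open my $out, '>', $victim or die "open $victim: $!\n";
print $out "precious data\n";
close $out;

my $name = ">$victim";    # pretend this string came from user input

# Three-arg: '<' is the mode and $name is only ever a filename, so perl
# looks for a file literally called ">..." and fails without side effects.
my $three_arg_opened = open(my $three, '<', $name) ? 1 : 0;
my $size_before      = (stat $victim)[7];

# Two-arg: the leading '>' in $name is read as "open for writing",
# which truncates the victim file.
open(my $two, $name) or die "open $name: $!\n";
close $two;
my $size_after = (stat $victim)[7];

print "read back: $read_back";
print "three-arg opened hostile name: $three_arg_opened\n";
print "victim size before/after two-arg open: $size_before/$size_after\n";
```

The three-arg call fails quietly here only because the sketch does not check its return value; real code would still want the usual "or die".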


Recently one of my colleagues pointed out a bug in one of the write-to-a-file routines that I dashed off inside some script. Despite being a Perl developer for 6 years, and having 11 years' experience with the language, I'd written:

open my $fh, '>', '/tmp/wherever' or die "open /tmp/wherever: $!\n";
...
print $fh, "here's the data I want logged\n";
...

There's a subtle bug there, which perl can't find. Here are two programs you can run side-by-side to compare. First, let's write to a file using the two-arg form of open():

1 #!/usr/bin/perl
2 
3 use strict;
4 use warnings;
5 
6 open FH, '>two-arg.txt' or die "two-arg.txt: $!\n";
7 print FH, "hello world\n";
8 close FH;

And here's the same program using the three-arg form of open():

1 #!/usr/bin/perl
2 
3 use strict;
4 use warnings;
5 
6 open my $fh, '>', 'three-arg.txt' or die "three-arg.txt: $!\n";
7 print $fh, "hello world\n";
8 close $fh;

Can you figure out the bug before hitting the jump?


Here's the output from the first one:

belden@skretting:~$ perl 2-arg.pl
No comma allowed after filehandle at 2-arg.pl line 7.

Whoops! That's right - I did mean for line 7 to look like this:
print FH "hello world\n";

Looking at the three-arg form of the script, we can see that line 7 also has a ',' after $fh. Let's see what perl does:

belden@skretting:~$ perl 3-arg.pl
GLOB(0x912f880)hello world

It does something completely unexpected: perl acts like I've done this:
print STDOUT $fh, "hello world\n";

The "GLOB(0x912f880)" is the stringification of $fh.

In the bit of work code that I wrote, STDOUT had actually been re-opened to a log file. My output was bizarre, and going to a completely unexpected place. This effect is easy for a seasoned Perl developer to find and understand - but could be hours of frustration for someone new to the language who just wants to write a script that runs out of cron and logs to a file.
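Here is a minimal sketch of that misdirection. It assumes an in-memory filehandle, made the default output handle with select(), as a stand-in for my re-opened STDOUT; the names $log, $capture, and comma.txt are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/comma.txt";         # hypothetical target file

open my $fh, '>', $file or die "open $file: $!\n";

# Stand-in for "STDOUT re-opened to a log file": an in-memory handle
# becomes the default output handle for print.
my $log = '';
open my $capture, '>', \$log or die "open in-memory handle: $!\n";
my $previous = select $capture;

print $fh, "hello world\n";          # the stray comma under test

select $previous;                    # restore the original default handle
close $capture;
close $fh;

my $file_size = (stat $file)[7];

print "captured by the default handle: $log";
print "bytes written to $file: $file_size\n";
```

The text lands in $log, prefixed with the stringified glob, and the file that was supposed to receive it stays empty.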

Ricardo Signes asks:

Is there a reason, though, that the new documentation should not be encouraged to suggest three-arg open and anonymous filehandles as a default position?

And indeed, there is such a reason. Perl's print() is a fickle beast:

print FILEHANDLE LIST
print LIST
print
    Prints a string or a list of strings.  Returns true if successful.

If I'm teaching someone new to programming how to write to a file, I'd like to show them the way that is most likely to result in a meaningful error when they mistakenly drop in an extra comma. Stuffing filehandles inside scalars doesn't accomplish that goal.

Christian Walde points out that we need to remember the intended audience of perlopentut:


'perlopentut' is a tutorial, a teaching aid that guides by example. As such i think it is fair to assume that it is meant to be aimed at newbies.


And I think it's fair to assume that newbies will forget that it is

print(FH @list);

and not:

print(FH, @list);

Particularly when you start having newbie code like this:

print $fh $var1, ',', $var2, ',', "\n";
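One possible middle ground, offered as a sketch and not as anything proposed for perlopentut: keep the lexical handle but call print as a method through the core IO::Handle module. The handle becomes the invocant, so there is no slot between handle and list where a stray comma could hide. The file name and variable values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # loaded explicitly; perls before 5.14 do not load it on demand
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/method.txt";        # hypothetical target file
my ($var1, $var2) = ('alpha', 'beta');

open my $fh, '>', $file or die "open $file: $!\n";

# Every comma here separates list items; none can follow the handle.
$fh->print($var1, ',', $var2, ',', "\n");

close $fh or die "close $file: $!\n";

# Read the line back to confirm it reached the file.
open my $in, '<', $file or die "open $file: $!\n";
my $written = <$in>;
close $in;

print "file contains: $written";
```

Whether a beginner should meet method-call syntax this early is a separate argument; the sketch only shows that the comma trap is tied to print's indirect-object syntax, not to lexical handles as such.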

Here's a version of the original program snippet from above, revised to use bareword filehandles and the 3-arg form of open.

1 #!/usr/bin/perl
2 
3 use strict;
4 use warnings;
5 
6 open FH, '>', 'three-arg.txt' or die "three-arg.txt: $!\n";
7 print FH, "hello world\n";
8 close FH;


There you are: the benefits of the three-arg form of open, the benefit of the bareword filehandle, and the drawbacks of both as well!
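The difference between the two spellings can also be observed from inside a single script. The sketch below is only an illustration: it compiles each print statement with a string eval and records whether perl objects. LOGFH and $h are made-up names, and neither print is ever executed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bareword handle followed by a comma: rejected while the code is compiled.
my $bareword_ok    = eval q{ my $code = sub { print LOGFH, "hello world\n" }; 1 };
my $bareword_error = $@;

# Lexical handle followed by a comma: compiles, because $h is parsed as
# the first item of the list to print.
my $lexical_ok = eval q{ my $h; my $code = sub { print $h, "hello world\n" }; 1 };

print "bareword compiled: ", ($bareword_ok ? "yes" : "no"), "\n";
print "bareword error:    $bareword_error";
print "lexical compiled:  ", ($lexical_ok ? "yes" : "no"), "\n";
```

The bareword version never gets as far as running, which is exactly the early, readable failure a beginner needs.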

I think any revision of perlopentut to show 3-arg open should keep in mind that newbies will screw up their print() statements, and try to keep them from losing their minds over it. Sticking in the close() seems like a small price to pay in a tutorial if it helps wrong code get corrected faster.
