Wednesday, November 9, 2011

Creating and maintaining Perl command-line scripts

In bioinformatics, 90% of Perl scripts are command-line scripts, and they have to be written fast (because our bosses want results by the end of the day). Of course, Perl is perfect for this kind of task, the so-called "quick and dirty" script, and you may think it is fine to write a messy script that will be used just once, right?

The problem with "quick and dirty" scripts is that they usually live longer than a single run. Sometimes one becomes part of your main pipeline, and maintaining it is a pain.
So how can we avoid unmaintainable command-line scripts?



The answer is: CREATE A HABIT.

So, always start a new script assuming it will become a mainstream script. The problem with mainstream code is that we spend a lot of time figuring out a code structure that will be extensible in the future. That is the main reason we usually end up writing "quick and dirty" scripts instead.

Instead of wasting time designing a good structure for every script, why not use a framework for command-line scripts? That's the idea behind the App::Cmd module.

This module lets you create toolkit scripts, where a single program dispatches to several commands. For example, instead of writing two scripts called 'create_dna_sequence.pl' and 'create_protein_sequence.pl', we could create one toolkit called 'create_sequence.pl' and pass the command name as the first argument, as in:
[stratus@darkside blog]$ ./create_sequence.pl
Available commands:
    commands: list the application's commands
        help: display a command's help screen
         dna: Create a DNA sequence
     protein: Create a Protein sequence
[stratus@darkside blog]$ ./create_sequence.pl dna
GCATGCATGCATGCTAGCTAGCTAG
[stratus@darkside blog]$ ./create_sequence.pl protein
Required option missing: name
create_sequence.pl protein [-?n] [long options...]
    -n --name            Protein name
    -? --usage --help    Prints this usage information.
[stratus@darkside blog]$ ./create_sequence.pl protein --name myprot
>myprot
MGNCLYPVADDNSTKLAIKEDFLIDFP
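A minimal single-file sketch of how a toolkit like this could be written with plain App::Cmd follows. It assumes App::Cmd is installed from CPAN. Each command is a subclass of App::Cmd::Command living under the application's Command namespace, options are declared in opt_spec using Getopt::Long::Descriptive specifications, and App::Cmd adds the help and commands commands on its own. The SeqUtil package and its random_seq helper are hypothetical, standing in for whatever logic really produces the sequences.

```perl
#!/usr/bin/env perl
# create_sequence.pl: single-file App::Cmd toolkit (sketch).
# Assumes App::Cmd is installed from CPAN. SeqUtil::random_seq is a
# hypothetical placeholder for the real sequence-generation logic.
use strict;
use warnings;

package SeqUtil;

# Return a random string of $len characters drawn from $alphabet
sub random_seq {
    my ($alphabet, $len) = @_;
    my @residues = split //, $alphabet;
    return join '', map { $residues[ int rand @residues ] } 1 .. $len;
}

# The application class: App::Cmd looks for commands under
# CreateSequence::Command::* and adds 'help' and 'commands' itself
package CreateSequence;
use parent 'App::Cmd';

package CreateSequence::Command::dna;
use parent 'App::Cmd::Command';

# One-line description shown in the list of available commands
sub abstract { 'Create a DNA sequence' }

# This command takes no positional arguments
sub validate_args {
    my ($self, $opt, $args) = @_;
    $self->usage_error('no arguments allowed') if @$args;
}

sub execute {
    my ($self, $opt, $args) = @_;
    my $seq = SeqUtil::random_seq('ACGT', 25);
    print "$seq\n";
}

package CreateSequence::Command::protein;
use parent 'App::Cmd::Command';

sub abstract { 'Create a Protein sequence' }

# Getopt::Long::Descriptive spec: --name/-n takes a string and is mandatory
sub opt_spec {
    return ( [ 'name|n=s', 'Protein name', { required => 1 } ] );
}

sub execute {
    my ($self, $opt, $args) = @_;
    my $seq = SeqUtil::random_seq('ACDEFGHIKLMNPQRSTVWY', 27);
    printf ">%s\n%s\n", $opt->name, $seq;
}

package main;
CreateSequence->run;
```

With this layout, adding another command should only require another CreateSequence::Command::* package; dispatching, the command list and the help screens come from App::Cmd.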

But maybe you prefer the Modern Perl style. Don't worry: the MooseX::App::Cmd module marries App::Cmd with MooseX::Getopt, so you can define command-line options as Moose attributes.
I always use Moose and MooseX::Declare, so even a command-line script like the previous one stays short and simple, and it can always be expanded later by adding more commands.
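Here is a sketch of how the create_sequence.pl toolkit might look when written with MooseX::App::Cmd and MooseX::Declare, following the structure of the template below. It assumes MooseX::App::Cmd, MooseX::Getopt and MooseX::Declare are installed from CPAN. The protein command's --name/-n option is simply a required Moose attribute with the Getopt trait, and MooseX::Getopt builds the option parsing and usage text from it. SeqUtil::random_seq is a hypothetical placeholder for the real sequence logic.

```perl
#!/usr/bin/env perl
# create_sequence.pl, MooseX::App::Cmd + MooseX::Declare version (sketch).
# Assumes MooseX::App::Cmd, MooseX::Getopt and MooseX::Declare are installed.
use strict;
use warnings;
use 5.10.0;    # enables say()

package SeqUtil;

# Hypothetical helper: random string of $len characters from $alphabet
sub random_seq {
    my ($alphabet, $len) = @_;
    my @residues = split //, $alphabet;
    return join '', map { $residues[ int rand @residues ] } 1 .. $len;
}

package main;
use MooseX::Declare;

# Application class
class CreateSequence {
    extends 'MooseX::App::Cmd';
}

# App::Cmd derives the command name 'dna' from the last part of the package name
class CreateSequence::Command::Dna {
    extends 'MooseX::App::Cmd::Command';

    sub abstract { 'Create a DNA sequence' }

    method execute ($opt, $args) {
        my $seq = SeqUtil::random_seq('ACGT', 25);
        say $seq;
    }
}

class CreateSequence::Command::Protein {
    extends 'MooseX::App::Cmd::Command';

    # The attribute becomes a mandatory --name/-n option via MooseX::Getopt
    has 'name' => (
        is            => 'ro',
        isa           => 'Str',
        traits        => ['Getopt'],
        cmd_aliases   => 'n',
        required      => 1,
        documentation => 'Protein name',
    );

    sub abstract { 'Create a Protein sequence' }

    method execute ($opt, $args) {
        my $seq = SeqUtil::random_seq('ACDEFGHIKLMNPQRSTVWY', 27);
        say '>', $self->name;
        say $seq;
    }
}

CreateSequence->run;
```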

Whenever I create an empty file with the '.pl' extension in Vim, it automatically loads the template below, so even my simplest scripts are well structured and easy to maintain:

#!/usr/bin/env perl
use strict;
use warnings;
use 5.10.0;           # enables say()
use MooseX::Declare;  # provides the class and method keywords (each class loads Moose itself)

# Define the base application class, extending MooseX::App::Cmd
class MyApp {
    extends 'MooseX::App::Cmd';
}

# Create each toolkit command as a MyApp::Command::<COMMAND_NAME> class;
# the command name ('foo' here) comes from the last part of the package name
class MyApp::Command::Foo {
    extends 'MooseX::App::Cmd::Command';

    # Class attributes become program options (via MooseX::Getopt)
    has 'input_file' => (
        is            => 'rw',
        isa           => 'Str',
        traits        => ['Getopt'],
        cmd_aliases   => 'i',
        required      => 1,
        documentation => 'Input file path',
    );

    # Description of this command, shown in the list of commands
    sub abstract { 'Describe foo' }

    # Method called to run the command
    method execute ($opt, $args) {
        say "Input file is " . $self->input_file;
    }
}

# Run the application (the main package does not need to be a Moose class)
MyApp->run;
