[Perl] Read file and store data in arrays

I use c++ to do most of my analysis. Sometimes, I might want to do some small calculations on the results from c++, and it can be executed faster with a script. I've used bash script to get certain information in c++ output files. While bash script might be handy to use, it can only do integer arithmetic. I have two columns of data, in which I want to manipulate the values in the second column. After realizing that bash script wouldn't do floating-point calculation for me, perl became my next option. To be honest, I've never used it, never seen one single perl script before this evening. All the credits of my Perl script go to the forums on the web, and people who contributed to those threads. What I wanted to do is simple, but I found it is really hard to get a direct solution by googling around. I use pieces of instructions and examples from more than 10 webpages. My hope is that by putting together what I learnt tonight, newbies to Perl would understand how to:
  1. read in data from a file
  2. and store elements in arrays
  3. do simple calculations
  4. output result
Maybe the reason that there's no direct solution is that there's a much easier way out there than using Perl. But anyway, here's the script: Perl is similar to c++, one big difference is that for every variable, a "$" sign need to come before it. First ask user to input the file that will be read in:
   print "Enter the input file: ";
   my $filename = <STDIN>;
my is used to declare a variable locally. With the use of "use strict", if we use a variable without defining it, Perl will show an error message. <STDIN> is what being typed by the user at the command prompt. It stands for standard input, and can be abbreviated by using simple <>. One thing you might already noticed is that unlike bash script, Perl requires a ";" at the end of a command, which is the same as c++.

After getting the input file name, we first read in each line of the file:
   open (FILE, "$filename") || die "Cannot open file: $!";
   my @array = <FILE>;
   close(FILE)

Each line in the file will be stored in the array. In Perl, an array is defined with a @ sign, and a scalar is defined with a $ in front of the variable name:
   my $line
line will be used to store one value.
   my @ind
ind will be used as an array.

Now that we have each line stored in array, we want to extract the elements and assign them to array @ind and @msd:
   foreach $line (@array){
          chomp($line);
         ($blank,${ind[$index]},${msd[$index]})=split(/        /,$line);
         $index = $index + 1;
   }
In the foreach loop, we first use chomp function to remove the newline character in the string. When reading data from user or from a file, usually there's a newline character at the end of the string. My input file looks like this:
        1        3.5555
        2        4.0233
        3        3.5099
        ..        .........
        ..        ..........

There's a huge blank space before the first column (it has to do with the way my c++ code output the results). The split function allows you to define what the deliminator is in each line of your array. I was able to store each column in respective array by storing part of the blank space in front in another variable $blank, which I had no intention to use afterwards.

After storing the data into those arrays, I can do calculations with those numbers.
If I would have a c++ code to do the same thing, it'd take me just a couple of minutes. But I like the convenience that scripting language could give me: execute the program without compiling it. Compiling a code meaning to add an addition executable file to the folder. If you have more than one executable, you need to name them in a reasonable fashion so that you know which one does which function. Usually, in this case I just recompile the code so that I'm certain the executable I'm using is the right one. Using scripting program not only reduces the number of files in the folder, but also eliminates the hassle of compiling codes.

Maybe the awk function could do this a lot easier than what I'm doing here. I'll try to expand my horizon to this field .... sometime when I finish my ChemE degree at Penn State.

Why computer simulations?

  • Simulations are relatively simple, inexpensive, and everything can be measured in principle.
  • Not to reproduce experimental results (exception: testing the accuracy of potentials)
  • Help to understand experimental results or to propose new experiments
  • Test of theoretical predictions or theories
  • Investigating systems on a level of detail which is not possible in real experiments or analytical theories (local structure, mechanics .., etc.)
  • Create new materials
Computer simulations often use an atomistic approach to find answers to interesting effect, technical problems, and theoretical predictions. At atomistic level simulation, particles interacting with each other is described by two methods:
ab initio calculation:
- positions of atoms -> electronic
- structure of the system -> potential
- computationally very expensive
Effective potential:
- Forms of potential is ad hoc.
- Parameters are obtained from fitting experimental data or restuls from ab initio calculations
- Cheap
The bottleneck for effective potentials (classical MD simulation) is about the time scale and accuracy of the potential, which have to be overcome so that the simulation can develop in their full capacity.

LAMMPS - A free open-source MD package

LAMMPS stands for "Large-scale Atomic/Molecular Massively Parallel Simulator", which is a molecular dynamics simulation package distributed by National Sandia Lab. LAMMPS incorporate MPI so it could run in parallel or on single processor.

The reason that I chose LAMMPS to simulate my ionomers for several reasons:

  1. It could run coarse-grained/united-atom simulations - LAMMPS includes many widely used empirical potentials.
  2. All codes are written in C++, so I am able to go into the code and do modification - the dihedral potential that I use for my ionomers are not included in the potentials LAMMPS provides. So I modified a similar potential in LAMMPS to describe the dihedral interactions.
  3. Maintained and developed by experts - This is the most important. LAMMPS has a newer version almost every year. Many new useful functions/features are added and useful force fields are incorporated. It also has an very active mail-list.

To keep LAMMPS a simple and fast simulator, LAMMPS does not help you post-process data. This is very different from other MD package like Gromacs or NAMD, which performs analyses of your simulation. Because it doesn't have a GUI interface, so LAMMPS won't visualize your simulation either. There are a few tools that allow you to do some pre/post-process, but I found it is much easier to do analyses in my own C++ codes.

To know more specifics about LAMMPS, the official website has all you need to know, and is always up-to-date.

Contact me

Mail me or stop by at:
   115 Fenske Lab
   Pennsylvania State University
   University Park, PA 16802

Call me at my office:
   (814) 863-2879

Or just email me:
   kxl281[at]psu.edu