[Python] Read data into array

This article is a follow-up to my previous post about a Perl script for manipulating data in a file. After I posted my question on a social website, a lot of people encouraged me to do it in Python.

Some advantages of Python over Perl: it is easier to maintain, easier for other people to read, and even Google programmers use it. There is also a claim out there that if you know C++, you can easily pick up Python. (Alright, I have to admit that the main reason is that Googlers use it ..)

So I started learning Python and tried to write a Python script to do the same task:

  1. read in data from a file
  2. and store elements in arrays
  3. do simple calculations
  4. output result

This Python script also takes the input file name as a command-line argument. First we need to import two modules:

   import os
   import sys
The os module allows us to work with files and directories. The sys module enables us to take command-line arguments.

   filename = sys.argv[1]
As in C++, the first element (argv[0]) is the name of your script, so argv[1] is the first real argument. Here we assign the input file name to the variable filename.

   lines = [line.strip() for line in open(filename).readlines()]
This statement reads the lines from the file and strips the whitespace before and after each line. I personally found this command pretty intuitive. One thing I need to get used to is the use of for ... in. This is how a for loop is written in Python, which is very different from C++. Now each line of the file is stored in lines. To access each element in it:
   for i in lines:
       row = i.split()
       id.append(int(row[0]))
       y.append(float(row[1]))
The split() function splits each line into its respective items (row[0] and row[1] in this case). The split() function can take a delimiter, e.g. split('\t'). However, I found it works best to leave it blank. The id and y lists are used to store the values in each column. (Note that id is also the name of a built-in Python function, so a different name would be safer in a longer script.) We need to create them before they are used:
  id = []
  y  = []
The [] denotes an empty list, Python's built-in array-like type.
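To see why calling split() with no argument tends to be the more forgiving choice, here is a small sketch. The two sample lines are made up for illustration: one mimics a file with leading spaces and several spaces between columns, the other is tab-separated.

```python
# Sample lines invented for illustration only.
padded_line = "        1        3.5555"
tabbed_line = "2\t4.0233"

# With no argument, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace.
print(padded_line.split())      # ['1', '3.5555']
print(tabbed_line.split())      # ['2', '4.0233']

# With an explicit delimiter, only that exact character separates fields,
# so a line that uses spaces instead of tabs is not split at all.
print(padded_line.split('\t'))  # ['        1        3.5555']
print(tabbed_line.split('\t'))  # ['2', '4.0233']
```

So leaving the argument blank works for both layouts, while split('\t') only works when the file really is tab-separated.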

Once we have the values stored in arrays, we can easily manipulate them. To output the results, we could use the open function:

  output = open("output.txt",'w')
  for i in range(len(y)):
        output.write("%2d\t %12f\n" %(id[i],y[i]))
  output.close()
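The % operator fills the format string with the values we want: %2d prints an integer at least 2 characters wide, and %12f prints a float at least 12 characters wide.

I might have spent more time writing this Python script than I did on my Perl one, but it is easier to follow. Also, because of the indentation rules Python requires, the code looks neater.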
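Putting the pieces together, the whole task might look roughly like the sketch below. It is one possible arrangement rather than the exact script from this post: the function names read_columns and write_columns are made up here, the lists are renamed to ids and values, and the "double every value" step is only a placeholder for whatever calculation is needed.

```python
import sys


def read_columns(filename):
    """Read a two-column file into a list of ints and a list of floats."""
    ids = []     # first column (called id in the snippets above)
    values = []  # second column (called y in the snippets above)
    # "with" closes the file automatically, even if an error occurs.
    with open(filename) as infile:
        for line in infile:
            row = line.split()
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            ids.append(int(row[0]))
            values.append(float(row[1]))
    return ids, values


def write_columns(filename, ids, values):
    """Write the two columns using the same format string as above."""
    with open(filename, "w") as output:
        for i, value in zip(ids, values):
            output.write("%2d\t %12f\n" % (i, value))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python script.py <input file>")
    else:
        ids, values = read_columns(sys.argv[1])
        # Placeholder calculation (an assumption, not from the post):
        # double every value in the second column.
        doubled = [2.0 * v for v in values]
        write_columns("output.txt", ids, doubled)
```

Using with open(...) instead of a bare open() means there is no close() call to forget.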

Hooray! My first Python code!!

[Perl] Read file and store data in arrays

I use C++ to do most of my analysis. Sometimes I want to do some small calculations on the results from C++, and that can be done faster with a script. I've used bash scripts to get certain information from C++ output files. While bash scripts are handy, bash can only do integer arithmetic. I have two columns of data, and I want to manipulate the values in the second column. After realizing that bash wouldn't do floating-point calculations for me, Perl became my next option.

To be honest, I've never used it, and had never seen a single Perl script before this evening. All the credit for my Perl script goes to the forums on the web and the people who contributed to those threads. What I wanted to do is simple, but I found it really hard to get a direct solution by googling around. I used pieces of instructions and examples from more than 10 webpages. My hope is that by putting together what I learnt tonight, newbies to Perl will understand how to:
  1. read in data from a file
  2. and store elements in arrays
  3. do simple calculations
  4. output result
Maybe the reason that there's no direct solution is that there's a much easier way out there than using Perl. But anyway, here's the script. Perl is similar to C++; one big difference is that every scalar variable needs a "$" sign in front of it. First, ask the user for the file to be read:
   print "Enter the input file: ";
   my $filename = <STDIN>;
my is used to declare a variable locally. With "use strict", Perl shows an error message if we use a variable without declaring it. <STDIN> is what the user types at the command prompt. It stands for standard input; the shorter <> also reads from standard input when the script is given no file arguments. One thing you might have already noticed is that, unlike a bash script, Perl requires a ";" at the end of each statement, which is the same as C++.

After getting the input file name, we first read in each line of the file:
   open (FILE, "$filename") || die "Cannot open file: $!";
   my @array = <FILE>;
   close(FILE);

Each line in the file will be stored in the array. In Perl, an array name starts with a @ sign, and a scalar name starts with a $:
   my $line;
$line will be used to store one value.
   my @ind;
@ind will be used as an array.

Now that we have each line stored in @array, we want to extract the elements and assign them to the arrays @ind and @msd:
   foreach $line (@array){
          chomp($line);
         ($blank,${ind[$index]},${msd[$index]})=split(/        /,$line);
         $index = $index + 1;
   }
In the foreach loop, we first use the chomp function to remove the newline character from the string. When reading data from the user or from a file, there is usually a newline character at the end of the string. My input file looks like this:
        1        3.5555
        2        4.0233
        3        3.5099
        ..        .........
        ..        ..........

There's a huge blank space before the first column (it has to do with the way my C++ code outputs the results). The split function allows you to define what the delimiter is in each line of your array. Because every line starts with the blank space, split returns an extra field in front of the first column. I was able to store each column in its respective array by assigning that extra field to another variable, $blank, which I had no intention of using afterwards.

After storing the data in those arrays, I can do calculations with the numbers.
If I wrote C++ code to do the same thing, it would take me just a couple of minutes. But I like the convenience a scripting language gives me: executing the program without compiling it. Compiling code means adding an additional executable file to the folder. If you have more than one executable, you need to name them sensibly so that you know which one does what. Usually, in this case, I just recompile the code so that I'm certain the executable I'm using is the right one. Using a scripting language not only reduces the number of files in the folder, but also eliminates the hassle of compiling code.

Maybe awk could do this a lot more easily than what I'm doing here. I'll try to expand my horizons in this field .... sometime when I finish my ChemE degree at Penn State.

Why computer simulations?

  • Simulations are relatively simple, inexpensive, and everything can be measured in principle.
  • Not to reproduce experimental results (exception: testing the accuracy of potentials)
  • Help to understand experimental results or to propose new experiments
  • Test of theoretical predictions or theories
  • Investigating systems on a level of detail which is not possible in real experiments or analytical theories (local structure, mechanics .., etc.)
  • Create new materials
Computer simulations often use an atomistic approach to find answers to interesting effects, technical problems, and theoretical predictions. In atomistic simulations, the interactions between particles are described by one of two methods:
ab initio calculation:
- positions of atoms -> electronic structure of the system -> potential
- computationally very expensive
Effective potential:
- The form of the potential is ad hoc.
- Parameters are obtained by fitting experimental data or results from ab initio calculations
- Cheap
The bottlenecks for effective potentials (classical MD simulation) are the accessible time scale and the accuracy of the potential, which have to be overcome before such simulations can reach their full capacity.
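As a concrete illustration of what an effective potential looks like, here is a minimal sketch of the Lennard-Jones pair potential, a widely used empirical form. It is chosen here only as a familiar example, not because the text above refers to it; the parameters epsilon and sigma are in made-up reduced units and would normally come from fitting.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy is zero at r = sigma and reaches its minimum, -epsilon,
# at r = 2**(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones(1.0))    # 0.0
print(lennard_jones(r_min))  # approximately -1.0
```

The whole interaction is captured by two fitted numbers, which is what makes this kind of potential cheap compared with an ab initio calculation.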

LAMMPS - A free open-source MD package

LAMMPS stands for "Large-scale Atomic/Molecular Massively Parallel Simulator". It is a molecular dynamics simulation package distributed by Sandia National Laboratories. LAMMPS uses MPI, so it can run in parallel or on a single processor.

I chose LAMMPS to simulate my ionomers for several reasons:

  1. It can run coarse-grained/united-atom simulations - LAMMPS includes many widely used empirical potentials.
  2. All the code is written in C++, so I am able to go into the code and modify it - the dihedral potential that I use for my ionomers is not included in the potentials LAMMPS provides, so I modified a similar potential in LAMMPS to describe the dihedral interactions.
  3. Maintained and developed by experts - This is the most important. LAMMPS has a new version almost every year. Many useful new functions/features are added and useful force fields are incorporated. It also has a very active mailing list.

To keep LAMMPS a simple and fast simulator, it does not post-process data for you. This is very different from other MD packages like Gromacs or NAMD, which perform analyses of your simulation. Because it doesn't have a GUI, LAMMPS won't visualize your simulation either. There are a few tools that allow you to do some pre/post-processing, but I found it much easier to do analyses in my own C++ codes.

To learn more specifics about LAMMPS, see the official website, which has all you need to know and is always up to date.

Contact me

Mail me or stop by at:
   115 Fenske Lab
   Pennsylvania State University
   University Park, PA 16802

Call me at my office:
   (814) 863-2879

Or just email me:
   kxl281[at]psu.edu

Send attachment using mutt

Update [7/27/2011]: It doesn't work well with files containing text (e.g. .cpp files). When I send a source file or a data file, the file I receive in the Attachment folder has its text all messed up.

I'm trying to use gnuplot to generate plots when I'm working on the cluster. Since both my laptop and my office PC run Windows, the most common way to get the plots is to use file transfer software such as WinSCP. Although WinSCP is pretty handy, it is sometimes a pain in the ass to click through many subfolders and then download the file to the correct place on the Windows machine.

Most of the time the downloaded plot goes directly onto a PowerPoint slide that I share with my advisor through Dropbox to show my progress. So I came up with this idea: send the image as an attachment to Dropbox! This requires a Dropbox account and email software installed on the remote Linux machine.

I use mutt to send the attachment to my Dropbox folder. Dropbox has a function called "Send to Dropbox". It lets you create an email address that is linked to a folder called "Attachment". Once you set up "Send to Dropbox", the Attachment folder is created. Below is a short script to send a file to my dropbox:

This script takes one argument: the path to the file you want to transfer to your Dropbox ($1). It usually takes about 30 seconds until Dropbox receives the file. The time taken by WinSCP and by this script is pretty much the same. I'm happy about this script simply because it lets me keep my hand off the mouse. After all, I'm more of a keyboard person.
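For readers who prefer Python, the same idea can be sketched as follows. This is not the original shell script: the address in DROPBOX_ADDRESS is a hypothetical placeholder, the function names are made up, and the sketch assumes mutt is installed and already configured to send mail.

```python
import subprocess
import sys

# Hypothetical placeholder: replace with the upload address that was
# created for your own account.
DROPBOX_ADDRESS = "your-upload-address@example.com"


def build_mutt_command(path, recipient=DROPBOX_ADDRESS):
    """Return the mutt command line that mails `path` as an attachment."""
    # -s sets the subject, -a attaches a file, and "--" marks the end of
    # the attachment list so the recipient is not read as another file.
    return ["mutt", "-s", "Attachment: " + path, "-a", path, "--", recipient]


def send_to_dropbox(path):
    """Send the file; an empty message body is supplied on stdin."""
    subprocess.run(build_mutt_command(path), input=b"", check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    send_to_dropbox(sys.argv[1])
```

Keeping the command construction in its own function makes it easy to print and inspect the command before anything is actually sent.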

Create/Remove shortcuts in Linux

Accessing a sub-sub-sub-sub-..... folder every time you log on to do your work can be a pain in the ass, although typing the first letters of consecutive folder names and pressing 'tab' can make it faster. If you access a specific folder frequently, there is a way to save the hassle: symbolic links. Symbolic links are similar to shortcuts on Windows. A symbolic link points to a file or a folder.

To create a symbolic link:
   ln -s [file/folder to link] [path of the link]
For example, to create a shortcut in my home directory that points to the folder ~/group/kxl281/Lammps/PEO/LJC/10/prog/dielectric_const/, use:
   ln -s ~/group/kxl281/Lammps/PEO/LJC/10/prog/dielectric_const/ ~/

The ~/ stands for the home directory. Now I don't have to navigate to the original path to work in the dielectric_const folder; I can simply access it through the link.
Symbolic links also work for files. You can make a shortcut to access a file without making a duplicate.

To remove a symbolic link:
   rm [filename]


Note that if the symbolic link points to a folder, there shouldn't be a '/' at the end of the file name. For example, if I want to remove the shortcut I created above pointing to the dielectric_const folder:
   rm dielectric_const
without the slash (/).
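The same create/remove cycle can be tried out safely from Python. The sketch below works inside a temporary directory, so the folder names are stand-ins for illustration and nothing in a real home directory is touched.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing real is touched.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "deep", "nested", "dielectric_const")
os.makedirs(target)

# Equivalent of: ln -s <target> <link>
link = os.path.join(workdir, "dielectric_const")
os.symlink(target, link)
print(os.path.islink(link))    # True
print(os.path.isdir(link))     # True: the link resolves to the folder

# Equivalent of: rm <link>  (removes only the link, not the folder)
os.unlink(link)
print(os.path.exists(target))  # True: the original folder is still there
```

Removing the link leaves the original folder and its contents alone, which is what makes symbolic links safe to create and delete freely.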

Electrolytes in Lithium ion batteries

Lithium-ion batteries have been widely used in electronic gadgets due to their high energy/power density. Current electrolytes for Li-ion batteries are mostly organic solvents. Although they are good conducting media for Li ions, the organic solvents are toxic and volatile. Besides, liquid electrolytes require a hard casing and a separator to ensure good contact among battery elements, so the flexibility of the battery, as well as of the gadget, is limited. These problems can be solved if a non-toxic polymer electrolyte is used.

PEO-salt electrolytes (PEO: poly(ethylene oxide)) have two drawbacks: 1) the ionic conductivity is small, and 2) concentration polarization in PEO degrades cell performance after several charge/discharge cycles. Concentration polarization reduces the capacity of the battery by creating depletion regions near the electrodes. Many studies have focused on improving the conductivity of polymer electrolytes. One approach uses larger anions to decrease the attraction between anion and cation, resulting in a higher fraction of solvated Li ions in PEO. Another active research area is adding ceramic fillers to provide extra lithium hopping sites.

Our research collaboration focuses on the second drawback. A material in which only cations conduct is called a single-ion conductor. In single-ion conductors, anions are covalently bound to the polymer backbone, so the cation transference number is unity. By reducing anion mobility, single-ion conductors prevent concentration polarization and improve the lifetime of batteries. The single-ion conductor is a type of ionomer. In contrast to most ionomers, an ionomer with PEO as the non-ionic part can solvate the cations. The conductivity of PEO-based ionomers is nevertheless less promising, and people haven't paid much attention to this material.
Our collaboration aims to understand the morphology, structure and dynamics of PEO-based Li, Na, and Cs samples. By systematically varying the PEO spacer length and ion type, we can analyze their effects in a consistent way.
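To make the "transference number is unity" statement concrete, here is a tiny sketch. It assumes the usual definition of the cation transference number as the fraction of the total ionic conductivity carried by the cations; the function name and the conductivity values are made up for illustration.

```python
def cation_transference_number(sigma_cation, sigma_anion):
    """Fraction of the total ionic conductivity carried by cations."""
    return sigma_cation / (sigma_cation + sigma_anion)


# Made-up conductivities in arbitrary units, for illustration only.
print(cation_transference_number(1.0, 3.0))  # 0.25: mobile anions carry most of the current
print(cation_transference_number(1.0, 0.0))  # 1.0: anions immobilized on the backbone
```

When the anions cannot move, their contribution drops to zero and the cations carry all of the current, which is the single-ion conductor case.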

Courtesy of Kokonad Sinha
                 Animation depicts Lithium ion hopping via PEO segmental motion

Molecular dynamics simulation

MD (molecular dynamics) simulation is a computational technique that describes the movement of atoms based on their interactions. The interactions among atoms can be described by empirical potentials (force fields), quantum chemical models, or a mix of the two. The most commonly seen MD in polymer physics or biophysics uses force fields and follows classical mechanics: Newton's equations of motion are solved numerically to get the atom trajectories. By tracking where the atoms have been, structural and dynamical information can be extracted.

This type of classical MD uses a force field to describe both intra- and inter-molecular interactions. Usually the force field refers to a set of empirical equations that describe the most essential parts of the interactions. For example, the bond between two atoms is often described by a harmonic potential:
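To give a feel for what "solving Newton's equations of motion numerically" means, here is a minimal sketch using the velocity Verlet scheme, a common choice of integrator in MD (the text above does not say which integrator is used, so this is an assumption). The system is a toy one: a single particle on a spring in one dimension, with made-up parameters.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
        trajectory.append(x)
    return trajectory, v


# Toy system (all values are made up): a particle on a spring, F = -k * x.
k, mass, dt = 1.0, 1.0, 0.01
trajectory, v_final = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, 1000)

# The total energy should stay close to its initial value, 0.5 * k * 1.0**2.
energy = 0.5 * mass * v_final ** 2 + 0.5 * k * trajectory[-1] ** 2
print(energy)
```

The list of positions returned here plays the role of the trajectory: a real MD code does the same for every atom in three dimensions and then analyzes the stored positions.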

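Bonding potential = $K (x - x_0)^2$

where $x$ is the distance between the two bonded atoms and $x_0$ is the equilibrium bond length.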
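A harmonic bond of the form K (x - x0)^2 can be written out in a few lines. In this sketch the function names and the numbers for K and x0 are made up for illustration; the force is simply minus the derivative of the energy with respect to the bond length.

```python
def harmonic_bond_energy(x, k, x0):
    """Bond energy K * (x - x0)**2 for bond length x."""
    return k * (x - x0) ** 2


def harmonic_bond_force(x, k, x0):
    """Force along the bond, -dU/dx = -2 * K * (x - x0)."""
    return -2.0 * k * (x - x0)


# Made-up parameters: stiffness k and equilibrium length x0.
k, x0 = 100.0, 1.5
for x in (1.4, 1.5, 1.6):
    # Energy is zero at x0; the force always points back toward x0.
    print(x, harmonic_bond_energy(x, k, x0), harmonic_bond_force(x, k, x0))
```

Because the energy keeps rising as the bond is stretched, a bond described this way can never break, which is why such force fields cannot describe chemical reactions.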

Depending on the complexity of the potentials, MD simulation can achieve different levels of realism. Force fields that use a harmonic equation for bonding can reproduce structural and dynamical information to some extent, but cannot describe any chemical reactions. Ab initio based MD simulation can provide very accurate electron distributions and describe bond breaking/forming, but the trade-off is the large amount of computational time required.

It is important to know which properties one is interested in finding out from an MD simulation, and to choose the force field accordingly. There is always a compromise between accuracy and the time/length scales of MD simulation.


Read more about MD: MD simulation on Wikipedia

About Me

Hi, my name is Kan-Ju Lin. I am currently a Ph.D. candidate in the Chemical Engineering department at Penn State. My advisor, Dr. Janna Maranas, leads our group investigating molecular mobility in soft materials. While our group does both experimental and computational studies, my research is mainly computational, with insights provided by experimentalists.

My research focuses on ion motion and transport mechanisms in solid-state polymer electrolytes (SPEs), especially for battery applications. I use molecular dynamics simulation to investigate polymer-cation correlation at the molecular level. By collaborating with research groups from Penn State and UPenn, we conduct our investigations via various approaches: X-ray scattering, dielectric spectroscopy, NMR, FTIR, ab initio, quasi-elastic neutron scattering (QENS), and MD simulation. These techniques provide information over a wide range of length scales and time scales, and also reveal phenomena from different perspectives. Connecting results from different techniques and coming up with comprehensive pictures of these materials are the main challenges/fun of my research.

While most studies on SPEs are devoted to PEO/salt systems, our research collaboration focuses on PEO-based ionomers. The major advantage of ionomers over PEO/salt is that they prevent electrolyte concentration polarization and consequently improve the lifetime of the batteries. For more about ionomers, please see the ionomer section.

To learn more about me and my research, check out the links above. If you would like to discuss the techniques I use or my research, or have any comments, feel free to contact me at kxl281[at]psu.edu.

Resume

Education

Sep. 2007~ Present The Pennsylvania State University, University Park, PA 16801
- Doctoral candidate in Chemical Engineering
Sep. 2002~ July 2006 National Tsing Hua University, Hsin-Chu 30013, Taiwan
- Bachelor of Science in Chemical Engineering

Research/working experience

Jan. 2008~ Present Graduate Research Assistant - Penn State
Jan. 2011~ May 2011, Sep. 2009~ Dec. 2009 Graduate Teaching Assistant - Penn State
Summer 2009,2010,2011 Graduate Mentor - Penn State
Sep. 2006~ Mar. 2007 Research Assistant (Computer-Aided Engineering Lab)
-National Tsing Hua University, Hsin-Chu, Taiwan
Mar. 2006~ July 2006 Research Intern (Energy & Resource Lab)
-Industrial Technology Research Institute, Hsin-Chu, Taiwan

I have four years of experience working with Linux and programming. I developed my own codes in C++ to analyze the simulation trajectories. The simulations are carried out on Penn State high performance computing systems using LAMMPS. All the communications between me and the clusters are via Linux systems.

To know more about me, please download the complete resume.