Python for data analysis (NumPy, SciPy, pandas, IPython)


I am a big fan of John Cook’s blog, and as a result I’ve been intrigued by the idea of using Python for data analysis ever since his review of Wes McKinney’s book of the same name (Wes wrote the pandas module).

I was, however, unprepared for the trial of setting up the Python suite that forms the core of the scientific Python workbench.

Obtaining Python itself is easy once you have Homebrew installed (for a tutorial, see here).

Python didn’t link automatically for me, so installing it took two steps. There are four commands below because I first check up on brew (if `brew doctor` doesn’t report “Your system is raring to brew”, follow its instructions until it does), and then install and link Python:

$ brew update
$ brew doctor
$ brew install python 
$ brew link python
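To confirm that the `python` you now get is Homebrew’s build rather than the system one, you can ask the interpreter itself. This is a minimal sketch, assuming Homebrew’s default `/usr/local` prefix (your prefix may differ); `interpreter_info` is a hypothetical helper name, not part of any library:

```python
import os
import sys


def interpreter_info(homebrew_prefix="/usr/local"):
    """Report which Python binary is running and whether it lives under
    the given (assumed) Homebrew prefix."""
    # Resolve symlinks so /usr/local/bin/python maps to its real location.
    executable = os.path.realpath(sys.executable)
    return {
        "executable": executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "under_homebrew_prefix": executable.startswith(homebrew_prefix + os.sep),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_info().items()):
        print("%s: %s" % (key, value))
```

If `under_homebrew_prefix` comes back `False`, the shell is still finding the system Python first.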

Finally, I edited my PATH so that the shell can find the scripts Homebrew’s Python installs into /usr/local/share/python:

$ sudo vi /etc/paths

adding /usr/local/share/python to the bottom of the list. And I added `PATH=/usr/local/share/python:$PATH` to my .zshrc:

$ vi ~/.zshrc
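The shell searches PATH from left to right, so the `.zshrc` line puts `/usr/local/share/python` ahead of everything else, whereas an entry at the bottom of `/etc/paths` comes after the system directories. A small sketch for checking where a directory actually ends up (the helper name `path_position` is hypothetical):

```python
import os


def path_position(directory, path_string=None):
    """Return the index of `directory` in a PATH-style string, or -1 if absent.

    The shell uses the first match it finds, so a smaller index means
    higher priority.
    """
    if path_string is None:
        path_string = os.environ.get("PATH", "")
    # Normalise entries so a trailing slash doesn't hide a match.
    entries = [os.path.normpath(p) for p in path_string.split(os.pathsep) if p]
    target = os.path.normpath(directory)
    return entries.index(target) if target in entries else -1


if __name__ == "__main__":
    print(path_position("/usr/local/share/python"))
```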

So far, so simple.

Getting the scientific kit down is where things can get frustrating, though it must be said that the Python community is extremely helpful. The first thing to do is to remove the version of numpy that you almost certainly already have installed (OS X ships one with its system Python). To find out where it is, open your Python interpreter ($ python) and do the following:


>>> import numpy
>>> print(numpy.__file__)

If you are able to import `numpy`, the second line will tell you where it lives. Mine was in the system Python’s `Extras` directory, and yours probably is too, so I removed it like this:


$ cd /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/ 

$ sudo rm -r numpy
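In Python 3 you can also check where (or whether) a copy of `numpy` would be loaded from without importing it, using the standard library’s `importlib.util.find_spec`. A minimal sketch, assuming Python 3.4 or later; `locate_module` is a hypothetical helper name:

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None.

    Unlike `import numpy; print(numpy.__file__)`, this does not execute
    the module. Built-in modules report the string "built-in".
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print(locate_module("numpy"))
```

Running it again after the `rm` should print `None`, or the path of some other copy that is still visible.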

Before building anything, you are going to want a Fortran compiler, since parts of SciPy are written in Fortran:


$ brew install gfortran
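It is worth checking that the compiler is actually on PATH before building SciPy. A minimal Python 3 sketch using the standard library’s `shutil.which`; checking `gcc` alongside `gfortran` is an illustrative addition, not a step from the original instructions:

```python
import shutil


def find_executables(names):
    """Map each program name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables(["gfortran", "gcc"]).items():
        print(name, "->", path or "not found")
```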

Next, we will pull down a couple of dependencies; these are straightforward:


$ pip install readline

$ pip install nose
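To see which versions `pip` actually installed, newer Pythons can read the package metadata directly. A sketch, assuming Python 3.8 or later (where `importlib.metadata` is available); the helper name is hypothetical:

```python
from importlib import metadata


def installed_versions(distributions):
    """Return {distribution name: installed version string, or None if absent}."""
    versions = {}
    for dist in distributions:
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


if __name__ == "__main__":
    print(installed_versions(["readline", "nose"]))
```

These are the names you pass to `pip install`, which are not always the names you `import`.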

Both `easy_install` and `pip` are likely to fail when installing `numpy` and `scipy`, so we need to do something else. Thankfully, the ever-helpful Python community has crafted a nice workaround. To use it, we tap an alternative set of Homebrew formulas:


$ brew tap samueljohn/python

$ brew install numpy

$ brew install scipy

After a bit of waiting, you ought to have both `numpy` and `scipy` installed. You can test these out with:


$ brew test numpy --verbose

$ brew test scipy --verbose
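Alongside `brew test`, a tiny end-to-end check from the interpreter can confirm that both libraries import and return sensible numbers. A sketch; the function name and the 2x2 test system are illustrative choices, not from the original post:

```python
def linear_algebra_smoke_test():
    """Solve 3x + y = 9, x + 2y = 8 (solution x = 2, y = 3) with NumPy and SciPy.

    Returns {"numpy": ..., "scipy": ...}, each True if the answer was right,
    False if it was wrong, or "missing" if the library could not be imported.
    """
    try:
        import numpy as np
    except ImportError:
        # SciPy depends on NumPy, so there is nothing more to test.
        return {"numpy": "missing", "scipy": "missing"}
    a = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([9.0, 8.0])
    expected = np.array([2.0, 3.0])
    results = {"numpy": bool(np.allclose(np.linalg.solve(a, b), expected))}
    try:
        import scipy.linalg
    except ImportError:
        results["scipy"] = "missing"
    else:
        results["scipy"] = bool(np.allclose(scipy.linalg.solve(a, b), expected))
    return results


if __name__ == "__main__":
    print(linear_algebra_smoke_test())
```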

Don’t worry if there are a few failures and skips (I’ve never obtained a perfect install, and it hasn’t been a problem yet).

Now for plotting. Everyone seems to like `matplotlib`, so that’s what we’ll take. Because of a bug, the usual advice is to build it from source:


$ cd $HOME
$ git clone https://github.com/matplotlib/matplotlib.git
$ cd matplotlib
$ python setup.py build
$ python setup.py install
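Once the build finishes, one way to check that `matplotlib` works without involving any GUI toolkit is to render a figure to memory with the non-interactive Agg backend. A sketch; the function name is hypothetical:

```python
import io


def render_test_plot():
    """Draw a simple line plot off-screen and return the PNG bytes,
    or None if matplotlib is not importable."""
    try:
        import matplotlib
    except ImportError:
        return None
    matplotlib.use("Agg")  # non-interactive backend: no window or Qt needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)  # free the figure's memory
    return buffer.getvalue()
```

Using Agg here keeps this check independent of the Qt setup that follows.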

Now for the tricky bit: getting `IPython` up and running. First, we want to install `qt`. This post notes a few approaches; with a bit of luck, you should be able to just brew it down:

$ brew install qt

Obtaining `pyqt` may be a little tricky. To get it to work, I had to give my `sip` a shake by upgrading and relinking it:

$ brew upgrade sip
$ brew unlink sip && brew link sip

As of OS X 10.8.2 (Mountain Lion), I also needed PySide:

$ brew install pyside

And now:

$ brew install pyqt
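The qtconsole needs one working set of Qt bindings, so it can help to see which ones the interpreter can find. A sketch; `PyQt4` and `PySide` correspond to the packages installed above, while the newer names in the default list are an assumption about what a more recent install might contain:

```python
import importlib.util


def available_qt_bindings(candidates=("PyQt4", "PySide", "PyQt5", "PySide2", "PySide6")):
    """Return the Qt binding packages that are importable, without importing them."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    print(available_qt_bindings() or "no Qt bindings found")
```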

Next, `pip` down IPython:

$ pip install ipython

… and Cython:

$ pip install Cython

Brew up some zeromq:

$ brew install zeromq

Nearly there… just `pip` down `pyzmq`, `pygments`, `tornado`, `python-dateutil`, and `pandas`:

$ pip install pyzmq
$ pip install pygments
$ pip install tornado
$ pip install python-dateutil
$ pip install pandas
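The names you `pip install` are not always the names you `import`: `pyzmq` is imported as `zmq`, and `python-dateutil` as `dateutil`. A sketch that checks each of the packages above by its import name (the helper and the dictionary are illustrative, not part of any tool):

```python
import importlib.util

# pip (distribution) name -> module name used in `import` statements.
IMPORT_NAMES = {
    "pyzmq": "zmq",
    "pygments": "pygments",
    "tornado": "tornado",
    "python-dateutil": "dateutil",
    "pandas": "pandas",
}


def check_imports(mapping=IMPORT_NAMES):
    """Return {pip name: True if the corresponding module can be found}."""
    return {dist: importlib.util.find_spec(module) is not None
            for dist, module in mapping.items()}


if __name__ == "__main__":
    for dist, ok in check_imports().items():
        print(dist, "ok" if ok else "MISSING")
```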

Now test that it all works with:

$ ipython qtconsole

With any luck, `import pandas` will work just fine. You now have some fairly heavy-duty kit for crunching data.
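As a first taste of that kit, here is a small grouped summary in pandas. The data are made up purely for illustration, and the function name is hypothetical:

```python
def monthly_means():
    """Average a toy sales series by month; returns None if pandas is missing."""
    try:
        import pandas as pd
    except ImportError:
        return None
    df = pd.DataFrame({
        "month": ["Jan", "Jan", "Feb", "Feb"],
        "sales": [10, 20, 30, 50],
    })
    # Group rows by month, average the sales column, return a plain dict.
    return df.groupby("month")["sales"].mean().to_dict()


if __name__ == "__main__":
    print(monthly_means())
```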

4 comments

    1. There’s some truth to that; I need to find a way to relax that actually involves more relaxing :) I have been thinking about one that uses the new skills. Any pet ideas you think would make a good little project? I have been thinking about seasonality.

  1. Not sure if you have tried Python(x,y) (http://code.google.com/p/pythonxy/)?

    It’s a lot simpler than what you have above: basically you download their whole distribution and then it installs all of the components for you. More importantly, someone else has done all the hard work to make sure all of the components “play nicely” with each other…
