I am a big fan of John Cook’s blog, and as a result I’ve been intrigued by the idea of using Python for data analysis since his review of Wes McKinney’s book of the same name (Wes wrote the pandas module).
I was, however, unprepared for the trial of setting up the Python suite that forms the core of the scientific python workbench.
Obtaining python itself is easy, once you have homebrew installed (for a tutorial, see here).
It didn’t link for me, so the install itself took two steps. There are four commands below, though, because I first check up on brew (if you don’t see “Your system is raring to brew” after running $ brew doctor, follow the doctor’s instructions until you do) and then install and link Python:
$ brew update
$ brew doctor
$ brew install python
$ brew link python
Finally, I edited my PATH as follows, to point python to my installed packages:
$ sudo vi /etc/paths
adding /usr/local/share/python to the bottom of the list. And I added `PATH=/usr/local/share/python:$PATH` to my .zshrc:
$ vi .zshrc
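Before moving on, it can be worth confirming that the PATH edit actually took effect in a fresh shell. What follows is only a minimal sketch, assuming a Python 3 interpreter; `path_position` is a helper name made up for this illustration, and `/usr/local/share/python` is simply the directory used above.

```python
import os
import shutil


def path_position(directory, path_string=None):
    """Return the 0-based position of `directory` in a PATH-style string, or -1 if absent."""
    if path_string is None:
        path_string = os.environ.get("PATH", "")
    # Normalise entries so that trailing slashes do not cause false negatives.
    entries = [os.path.normpath(p) for p in path_string.split(os.pathsep) if p]
    target = os.path.normpath(directory)
    return entries.index(target) if target in entries else -1


if __name__ == "__main__":
    # -1 here would mean the new shell has not picked up the edit.
    print("position on PATH:", path_position("/usr/local/share/python"))
    # shutil.which reports the executable that would be found first on PATH.
    print("python resolves to:", shutil.which("python"))
```

A lower position means the directory is searched earlier, so prepending it in .zshrc should put it at or near position 0.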
So far so simple.
Getting the scientific kit down is where things can potentially get frustrating — though it must be said that the python community is extremely helpful. The first thing to do is to remove the version of numpy that you almost certainly already have installed. To find out where it is, open your python interpreter ($ python) and do the following:
>>> import numpy
>>> print numpy.__file__
If you are able to import `numpy`, the second line will tell you where it lives. Yours is probably in the same place as mine, in which case you can remove it like this:
$ cd /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/
$ sudo rm -r numpy
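Since `sudo rm -r` is unforgiving, it may help to double-check the location first. The sketch below is one way to look up where a package would be loaded from without importing it. It assumes Python 3.4 or later (it relies on `importlib.util.find_spec`, which the Python 2.7 setup in this post does not have), and `module_location` is a hypothetical helper name.

```python
import importlib.util
import os


def module_location(name):
    """Return the directory a top-level module would be loaded from, or None.

    None is returned when the module cannot be found, or when it is built in
    or frozen and therefore has no file on disk.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin or spec.origin in ("built-in", "frozen"):
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # For a package such as numpy this is the package directory itself.
    print("numpy lives in:", module_location("numpy"))
```

Running it again after the removal should show either `None` or a different directory, depending on whether another copy is installed.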
Next, you are going to want a Fortran compiler:
$ brew install gfortran
Next, we will pull down a few dependencies — these are straightforward:
$ pip install readline
$ pip install nose
Both `easy_install` and `pip` are likely to fail when installing `numpy` and `scipy`, so we need to do something else. Thankfully, the ever-helpful Python community has crafted a nice workaround. To make it work, we need to tap an alternative set of Homebrew formulas:
$ brew tap samueljohn/python
$ brew install numpy
$ brew install scipy
After a bit of waiting, you ought to have both `numpy` and `scipy` installed. You can test these out with:
$ brew test numpy --verbose
$ brew test scipy --verbose
Don’t worry if there are a small number of fails and skips (I’ve never obtained a perfect install and it’s not yet been a problem). Now for the plotting functions: everyone seems to like `matplotlib`, so that’s what we’ll take. Due to a bug, this is reportedly best done from source:
$ cd $HOME
$ git clone https://github.com/matplotlib/matplotlib.git
$ cd matplotlib
$ python setup.py build
$ python setup.py install
Now for the tricky bit — getting `IPython` up and running. First, we want to install `qt`. This post notes a few approaches — with a bit of luck, you should be able to just brew it down:
$ brew install qt
Obtaining `pyqt` may be a little tricky. To get it to work, I had to give my `sip` a shake:
$ brew upgrade sip
$ brew unlink sip && brew link sip
As of OS X 10.8.2 (Mountain Lion), I also needed PySide:
$ brew install PySide
and now:
$ brew install pyqt
Next, `pip` down IPython:
$ pip install ipython
… and Cython:
$ pip install Cython
Brew up some zeromq:
$ brew install zeromq
Nearly there … just `pip` down `pyzmq`, `pygments`, `tornado`, `python-dateutil`, and `pandas`:
$ pip install pyzmq
$ pip install pygments
$ pip install tornado
$ pip install python-dateutil
$ pip install pandas
Now test to see if it all works with:
$ ipython qtconsole
With any luck, `import pandas` will work just fine. You now have some fairly heavy duty kit for crunching some data.
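If you would rather check the whole stack in one go than import things one at a time, something along these lines might do. It is a rough sketch rather than an official test: `check_stack` and `STACK` are names invented here, and the list reflects only the packages installed in this post. Note that two import names differ from their pip names (`pyzmq` imports as `zmq`, `python-dateutil` as `dateutil`).

```python
import importlib

# Import names for the packages installed above (assumed list, adjust to taste).
STACK = ["numpy", "scipy", "matplotlib", "IPython", "Cython",
         "zmq", "pygments", "tornado", "dateutil", "pandas"]


def check_stack(names):
    """Try to import each name; map it to a version string, or None on failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken build can fail in ways other than ImportError.
            report[name] = None
        else:
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    for name, version in check_stack(STACK).items():
        status = version if version is not None else "NOT IMPORTABLE"
        print("{:<12} {}".format(name, status))
```

Anything reported as not importable is the place to start looking if the Qt console refuses to launch.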
This is your first post after Christmas.
you need serious help!!!
there’s some truth to that — i need to find a way to relax that actually involves more relaxing :) i have been thinking about one that uses the new skills — any pet ideas you think would make a good little project? I have been thinking about seasonality.
Not sure if you have tried python xy (http://code.google.com/p/pythonxy/)?
It’s a lot simpler than what you have above – basically you download their whole distribution and then it installs all of the components for you. More importantly, someone else has done all the hard work to make sure all of the components “play nicely” with each other…
No, i hadn’t … Totally unaware. Thanks for the link