In looking into Python-based workflow systems to use in highly parallelizable materials science applications that will run on [Titan](http://en.wikipedia.org/wiki/Titan_(supercomputer)), we came across [ruffus](https://code.google.com/p/ruffus/).

----

There is a lot to wrap one's head around in this project, so first I am just going to do a simple little experiment to see how it coordinates activities on my local machine.

----

After that, I think I should figure out how to simulate getting it to run on multiple nodes. Without access to a supercomputer, I am not sure how I'll do this.

----

Since we'll be interfacing with [SGE](http://en.wikipedia.org/wiki/Oracle_Grid_Engine) and ruffus supposedly offers support for SGE, I think we'll be in good shape. I just need to understand how it works instead of expecting magic.

----

I think I'll do fractal image generation as a problem to solve and test this out.

----

I'll probably borrow almost exclusively from [James Tauber's work on Littlewood](https://thoughtstreams.io/jtauber/littlewood-fractals/).

----

The first thing I am going to do is [fork littlewood](https://github.com/eldarion/littlewood) and then create a [parallel](https://github.com/eldarion/littlewood/tree/parallel) branch. That is where I'll add my work extending things to support parallelization via ruffus.

----

Starting `parallel.py`, the first thing I'll do is pull in parts of `roots.py` and `heatmap.py` and break everything into discrete functions.

----

There is parity now (verified by running), but I did remove writing the roots to a file, keeping them in memory instead. My thinking is that this will make it easier to parallelize the root generation. We'll see.

----

I have replaced the normal sequential run of the top-level `roots_for_degree` and `heatmap` functions by making them tasks and running them via `ruffus.pipeline_run`. Still no parallelization; that's next. This just simulates exactly how the script ran before. Interesting thing to note: my previous runs of degree-16, size-200 fractals took approx. 33-36 seconds. Replaced to run in the pipeline, they take 45 seconds. That seems like a good bit of overhead, but perhaps it isn't linear and will be more than made up for by computing the roots in parallel.

----

Finished the [conversion](https://github.com/eldarion/littlewood/blob/parallel/parallel.py). Running with multiple processes certainly speeds up the roots calculation a bit, but on my MacBook Air, with only 2 cores, it's hard to see that much of a difference. I wonder what it would do on larger machines with more CPUs and more cores. Adjust how many processes run in parallel by changing the `multiprocess` argument at the bottom of the file. (A sketch of the shape of this conversion appears below.)

----

The next phase of this research is figuring out how ruffus interfaces with SGE and spawns work off to it.

----

Picking up on this adventure, I am going to try to use [SAGA-python](http://saga-project.github.com/bliss/) as an abstraction layer over different distributed execution environments to run external processes.

----

I had originally thought SAGA-python was on Google Code, which dismayed me a bit because I already see a change in the code that I want to contribute back. So happy to learn it's on GitHub.

----

The first step is going to be pulling https://gist.github.com/paltman/79c47bd60ed674af4116 out into a standalone script and then replacing the internals of that function with a SAGA execution of that new script. (A sketch of what that execution might look like follows below.)

----

Working with SAGA-python is very hard.
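----

To make the pipeline conversion concrete, here is a minimal sketch of its shape. It is not the actual `parallel.py`: ruffus coordinates tasks around files, so this sketch writes roots to disk even though `parallel.py` keeps them in memory, and the degree range and file names are made up.

```python
import itertools

import numpy
import ruffus


# one ruffus job per degree: [input, output, degree] (degrees made up for
# this sketch; the real runs went to degree 16)
PARAMS = [[None, "roots-%02d.txt" % d, d] for d in range(2, 9)]


@ruffus.files(PARAMS)
def roots_for_degree(input_file, output_file, degree):
    # roots of every degree-d polynomial with +/-1 coefficients
    # (leading coefficient fixed at 1 by symmetry)
    with open(output_file, "w") as f:
        for coeffs in itertools.product((-1, 1), repeat=degree):
            for root in numpy.roots((1,) + coeffs):
                f.write("%f %f\n" % (root.real, root.imag))


@ruffus.merge(roots_for_degree, "all-roots.txt")
def heatmap(input_files, output_file):
    # stand-in for the real heatmap rendering: just gather all the roots
    with open(output_file, "w") as out:
        for name in input_files:
            with open(name) as f:
                out.write(f.read())


if __name__ == "__main__":
    # bump multiprocess to run more root jobs concurrently
    ruffus.pipeline_run([heatmap], multiprocess=2)
```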
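----

And here is roughly the shape of the SAGA execution I'm after, based on my reading of the SAGA-python docs. The user id, script name, and paths are all hypothetical, and it assumes key-based ssh auth to localhost is already set up.

```python
import saga

# ssh security context; user id is hypothetical
ctx = saga.Context("ssh")
ctx.user_id = "paltman"

session = saga.Session()
session.add_context(ctx)

# submit the standalone script as a remote job over ssh
js = saga.job.Service("ssh://localhost", session=session)

jd = saga.job.Description()
jd.executable = "/usr/bin/python"
jd.arguments = ["roots_for_degree.py", "12"]  # hypothetical script + degree
jd.output = "roots.out"
jd.error = "roots.err"

job = js.create_job(jd)
job.run()
job.wait()
print("exit code:", job.exit_code)

# pull the result back with an sftp transfer
out = saga.filesystem.File("sftp://localhost/home/paltman/roots.out",
                           session=session)
out.copy("file://localhost/tmp/roots.out")

js.close()
```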
----

Finally got remote execution via SAGA/ssh with SFTP transfers in a ruffus job, but it was painful. Also, it seems very flaky.

----

Flaky in the sense that I am getting non-deterministic IO size mismatch errors. My connection is via localhost, so I would not expect any network issues.

----

OK, the flakiness was all my doing. I had copied and pasted from some example code and didn't correct for my particular use case.

----

I do wonder whether these remote nodes will come prepared with a runtime environment, or whether I will be responsible for bootstrapping all of them.

----

Another thing to sort out is how to choose a node.

----

With a PBS cluster, you can queue the task, so presumably the cluster is choosing the node for you.

----

I guess the point of "grid" computing is that you don't pick a node, but rather address a cluster.

----

I do wonder how you bootstrap the nodes in your cluster. Surely we don't have to bootstrap an execution environment every time.

----

I'd like it to be a split: provide an image with a virtualenv prepped with numpy and any other libs that might be needed, but copy over the code that needs to execute at run time, as in [https://github.com/eldarion/littlewood/blob/parallel/parallel.py#L74](https://github.com/eldarion/littlewood/blob/parallel/parallel.py#L74).

----

If that's the case, then perhaps when [I do this](https://github.com/eldarion/littlewood/blob/parallel/parallel.py#L78) it will be fine for it to be hard-coded.

----

I plan on going through the SAGA-python interface, but reading about [SGE](http://gridscheduler.sourceforge.net/howto/basic_usage.html) makes me wonder just how this bootstrapping is done.

----

Looking at [StarCluster](http://star.mit.edu/cluster/) by MIT to help provision clusters on EC2 for testing. This [screencast](http://www.youtube.com/watch?feature=player_embedded&v=vC3lJcPq1FY) captures what looks to be a pretty awesome tool for launching SGE clusters on EC2.

----

So I got a 10-node cluster fired up with little problem.

----

Now it's just getting the SGE plugin in SAGA-python working. So far, just all kinds of SSH connection issues. (The sketch below shows what I'm attempting.)
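----

For reference, this is roughly what I'm attempting, going by the SAGA-python docs: pointing the job service at the cluster master with the `sge+ssh` scheme so SGE picks the execution node. The master hostname, user id, and queue name are assumptions (loosely based on StarCluster defaults), not a verified working config.

```python
import saga

# ssh credentials for the cluster master; user id is an assumption based on
# StarCluster's default job-submission user
ctx = saga.Context("ssh")
ctx.user_id = "sgeadmin"

session = saga.Session()
session.add_context(ctx)

# "sge+ssh" asks SAGA-python to ssh to the master and submit via qsub,
# leaving node selection to SGE
js = saga.job.Service("sge+ssh://master", session=session)

jd = saga.job.Description()
jd.executable = "/usr/bin/python"
jd.arguments = ["roots_for_degree.py", "12"]  # hypothetical script + degree
jd.queue = "all.q"  # SGE's default queue name
jd.output = "roots.out"
jd.error = "roots.err"

job = js.create_job(jd)
job.run()
print("submitted:", job.id)
job.wait()
print("final state:", job.state)

js.close()
```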