Now it's just getting the SGE plugin in SAGA-python working. So far just all kinds of SSH connection issues.
So I got a 10-node cluster fired up with little trouble.
I plan on going through the SAGA-python interface, but reading about SGE makes me wonder just how this bootstrapping is done.
I'd like it to be a split: provide an image with a virtualenv prepped with numpy and any other libs that might be needed, and then copy over the code that needs to execute at run time like:
I do wonder how you bootstrap nodes in your cluster. Surely we don't have to bootstrap an execution environment every time.
I guess the point of "grid" computing is that you don't pick a node, but rather you address a cluster.
With a PBS cluster, you can queue the task, so presumably the cluster is choosing the node for you.
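That submit-to-the-cluster model is easy to sketch locally. Here's a toy stand-in (threads playing the nodes, a plain Queue playing the scheduler; nothing PBS- or SGE-specific, and all the names are my own):

```python
import queue
import threading

def run_cluster(tasks, num_nodes=3):
    """Simulate a PBS-style queue: tasks are submitted to the cluster as
    a whole, and whichever idle 'node' grabs each one first runs it."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def node(name):
        while True:
            try:
                task_id, fn = q.get_nowait()
            except queue.Empty:
                return  # queue drained; this node is done
            value = fn()
            with lock:
                results[task_id] = (name, value)  # record which node ran it

    # Enqueue all work before starting the nodes.
    for i, fn in enumerate(tasks):
        q.put((i, fn))
    workers = [threading.Thread(target=node, args=(f"node{n}",))
               for n in range(num_nodes)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results

results = run_cluster([lambda i=i: i * i for i in range(10)])
```

The submitter never picks a node; it only sees which node a task landed on after the fact, which is roughly the contract a queueing system gives you.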
Another thing to sort out is how to choose a node.
I do wonder if these remote nodes will be prepared with a runtime environment, or if I will be responsible for bootstrapping all of them.
Ok, the flakiness was all my doing.
I had copy and pasted from some example code and didn't correct for my particular use case.
Flaky in the sense that I am getting non-deterministic IO size mismatch errors. My connection is via localhost, so I would not expect any network issues.
Finally got remote execution via SAGA/ssh with SFTP transfers in a ruffus job, but it was painful.
Also, it seems very flaky.
Working with SAGA-python is very hard.
First step is going to be pulling out:
Into a standalone script, and then replacing the internals of that function with a SAGA execution of that new script.
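A local sanity check of that "function body becomes a script" refactor, using only the stdlib; `compute.py`, the squaring work, and the argument handling here are placeholders, and the subprocess call marks the spot where the SAGA job submission would eventually go:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# The extracted function, written out as a standalone script. In the real
# pipeline this file would be shipped to the remote node and run there.
SCRIPT = """\
import sys
n = int(sys.argv[1])
print(n * n)
"""

def run_extracted(n):
    """Stand-in for the old in-process function call: write the script,
    execute it in a fresh interpreter, and parse its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "compute.py"
        script.write_text(SCRIPT)
        out = subprocess.run(
            [sys.executable, str(script), str(n)],
            capture_output=True, text=True, check=True,
        )
        return int(out.stdout.strip())
```

The point of the exercise: once the function's inputs and outputs travel through argv and stdout (or files), swapping the local interpreter for a remote SAGA job is a one-line change.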
I had originally thought SAGA-python was on Google Code, which dismayed me a bit because I already see a change in the code that I want to contribute back. So I'm happy to learn it's on GitHub.
Picking up on this adventure, I am going to try to use SAGA-python as an abstraction layer over different distributed execution environments to run external processes.
The next phase of this research is figuring out how ruffus interfaces with SGE and spawns work off to it.
Finished the conversion. Running with multiple processes certainly speeds up the roots calculation a bit, but on my MacBook Air, with only 2 cores, it's hard to see that much of a difference.
I wonder what it would do on larger machines with more CPUs and more cores. Adjust how many processes run in parallel by changing the multiprocessing argument at the bottom of the file.
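For a sense of what the parallel roots step boils down to, here is a self-contained sketch. This is not the actual fractal code: I'm using Newton's method on z**3 - 1 as a stand-in computation, and the `processes` argument plays the same role as the multiprocessing setting mentioned above:

```python
import multiprocessing

def newton_root(z, iterations=60):
    """Run Newton's method for z**3 - 1 from a starting point and return
    the root it converges to (rounded for stable comparison)."""
    for _ in range(iterations):
        if z == 0:  # derivative 3z**2 would be zero; bail out
            break
        z = z - (z**3 - 1) / (3 * z**2)
    return complex(round(z.real, 6), round(z.imag, 6))

def parallel_roots(points, processes=2):
    """Farm the per-point iteration out to a pool of worker processes."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(newton_root, points)

if __name__ == "__main__":
    # A small grid of starting points, skipping the origin.
    grid = [complex(x / 10.0, y / 10.0)
            for x in range(-5, 6) for y in range(-5, 6)
            if (x, y) != (0, 0)]
    roots = parallel_roots(grid)
```

Since each starting point is independent, `pool.map` is the whole parallelization story here; the per-point work just has to be heavy enough to beat the process overhead, which matches the overhead observation below.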
I have replaced the normal sequential run of the top-level heatmap functions with making them tasks and running them in the pipeline.
Still no parallelization. That's next. This just simulates exactly how the script ran before.
Interesting to note: my previous runs of degree-16, size-200 fractals took approx. 33-36 seconds. Now, replaced to run in the pipeline, they take 45 seconds. Seems like a good bit of overhead, but perhaps it isn't linear, and running roots in parallel will more than make up for it.
There is parity now (verified by running), but I did remove writing the roots to a file, instead keeping them in memory. My thinking is that this will make it easier to parallelize the roots generation. We'll see.
I think I'll do fractal image generation as a problem to solve and test this out.
Since we'll be interfacing with SGE, and ruffus supposedly offers support for SGE, I think we'll be in good shape. I just need to understand how it works instead of expecting magic.
After that, I think I should figure out how to simulate getting it to run on multiple nodes. Without access to a supercomputer, I am not sure how I'll do this.
There is a lot to wrap one's head around in this project, so first I am just going to do a simple little experiment to see how it will coordinate activities on my local machine.
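A minimal version of that local experiment might look like this, using only the stdlib. The two stages are placeholders of my own, not anything from the real project; the shape is what matters: fan each stage out across worker processes, then gather before the next stage.

```python
from concurrent.futures import ProcessPoolExecutor

def stage_one(n):
    """First pipeline stage: some per-item work (placeholder)."""
    return n + 1

def stage_two(n):
    """Second stage, consuming stage one's output (placeholder)."""
    return n * 2

def run_pipeline(items, workers=2):
    """Run each stage across all items before moving on to the next,
    mimicking how a task pipeline fans work out and gathers it back."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        intermediate = list(pool.map(stage_one, items))
        return list(pool.map(stage_two, intermediate))

if __name__ == "__main__":
    out = run_pipeline([1, 2, 3])  # expected: [4, 6, 8]
```

If this coordination works locally, the later question becomes whether the same stage boundaries can be handed to ruffus/SGE instead of a local process pool.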