Data Enclave in the Cloud: February 2010

One of the tasks associated with bringing up an ACI is configuring it for a particular group of researchers. At least three steps are required:

create user accounts
make required statistical software available
copy the specified dataset(s) to the ACI

My plan was to create an archive containing the following pieces:

a list of users for whom accounts needed to be created
a list of statistical packages to be activated
an inner archive containing the dataset(s)
a script which would use the two lists to create accounts and activate software, and then expand the dataset archive

The ACI creation script would then pass that archive to the instance via the '--user-data-file' option of the 'ec2-run-instances' script. At first run, the ACI would fetch the user data, unpack it, and execute the included script.
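As a rough illustration of what the included script might do, here is a minimal Python sketch that turns the two lists into a sequence of commands. The file names (users.txt, packages.txt, datasets.tar.gz) and the 'activate-package' command are hypothetical placeholders, since the post does not say how the lists are stored or how software is activated. The sketch only builds the commands; it does not run them.

```python
from pathlib import Path


def read_list(path):
    """Return the non-blank, non-comment lines of a one-item-per-line file."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def plan_configuration(archive_dir, data_dir):
    """Build the commands the included script would run, in order.

    All file names and the 'activate-package' command are assumptions
    made for illustration only.
    """
    archive_dir = Path(archive_dir)
    steps = []
    # Step 1: one account per listed user.
    for user in read_list(archive_dir / "users.txt"):
        steps.append(["useradd", "--create-home", user])
    # Step 2: activate each listed statistical package (hypothetical command).
    for package in read_list(archive_dir / "packages.txt"):
        steps.append(["activate-package", package])
    # Step 3: expand the inner dataset archive into the data directory.
    steps.append(["tar", "-xzf", str(archive_dir / "datasets.tar.gz"),
                  "-C", str(data_dir)])
    return steps
```

Each returned step is an argument list that could be handed to subprocess.run on the instance once the real activation mechanism is known.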

Unfortunately, there is a 16,384-byte limit on the size of the user data. An archive that includes the dataset(s) will usually be far larger than that, so this approach was not practical.
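To make the size constraint concrete, the following sketch packs a set of files into an in-memory archive and compares the result with the 16,384-byte limit. The gzip-compressed tar format and the helper names are assumptions; the post does not specify an archive format.

```python
import io
import tarfile

USER_DATA_LIMIT = 16384  # bytes, the user-data limit cited above


def build_archive(files):
    """Pack a {name: bytes} mapping into an in-memory gzip-compressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def fits_in_user_data(archive_bytes, limit=USER_DATA_LIMIT):
    """Return True if the archive is small enough to pass as user data."""
    return len(archive_bytes) <= limit
```

An archive holding only short text lists compresses to a few hundred bytes and passes the check, while one that carries even a modest dataset does not.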

My current idea is to create two archives. The main archive is as described above; the second is a bootstrap archive that will contain:

a pointer to a location from which the main archive should be fetched
a set of credentials that can be used to do that fetching
a script that can use those credentials to fetch the specified archive

The bootstrap archive will be small enough to be passed to the instance via the '--user-data-file' option of the 'ec2-run-instances' script. At first run, the ACI will fetch the bootstrap archive, unpack it, and execute its included script, which fetches the main archive. Once the main archive has been fetched, the credentials will be destroyed, the just-fetched archive will be unpacked, and its included script will be executed.
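The sequence above could be sketched roughly as follows. This is only an outline under several assumptions: the file names (location.txt, credentials, configure.sh) are hypothetical, the main archive is assumed to be a tar file, and the actual transfer is left to a caller-supplied fetch function because the post does not say where the main archive lives or what kind of credentials are used.

```python
import io
import subprocess
import tarfile
from pathlib import Path


def bootstrap(bootstrap_dir, work_dir, fetch, execute=True):
    """Fetch the main archive, destroy the credentials, unpack, and run.

    fetch(location, credentials) must return the archive as bytes.
    File names used here are assumptions made for illustration.
    """
    bootstrap_dir, work_dir = Path(bootstrap_dir), Path(work_dir)
    location = (bootstrap_dir / "location.txt").read_text().strip()
    cred_path = bootstrap_dir / "credentials"
    credentials = cred_path.read_bytes()

    # 1. Fetch the main archive using the bundled credentials.
    archive_bytes = fetch(location, credentials)

    # 2. Destroy the credentials once the fetch has succeeded.
    #    unlink() only removes the file; it is not a secure wipe.
    cred_path.unlink()

    # 3. Unpack the just-fetched archive (assumed to come from a trusted source).
    work_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        tar.extractall(work_dir)

    # 4. Execute the script included in the main archive.
    script = work_dir / "configure.sh"
    if execute:
        subprocess.run(["/bin/sh", str(script)], cwd=work_dir, check=True)
    return script
```

Passing the fetch step in as a function keeps the sketch independent of the storage service, and it allows the flow to be exercised locally with a stand-in before any real credentials are involved.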

I'm working on that process now.


Tuesday, February 23, 2010

Customizing an ACI
