One of the tasks associated with bringing up an ACI is configuring it for a particular group of researchers. There are (at least) three steps that have to be taken:
- create user accounts
- make required statistical software available
- copy the specified dataset(s) to the ACI
My plan was to create an archive containing the following pieces:
- a list of users for whom accounts needed to be created
- a list of statistical packages to be activated
- an inner archive containing the dataset(s)
- a script which would use the two lists to create accounts and activate software, and then expand the dataset archive
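A minimal sketch of how such an archive could be assembled, using Python's standard tarfile module. The member names (users.txt, packages.txt, datasets.tar, setup.sh) and the function names are hypothetical, chosen only for illustration:

```python
import io
import tarfile


def add_bytes(tar, name, data, mode=0o644):
    """Add an in-memory byte string to an open tar archive as a regular file."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)
    info.mode = mode
    tar.addfile(info, io.BytesIO(data))


def build_main_archive(users, packages, datasets_tar, setup_script):
    """Return a gzip-compressed tar archive (as bytes) holding the four pieces.

    All member names below are assumptions, not part of any existing tool.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # One user name per line.
        add_bytes(tar, "users.txt", ("\n".join(users) + "\n").encode())
        # One statistical package name per line.
        add_bytes(tar, "packages.txt", ("\n".join(packages) + "\n").encode())
        # The inner archive containing the dataset(s), stored as-is.
        add_bytes(tar, "datasets.tar", datasets_tar)
        # The configuration script, marked executable.
        add_bytes(tar, "setup.sh", setup_script.encode(), mode=0o755)
    return buf.getvalue()
```

Building the archive in memory makes it easy to check its size before deciding how to deliver it to the instance.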
The ACI creation script would then pass that archive to the instance via the '--user-data-file' option of the 'ec2-run-instances' script. At first run, the ACI would fetch the user data, unpack it, and execute the included script.
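As a rough illustration of what the included script might do with the user list once it has been unpacked, here is a sketch that turns the list into account-creation commands. It assumes a Linux instance with the standard 'useradd' command and a hypothetical one-name-per-line list file. Software activation is left out because it depends on packages not named here:

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty, non-comment lines of a one-entry-per-line file."""
    lines = Path(path).read_text().splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def account_commands(users):
    """Build one 'useradd -m <name>' argument list per user.

    The commands are only constructed here, not executed, so they can be
    reviewed or logged before being handed to subprocess.check_call.
    """
    return [["useradd", "-m", user] for user in users]
```

Keeping command construction separate from execution lets the plan be inspected without root privileges.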
Unfortunately, user data is limited to 16,384 bytes (16 KB), far too small for an archive that includes dataset(s), so this approach was not practical.
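The limit is easy to check before launching an instance. The sketch below uses hypothetical helper names and simply compares a payload's length against the 16,384-byte figure quoted above:

```python
USER_DATA_LIMIT = 16 * 1024  # 16,384 bytes, the limit quoted above


def fits_in_user_data(payload: bytes) -> bool:
    """Return True if the payload is within the user-data size limit."""
    return len(payload) <= USER_DATA_LIMIT


def overflow(payload: bytes) -> int:
    """Return how many bytes the payload exceeds the limit by (0 if it fits)."""
    return max(0, len(payload) - USER_DATA_LIMIT)
```

Any archive that carries real datasets will almost certainly fail this check, while a small bootstrap archive should pass it comfortably.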
My current idea is to create two archives. The main archive is as described above; the second is a bootstrap archive that will contain:
- a pointer to a location from which the main archive should be fetched
- a set of credentials that can be used to do that fetching
- a script that can use those credentials to fetch the specified archive
The bootstrap archive will be small enough to be passed to the instance via the '--user-data-file' option of the 'ec2-run-instances' script. At first run, the ACI will fetch the bootstrap archive, unpack it, and execute its included script, which in turn fetches the main archive. Once the main archive has been fetched, the credentials will be destroyed, the just-fetched archive will be unpacked, and its included script executed.
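The sequence above could be sketched roughly as follows. Everything named here is an assumption made for illustration: the file names (location.txt, credentials, main.tar.gz, setup.sh), the function name, and the 'fetch' callable, which stands in for whatever transfer mechanism is eventually chosen:

```python
import subprocess
import tarfile
from pathlib import Path


def run_bootstrap(bootstrap_dir, work_dir, fetch, run=subprocess.check_call):
    """Fetch the main archive, destroy the credentials, unpack, run the script.

    fetch(location, cred_path, dest) is a caller-supplied (hypothetical)
    function that downloads the archive at `location` to `dest`, using the
    credentials stored in `cred_path`.
    """
    bootstrap_dir = Path(bootstrap_dir)
    work_dir = Path(work_dir)

    # Pointer to the location from which the main archive should be fetched.
    location = (bootstrap_dir / "location.txt").read_text().strip()
    cred_path = bootstrap_dir / "credentials"
    main_archive = work_dir / "main.tar.gz"

    fetch(location, cred_path, main_archive)

    # Remove the credentials as soon as the fetch has succeeded. Note that
    # unlink() only removes the file; it does not securely erase the disk.
    cred_path.unlink()

    # Unpack the archive. It is trusted here because we built it ourselves.
    with tarfile.open(main_archive) as tar:
        tar.extractall(work_dir)

    # Execute the script included in the main archive.
    script = work_dir / "setup.sh"
    run(["/bin/sh", str(script)], cwd=str(work_dir))
```

Passing 'fetch' and 'run' in as parameters keeps the ordering logic testable without network access or root privileges.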
I'm working on that process now.