Setting Up
Configuration
TuCluster comes with a default configuration suitable for development, but for running in production
you will want to create a configuration file. This is a simple JSON file. Create an environment
variable called TUCLUSTER_CONFIG
that points to its location.
TuCluster will override the default configuration with the contents of this file.
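The override behaviour described above can be sketched as follows. This is only an illustration, not TuCluster's actual loader: the `DEFAULTS` values and the `load_config` function are hypothetical, and the merge is assumed to be a shallow, top-level key replacement.

```python
import json
import os

# Hypothetical defaults for illustration only; the real defaults ship with TuCluster.
DEFAULTS = {
    "MONGODB": {"db": "tucluster", "host": "127.0.0.1", "port": 27017},
    "MODEL_DATA_DIR": "data",
    "TUFLOW_PATH": "tuflow",
    "ANUGA_ENV": "anuga",
}


def load_config(env_var="TUCLUSTER_CONFIG", defaults=DEFAULTS):
    """Return the defaults, overridden by the JSON file named in env_var (if set)."""
    config = dict(defaults)
    path = os.environ.get(env_var)
    if path:
        with open(path) as f:
            # Assumption: top-level keys in the file replace the defaults wholesale.
            config.update(json.load(f))
    return config
```

Under this assumption, a file that sets only `MODEL_DATA_DIR` leaves the other keys at their default values.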
Example config file:
{
    "MONGODB": {
        "db": "tucluster",
        "host": "127.0.0.1",
        "port": 27017
    },
    "MODEL_DATA_DIR": "/path/to/data/dir",
    "TUFLOW_PATH": "tuflow",
    "ANUGA_ENV": "anuga"
}
Here is an explanation of the keys in the above config:
MONGODB: | This defines the database connection details. When you installed MongoDB, you should have created a database; in the above configuration it is called “tucluster”. If you secured your database, you can add a |
---|---|
MODEL_DATA_DIR: | This is the path to the root directory where all model data is stored: both the user-uploaded input data and the result data created by e.g. Tuflow. Ensure there is enough space to support your modelling needs. All the nodes in your cluster need to be able to access this path, so you will typically use some form of distributed file system. Choose a file system with the lowest possible latency that scales easily as your model output grows. Once TuCluster is running, you should never interact directly with this folder. TuCluster manages the storage, upload and download of files; manually adding or removing files could cause exceptions during TuCluster's execution. |
TUFLOW_PATH: | Path to the Tuflow executable to use on each worker node. The path must therefore be the same on every node. If Tuflow is available globally on the system PATH, just enter the executable name. |
ANUGA_ENV: | Name of the conda environment in which ANUGA is installed. |
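Because MODEL_DATA_DIR and TUFLOW_PATH must be valid on every node, it can help to check them on each node before starting any workers. The sketch below is not part of TuCluster; `check_worker_config` is a hypothetical helper that uses only the Python standard library and takes the parsed configuration as a dict.

```python
import os
import shutil


def check_worker_config(config):
    """Return a list of problems found with the config on this node (empty if none)."""
    problems = []

    data_dir = config.get("MODEL_DATA_DIR", "")
    if not os.path.isdir(data_dir):
        problems.append("MODEL_DATA_DIR is not an existing directory: %r" % data_dir)
    elif not os.access(data_dir, os.W_OK):
        problems.append("MODEL_DATA_DIR is not writable: %r" % data_dir)

    # shutil.which accepts either a bare executable name (searched on PATH)
    # or an explicit path to an executable file.
    tuflow = config.get("TUFLOW_PATH", "")
    if shutil.which(tuflow) is None:
        problems.append("TUFLOW_PATH does not resolve to an executable: %r" % tuflow)

    return problems
```

Running this on each node and confirming that it returns an empty list is a cheap way to catch a missing mount or a mismatched executable path early.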
Running TuCluster
You should use a production-ready web server such as Gunicorn to run TuCluster. Running with Gunicorn is easy:
gunicorn tucluster.app
This will run TuCluster on Gunicorn's default address, 127.0.0.1:8000. You would normally configure a proxy such as Nginx to allow external requests.
You will then need to run the TuCluster Celery app on each worker node:
celery -A qflow worker -l info
Run the above command on each server on which you wish to execute Tuflow models (remember that you will need to install TuCluster on each of these nodes as well).
You are now ready to start interacting with TuCluster.
Note on Using ANUGA
Running Python scripts on the server is potentially dangerous, so you should ensure that your web server
runs as a non-root user with restricted permissions. Essentially, the only place it needs to be able to write files
is the directory you set as the MODEL_DATA_DIR
in your configuration file.
Developers of ANUGA scripts should be aware of these restrictions and should set the datadir
appropriately.
The best advice here is to set it relative to the script file, e.g.:
domain.set_datadir(os.path.join(os.path.dirname(__file__), 'results'))
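As a slightly fuller illustration of setting the data directory relative to the script, the sketch below builds a results directory next to the script and creates it if needed. The helper name `results_dir` and the default folder name are assumptions made for this example; the returned path is what a script would hand to its domain's datadir setting.

```python
import os


def results_dir(script_file, name="results"):
    """Return (and create if needed) a results folder next to the given script."""
    # abspath guards against __file__ being a relative path.
    base = os.path.dirname(os.path.abspath(script_file))
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)
    return path
```

Inside an ANUGA script this would typically be called as `results_dir(__file__)`, which keeps the output within the model's own folder whatever the current working directory is.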
ANUGA scripts should also be carefully developed so they do not hang. For example, displaying a matplotlib figure or other GUI element will prevent the script from terminating until the window has been manually closed. Do not do this: any figures should be written directly to file.
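One way to write figures to file without ever opening a window is to select matplotlib's non-interactive Agg backend. The sketch below assumes matplotlib is installed in the ANUGA environment; the function name `save_depth_plot` and the axis labels are purely illustrative.

```python
def save_depth_plot(times, depths, out_path):
    """Plot depth against time and write the figure to out_path without a GUI."""
    # Select the non-interactive Agg backend before pyplot is imported,
    # so no window is created and the script cannot block waiting on one.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(times, depths)
    ax.set_xlabel("time (s)")
    ax.set_ylabel("depth (m)")
    fig.savefig(out_path)
    plt.close(fig)  # release the figure; plt.show() is never called
    return out_path
```

Combined with a results directory inside the model folder, this keeps all output within the area the web server is allowed to write to.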