An early version of JupyterHub is now available within the SAIL Databank for a limited number of users. As our collective experience of working with JupyterHub and our technical offering evolve, this guide will be updated in line with them.
This is development software, and development software carries risks. You will need to back up your notebooks manually, either to Git or by downloading them to your SAIL desktop.
Getting started
- Note: Access to JupyterHub is only available for users on projects that have paid for additional processing power.
- If you have requested JupyterHub and this has been confirmed, you will need to log on to the SAIL Gateway as normal. Once within the Windows 10 environment, open your browser and navigate to https://jupyterhub.sail.k8spk.chi.swanserp.ac.uk/
- Click the orange 'login with Keycloak' button and follow the instructions. Your login is the same as the one you use to log into the SAIL Gateway.
- Following this, you will be logged into JupyterHub and will see a list of notebooks to choose from. Assuming that you are on a GPU project, you will see a minimum of three options:
- The first is a basic (non-GPU) notebook. This has Python and R kernels installed, and also allows you to launch VS Code and RStudio from within the notebook if you prefer a more fully featured IDE.
- The second is a notebook with the same features as the first that will attach itself to a GPU, but that does not contain any GPU drivers or related Python libraries. You will have to install all of your own GPU drivers from within the notebook if you select this. We do not recommend this option, and it will likely be removed in a future release.
- The third is a GPU-attached notebook with CUDA 12.6.2, TensorFlow, and other common Python ML libraries preinstalled. It is configured to automatically surface your specific project GPU to TensorFlow within the notebook. This notebook only supports Python. It also includes VS Code and TensorBoard, as well as an extension for monitoring your GPU resource usage. A quick way to confirm the GPU is visible from within the notebook is sketched after the list below.
- If you are on more than one GPU project (e.g. project 1234 with GPU and project 1653 with GPU), you will see separate options in the notebook image list for each project. In this case, the list will look something like this:
- Standard Jupyter notebook
- Standard Jupyter notebook with GPU for project 1234
- GPU-enabled Jupyter notebook with GPU for project 1234
- Standard Jupyter notebook with GPU for project 1653
- GPU-enabled Jupyter notebook with GPU for project 1653
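If you launch a GPU-enabled notebook, a quick sanity check along the following lines can confirm that your project GPU is visible to TensorFlow (this is only a minimal sketch; the exact device listing will depend on your project's GPU):

import tensorflow as tf

# List the GPUs that TensorFlow can see; the GPU-enabled notebook image
# is expected to surface your project's GPU here automatically.
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)

if gpus:
    # Run a small matrix multiplication on the first GPU as a sanity check.
    with tf.device('/GPU:0'):
        a = tf.random.normal((1000, 1000))
        b = tf.random.normal((1000, 1000))
        c = tf.matmul(a, b)
    print("GPU computation succeeded, result shape:", c.shape)
else:
    print("No GPU visible - check that you launched the GPU-enabled notebook.")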
...
- Open a Terminal window from the homepage.
- If you see a line like "bash: __conda_exe: command not found" at the top of the Terminal window, type "conda init", close the Terminal window, and then launch it again from the homepage.
- Create a new conda environment to install your package in,
conda create --name myshinynewenv ipykernel ibm_db
*UPDATE 02/08/2023: If you want to connect to DB2, you need to install the ibm_db package at the same time as the ipykernel package due to conflicting Python minor versions*
- Activate your new environment,
conda activate myshinynewenv
- Install your package, e.g.
conda install -c conda-forge recordlinkage
- You can also install packages from pip in the usual way, ensuring you first activate the environment you want to install the package into (the activation step above).
- When you close the Terminal window and return to the homepage you should see a new Python kernel with the same name as your new conda environment.
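Once you open a notebook with the new kernel, a quick way to check that it is really running inside your new conda environment is something like the following (a minimal sketch, assuming the environment is called myshinynewenv and you installed recordlinkage into it):

import sys

# The interpreter path should point at your conda environment,
# e.g. something ending in /envs/myshinynewenv/bin/python
print(sys.executable)

# Importing the package you installed confirms the kernel picked up the environment
import recordlinkage
print(recordlinkage.__file__)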
...
There are two options for moving files from your SAIL desktop into your JupyterHub notebook.
- If the file is under 8GB you can simply drag and drop it into the browser window.
- Or you can sync it via GitLab (very much the recommended option).
...
import ibm_db
import ibm_db_dbi
import pandas as pd

# Connection details for the SAIL DB2 database
db = 'PR_SAIL'
hostname = 'db2.database.ukserp.ac.uk'
port = '60070'
protocol = 'TCPIP'
uid = 'YOUR USERNAME HERE'
pwd = 'YOUR PASSWORD HERE'
security = 'ssl'
ssl_client_keystoredb = '/db2-connection/chi.kdb'
ssl_client_keystash = '/db2-connection/chi.sth'

# Build the DB2 connection string from the details above
conn_str = ("DATABASE={0};"
            "HOSTNAME={1};"
            "PORT={2};"
            "PROTOCOL={3};"
            "Security={4};"
            "UID={5};"
            "PWD={6};"
            "SSLClientKeystoredb={7};"
            "SSLClientKeystash={8};").format(db, hostname, port, protocol,
                                             security, uid, pwd,
                                             ssl_client_keystoredb,
                                             ssl_client_keystash)

# Open the connection and wrap it so pandas can use it
conn = ibm_db.connect(conn_str, '', '')
pd_conn = ibm_db_dbi.Connection(conn)

# Run a small test query and read the result into a DataFrame
q = 'SELECT * FROM syscat.tables LIMIT 5'
df = pd.read_sql(q, pd_conn)
print(df)
After the initial connection steps above, to run a query and read the result straight into a Pandas DataFrame, all you have to do is:
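For example, reusing the pd_conn object created above (a minimal sketch; SAILW1234V.MY_TABLE is a hypothetical schema and table name, so substitute your own):

# Hypothetical table name - replace with a table from your own project schema
q = 'SELECT * FROM SAILW1234V.MY_TABLE'
df = pd.read_sql(q, pd_conn)
df.head()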
...