An early version of JupyterHub is now available within the SAIL Databank for a limited number of users. As our collective experience of working with JupyterHub and our technical offering evolve, this guide will be updated to match.
This is development software, and as such it carries some risk. You will need to back up your notebooks manually, either to Git or by downloading them to your SAIL desktop.
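If you choose to download your notebooks, you can bundle them into a single archive first and then download that one file from the JupyterLab file browser (right-click → Download). The sketch below is illustrative only: `bundle_notebooks` and the archive name are hypothetical, and it uses only the Python standard library. It skips the `.ipynb_checkpoints` folders where Jupyter keeps its autosave copies.

```python
import zipfile
from pathlib import Path


def bundle_notebooks(root, archive):
    """Zip every .ipynb under `root` into `archive`, skipping checkpoint copies.

    Returns the archive member names so you can check what was included.
    """
    root_path = Path(root)
    members = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for nb in sorted(root_path.rglob("*.ipynb")):
            rel = nb.relative_to(root_path)
            # Jupyter's autosave copies live in .ipynb_checkpoints; skip them.
            if ".ipynb_checkpoints" in rel.parts:
                continue
            name = rel.as_posix()
            zf.write(nb, arcname=name)
            members.append(name)
    return members
```

For example, running `bundle_notebooks(".", "notebooks_backup.zip")` in a notebook cell creates the archive alongside your other files, ready to download. This is a convenience for the download route only; it does not replace backing up to Git.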
Getting started
- Note: Access to JupyterHub is only available for users on projects that have paid for additional processing power.
- Once your request for JupyterHub has been confirmed, log on to the SAIL Gateway as normal. Within the Windows 10 environment, open your browser and navigate to https://jupyterhub.sail.k8spk.chi.swanserp.ac.uk/
- Your browser may show a warning that the site is not safe. We can assure you that it is safe; while the service is in development, the site will appear to be unsafe. Please follow your browser's navigation options as detailed below.
...
- Click the orange 'login with Keycloak' button and follow the instructions. Your login details are the same as those you use to log in to the SAIL Gateway.
- You will then be logged into JupyterHub and will see a list of notebooks to choose from. If you are on a GPU project, you will see at least three options:
- The first is a basic (non-GPU) notebook. This has Python and R kernels installed, and also lets you launch VS Code and RStudio from within the notebook if you prefer a more fully featured IDE.
- The second has the same features as the first and attaches itself to a GPU, but does not contain any GPU drivers or related Python libraries. If you select this option, you will have to install all of your own GPU drivers from within the notebook.
...
- We do not recommend the use of this option, and it will likely be removed in a future release.
- The third is a GPU-attached notebook with CUDA 12.2, TensorFlow, and other common Python ML libraries preinstalled. It is configured to automatically surface your specific project GPU to TensorFlow within the notebook. This notebook only supports Python. It also includes VS Code and TensorBoard, as well as an extension for monitoring your GPU resource usage.
...
- If you are on more than 1 GPU project (e.g. project 1234 with GPU and project 1653 with GPU) you will see separate options in the notebook image list for each project. In this case, the list will look something like this:
- Standard Jupyter notebook
- Standard Jupyter notebook with GPU for project 1234
- GPU-enabled Jupyter notebook with GPU for project 1234
- Standard Jupyter notebook with GPU for project 1653
- GPU-enabled Jupyter notebook with GPU for project 1653
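To confirm that a GPU-enabled notebook has surfaced your project GPU, you can ask TensorFlow which devices it can see. This is a minimal sketch assuming TensorFlow 2.x; `visible_gpus` is a hypothetical helper name. An empty list means TensorFlow cannot see a GPU, or that TensorFlow is not installed in the current kernel (as may be the case in the basic notebook).

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TensorFlow is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        # Kernels without TensorFlow (e.g. the basic notebook) end up here.
        return []
    # Each PhysicalDevice has a .name such as "/physical_device:GPU:0".
    return [gpu.name for gpu in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

The GPU resource-usage extension in the GPU-enabled notebook gives a live view; this check is just a quick way to confirm the device is visible from your code.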
All of your notebooks have the same underlying file system, so choosing a different notebook doesn't affect any access to the files that you keep on JupyterHub.
...
The notebooks are set up so that any new conda environment you create (see Installing Library Packages in Anaconda) can show a corresponding kernel launcher on your Jupyter homepage. For this reason, we recommend installing packages via Anaconda rather than pip or CRAN where possible.
Python
- Open a Terminal window from the homepage.
- If you see a line like "bash: __conda_exe: command not found" at the top of the Terminal window, type "conda init", close the Terminal window, and then launch it again from the homepage.
- Create a new conda environment to install your package in:
conda create --name myshinynewenv ipykernel ibm_db
*UPDATE 02/08/2023: If you want to connect to DB2, you need to install the ibm_db package at the same time as the ipykernel package, due to conflicting Python minor versions.*
- Activate your new environment:
conda activate myshinynewenv
- Install your package, e.g.
conda install -c conda-forge recordlinkage
- You can also install packages with pip in the usual way; make sure you first activate (step 3) the environment you want to install the package into.
- When you close the Terminal window and return to the homepage, you should see a new Python kernel with the same name as your new conda environment.
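Once you have opened a notebook on the new kernel, you can check from a cell that it is running in the environment you expect, which Python version it uses (relevant to the ibm_db note above), and whether your packages are importable. A minimal sketch using only the standard library; `kernel_env_report` is a hypothetical helper, and the package names are just the examples from the steps above. For a named conda environment the last component of `sys.prefix` is normally the environment name; for the base environment it will be the Anaconda install directory instead.

```python
import importlib.util
import os
import sys


def kernel_env_report(packages):
    """Describe the environment this kernel runs in and which packages it can import."""
    return {
        # A kernel launched from a conda env runs that env's interpreter,
        # so sys.prefix points at the environment directory.
        "prefix": sys.prefix,
        "env_name": os.path.basename(sys.prefix),
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        # find_spec locates a top-level package without importing it.
        "importable": {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages},
    }


report = kernel_env_report(["ibm_db", "recordlinkage"])
print(report)
```

If a package shows as `False`, the usual cause is that it was installed into a different environment from the one backing the current kernel.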
...
- Open a Terminal window from the homepage.
- If you want to use your new package within RStudio, just install it into the base environment.
- If you want to use it within Jupyter R, create a new environment as in steps 2 and 3 of the Python section above.
- Install your package, e.g.
conda install -c r r-terra
- You can also install from CRAN in the traditional R way.
If you prefer a GUI...
- On the top menu bar, go to Settings → Conda Package Manager
- This may take several minutes to load, but when it does you'll see your current environments and all of the packages installed in them.
- You can create new environments and install packages via this visual UI.
Using Git
The notebooks are set up to pull from and push to the SAIL GitLab. You can configure this in the same way as on your SAIL desktop (see How to use GITLAB within SeRP, and the instructions on the internal SAIL wiki).
...
There are two options for moving files from your SAIL desktop into your JupyterHub notebook.
- If the file is under 8 GB, you can simply drag and drop it into the browser window.
- Or you can sync it via GitLab (very much the recommended option).
...
The notebooks all have the necessary drivers and libraries for connection to DB2 pre-installed. They currently only support ODBC drivers.
The following are some code examples showing how you can connect directly to DB2.
...
```python
import ibm_db
import ibm_db_dbi
import pandas as pd

# Connection details for the SAIL DB2 server
db = 'PR_SAIL'
hostname = 'db2.database.ukserp.ac.uk'
port = '60070'
protocol = 'TCPIP'
uid = 'YOUR USERNAME HERE'
pwd = 'YOUR PASSWORD HERE'

# SSL settings: client key database and stash file for the encrypted connection
security = 'ssl'
ssl_client_keystoredb = '/db2-connection/chi.kdb'
ssl_client_keystash = '/db2-connection/chi.sth'

conn_str = ("DATABASE={0};"
            "HOSTNAME={1};"
            "PORT={2};"
            "PROTOCOL={3};"
            "Security={4};"
            "UID={5};"
            "PWD={6};"
            "SSLClientKeystoredb={7};"
            "SSLClientKeystash={8};").format(db, hostname, port, protocol, security,
                                            uid, pwd, ssl_client_keystoredb, ssl_client_keystash)

# Open a native ibm_db connection, then wrap it in a DB-API connection that pandas can use
conn = ibm_db.connect(conn_str, '', '')
pd_conn = ibm_db_dbi.Connection(conn)

# Run a test query and load the result into a DataFrame
q = 'SELECT * FROM syscat.tables LIMIT 5'
df = pd.read_sql(q, pd_conn)
print(df)
```
After the initial connection steps above, all you need to run a query and read the result straight into a pandas DataFrame is:
```python
q = 'SELECT * FROM syscat.columns LIMIT 2'
df = pd.read_sql(q, pd_conn)
print(df)
```
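Because notebooks are backed up to Git, it is safer not to type your password into a cell. A minimal sketch, assuming the same connection settings as the DB2 example earlier in this section: the helper names `build_db2_conn_str` and `connect_interactively` are hypothetical, and the password is read at run time with the standard-library `getpass` prompt instead of being stored in the notebook.

```python
from getpass import getpass


def build_db2_conn_str(uid, pwd,
                       db="PR_SAIL",
                       hostname="db2.database.ukserp.ac.uk",
                       port="60070",
                       keystoredb="/db2-connection/chi.kdb",
                       keystash="/db2-connection/chi.sth"):
    """Assemble the SSL connection string used in the DB2 example (defaults copied from it)."""
    fields = {
        "DATABASE": db,
        "HOSTNAME": hostname,
        "PORT": port,
        "PROTOCOL": "TCPIP",
        "Security": "ssl",
        "UID": uid,
        "PWD": pwd,
        "SSLClientKeystoredb": keystoredb,
        "SSLClientKeystash": keystash,
    }
    # Dicts preserve insertion order, so keywords come out in the order above.
    return "".join(f"{key}={value};" for key, value in fields.items())


def connect_interactively(uid):
    """Prompt for the password at run time so it is never saved in the notebook."""
    # Imported here so build_db2_conn_str can be used (and tested) without ibm_db.
    import ibm_db
    import ibm_db_dbi

    conn = ibm_db.connect(build_db2_conn_str(uid, getpass("DB2 password: ")), "", "")
    # Wrap the native connection in a DB-API connection that pandas can use.
    return conn, ibm_db_dbi.Connection(conn)
```

For example, `conn, pd_conn = connect_interactively('YOUR USERNAME HERE')` followed by the `pd.read_sql` call above. When you have finished, `ibm_db.close(conn)` closes the connection.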
R
TBC
...