
An early version of JupyterHub is now available within the SAIL Databank for a limited number of users. This guide will be updated as our technical offering and our collective experience of working with JupyterHub evolve.

This is development software and carries risks. You will need to back up your notebooks manually, either by pushing them to Git or by downloading them to your SAIL desktop.

Getting started

  1. Note: Access to JupyterHub is only available for users on projects that have paid for additional processing power. 
  2. If you have requested JupyterHub and this has been confirmed, log on to the SAIL Gateway as normal. Once within the Windows 10 environment, open your browser and navigate to http://jupyterhub.sail.k8s.chi.swan.ac.uk/
  3. Click the orange 'login with Keycloak' button and follow the instructions. Your login details are the same as those you use for the SAIL Gateway.
  4. You will then be logged into JupyterHub and will see a list of notebooks to choose from. If you are on a GPU project, you will see at least three options:
    1. The first is a basic (non-GPU) notebook. It has Python and R kernels installed, and also lets you launch VS Code and RStudio from within the notebook if you prefer a more fully featured IDE.
    2. The second has the same features as the first and attaches to a GPU, but does not contain any GPU drivers or related Python libraries. If you select it, you will have to install your own GPU drivers from within the notebook. We do not recommend this option, and it will likely be removed in a future release.
    3. The third is a GPU-attached notebook with CUDA 11.6.2, TensorFlow, and other common Python ML libraries preinstalled. It is configured to automatically expose your project's GPU to TensorFlow within the notebook. This notebook supports Python only. It also includes VS Code and TensorBoard, as well as an extension for monitoring your GPU resource usage.
  5. If you are on more than one GPU project (e.g. project 1234 with a GPU and project 1653 with a GPU), you will see separate options in the notebook list for each project. In this case, the list will look something like this:
    1. Standard Jupyter notebook
    2. Standard Jupyter notebook with GPU for project 1234
    3. GPU-enabled Jupyter notebook with GPU for project 1234
    4. Standard Jupyter notebook with GPU for project 1653
    5. GPU-enabled Jupyter notebook with GPU for project 1653

All of your notebooks share the same underlying file system, so choosing a different notebook does not affect access to the files you keep on JupyterHub.

A request on notebook choice

We strongly suggest that users only select GPU-enabled notebooks if they intend to run workloads that will actually use the GPU. For data cleaning or exploratory analysis, the 'Standard Jupyter notebook' is more than sufficient. It is also very important that users on multiple GPU projects select the correct notebook for the project they are working on. We will be monitoring resource usage to better understand the usage patterns and rates of projects with GPUs.
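Before starting a long-running job in a GPU-enabled notebook, it can be worth confirming that the GPU is actually visible. The sketch below assumes the third (TensorFlow) notebook image; `tf.config.list_physical_devices` is TensorFlow's own API, while the helper name `visible_gpus` is hypothetical. In a notebook without TensorFlow it simply returns an empty list.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TensorFlow is absent."""
    try:
        import tensorflow as tf  # preinstalled only in the GPU-enabled image
    except ImportError:
        return []
    # Each entry is a PhysicalDevice with .name and .device_type attributes
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus:
    print("GPU(s) visible to TensorFlow:", gpus)
else:
    print("No GPU visible to TensorFlow in this notebook.")
```

If this reports no GPU in a GPU-enabled notebook, check that you picked the option for the right project.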

Inside the notebook

After you select a notebook, your server will start up after a short delay and the familiar JupyterLab interface will load. Our notebooks offer some additional features, including:

  • CloudBeaver - lets you connect to DB2 and run SQL directly in the window.
  • VS Code - a fully featured IDE with several extensions pre-installed.
  • RStudio - the RStudio IDE.

Installing new packages

The notebooks are configured so that any new conda environment you create (see Installing Library Packages in Anaconda) can show a corresponding kernel launcher on your Jupyter homepage. For this reason, we recommend installing packages via Anaconda rather than pip or CRAN wherever possible.

Python

  1. Open a Terminal window from the homepage.
  2. Create a new conda environment to install your package in: conda create --name myshinynewenv ipykernel
  3. Activate your new environment: conda activate myshinynewenv
  4. Install your package, e.g. conda install -c conda-forge recordlinkage
  5. You can also install with pip in the usual way; first make sure you have activated (step 3) the environment you want to install the package into.
  6. When you close the Terminal window and return to the homepage, you should see a new Python kernel with the same name as your new conda environment.
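Once a notebook is running on the new kernel, a quick check from inside it confirms that it is using the environment you expect and that the package imports. This is a minimal sketch: `myshinynewenv` and `recordlinkage` are only the example names from the steps above, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import os
import sys


def check_environment(package):
    """Report the active environment's folder name and whether `package` is importable."""
    # sys.prefix is the root of the environment this kernel runs in,
    # e.g. .../envs/myshinynewenv for a kernel created as in the steps above.
    env_name = os.path.basename(sys.prefix)
    # find_spec returns None when the package is not installed in this environment
    installed = importlib.util.find_spec(package) is not None
    return env_name, installed


env_name, installed = check_environment("recordlinkage")
print(f"Kernel environment: {env_name}; recordlinkage installed: {installed}")
```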

R

  1. Open a Terminal window from the homepage.
  2. If you want to use your new package within RStudio, install it into the base environment.
  3. If you want to use it within Jupyter R then create a new env as in steps 2 and 3 in the Python section above.
  4. Install your package, e.g. conda install -c r r-terra
  5. You can also install from CRAN in the traditional R way.

If you prefer a GUI...

  1. On the top menu bar, go to Settings → Conda Package Manager
  2. This may take several minutes to load, but when it does you'll see your current environments and all of the packages installed in them.
  3. You can create new environments and install packages via this visual UI.

Using Git

The notebooks are set up to pull from and push to the SAIL GitLab. You can configure this the same way you did on your SAIL Desktop (see How to use GITLAB within SeRP, and instructions on the internal SAIL wiki).

Getting files from your SAIL desktop into your JupyterHub notebook

There are two options for moving files from your SAIL desktop into your JupyterHub notebook:

  1. If the file is under 8 MB, you can simply drag and drop it into the browser window.
  2. Alternatively, you can sync it via GitLab (strongly recommended).

For files too large for either option, please log a ticket with the Helpdesk and we will help you find a solution.
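If Python is available on the machine where your files live, a short script can show which files exceed the drag-and-drop limit before you try. This is a sketch only: it assumes the 8 MB limit means 8 × 1024 × 1024 bytes, and `files_over_limit` is a hypothetical helper.

```python
from pathlib import Path

# Assumed interpretation of the 8 MB drag-and-drop limit
LIMIT_BYTES = 8 * 1024 * 1024


def files_over_limit(folder, limit=LIMIT_BYTES):
    """Return (path, size in bytes) for every file under `folder` larger than `limit`."""
    too_big = []
    for path in Path(folder).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                too_big.append((path, size))
    return too_big
```

For example, `files_over_limit("my_project")` (a hypothetical folder name) lists the files that would need GitLab or a Helpdesk ticket instead.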

Getting files from JupyterHub onto your SAIL desktop

As above, you have two options:

  1. Sync via GitLab (recommended).
  2. Right-click the file you want to download and click 'Download'.
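Right-click 'Download' works one file at a time. To bring a whole folder across, one option is to zip it from a notebook cell first and then download the single archive. A minimal sketch using Python's standard `shutil.make_archive`; the helper name `zip_folder` and the folder name `my_project` are hypothetical.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to `folder` and return the archive's path."""
    folder = Path(folder).resolve()
    # make_archive appends the .zip extension to base_name itself
    archive = shutil.make_archive(
        base_name=str(folder),   # output path without extension
        format="zip",
        root_dir=folder.parent,  # paths inside the zip are relative to here
        base_dir=folder.name,    # so entries look like my_project/...
    )
    return Path(archive)
```

Running `zip_folder("my_project")` produces `my_project.zip` alongside the folder, which you can then right-click and download.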

Seeing 'hidden' files and folders

File and folder names that start with a leading '.' are hidden by default in Jupyter. To view them in the file tree select 'View' → 'Show hidden files' from the top menu.
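You can also list hidden entries from a notebook cell without changing the file browser setting. A small sketch; `hidden_entries` is a hypothetical helper, and `"."` is the notebook's current folder.

```python
import os


def hidden_entries(folder="."):
    """Return the sorted names of files and folders in `folder` that start with '.'."""
    return sorted(entry.name for entry in os.scandir(folder) if entry.name.startswith("."))


print(hidden_entries("."))
```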

Changing your theme

We have pre-installed a number of Jupyter themes. To change your theme, select 'Settings' → 'Theme' from the top menu and pick a theme from the list.

Connecting to DB2

All the notebooks have the necessary drivers and libraries for connecting to DB2 pre-installed. They currently support only ODBC drivers.

The following are some code examples showing how you can connect directly to DB2.

Python

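import ibm_db
import ibm_db_dbi  # DB-API wrapper around ibm_db that pandas can read from
import pandas as pd

# Connection settings for SAIL DB2
db = 'PR_SAIL'
hostname = 'db2.database.ukserp.ac.uk'
port = '60070'
protocol = 'TCPIP'
uid = 'YOUR USERNAME HERE'
pwd = 'YOUR PASSWORD HERE'
security = 'ssl'
# SSL keystore files available inside the notebooks
ssl_client_keystoredb = '/db2conn/chi.kdb'
ssl_client_keystash = '/db2conn/chi.sth'

# Nine placeholders, so nine arguments (security was previously missing)
conn_str = ("DATABASE={0};"
            "HOSTNAME={1};"
            "PORT={2};"
            "PROTOCOL={3};"
            "Security={4};"
            "UID={5};"
            "PWD={6};"
            "SSLClientKeystoredb={7};"
            "SSLClientKeystash={8};").format(db, hostname, port, protocol, security,
                                             uid, pwd, ssl_client_keystoredb, ssl_client_keystash)

# Open the connection, then wrap it so pandas can use it
conn = ibm_db.connect(conn_str, '', '')
pd_conn = ibm_db_dbi.Connection(conn)

# Read a query result straight into a DataFrame
q = 'SELECT * FROM syscat.tables LIMIT 5'
df = pd.read_sql(q, pd_conn)
print(df)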

Once you have connected as above, running a query and reading the result straight into a pandas DataFrame is as simple as:

q = 'SELECT * FROM syscat.columns LIMIT 2'
df = pd.read_sql(q, pd_conn)
print(df)
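Because notebooks are often synced to GitLab, it is safer not to type your password into a cell. The sketch below uses the same connection settings as the example above, but builds the string in a function and prompts for the password with Python's standard `getpass` module. The function names `build_conn_str` and `connect_sail` are hypothetical, and the settings are copied from the example rather than independently confirmed.

```python
import getpass


def build_conn_str(uid, pwd,
                   db="PR_SAIL",
                   hostname="db2.database.ukserp.ac.uk",
                   port="60070",
                   keystoredb="/db2conn/chi.kdb",
                   keystash="/db2conn/chi.sth"):
    """Assemble the DB2 connection string used in the example above."""
    parts = {
        "DATABASE": db,
        "HOSTNAME": hostname,
        "PORT": port,
        "PROTOCOL": "TCPIP",
        "Security": "ssl",
        "UID": uid,
        "PWD": pwd,
        "SSLClientKeystoredb": keystoredb,
        "SSLClientKeystash": keystash,
    }
    return "".join(f"{key}={value};" for key, value in parts.items())


def connect_sail(uid):
    """Prompt for the password and return a DB-API connection usable with pandas."""
    import ibm_db  # imported here so build_conn_str works without the driver
    import ibm_db_dbi
    pwd = getpass.getpass("DB2 password: ")  # typed at a prompt, not saved in the notebook
    conn = ibm_db.connect(build_conn_str(uid, pwd), "", "")
    return ibm_db_dbi.Connection(conn)
```

Usage would be `pd_conn = connect_sail("YOUR USERNAME HERE")`, then `pd.read_sql(q, pd_conn)` as before, and `pd_conn.close()` when you are finished.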



R

TBC






