
This guide describes the best practice for using Jupyter Notebooks (or Jupyter Lab) on your SAIL Desktop. These are different from JupyterHub (for that guide, please see JupyterHub Guide). Still, unless your project is paying for JupyterHub access and you are using the JupyterHub URL, you will want to follow the guide on this page.


The instructions are split into two sections. The first is a basic set of instructions for setting up and using Jupyter Notebooks (referred to here as 'JNBs'). The second repeats the same instructions and explains the reasoning behind each step.



For further help and Jupyter notebook support, you can join the Slack channel #supportgroup_jupyter 



Background

Jupyter Notebooks is a web-based application for writing and running code in a single, streamlined document. It lets you combine text (in markdown cells) with code cells in a number of programming languages, including Python and R, and offers other useful features such as exporting to PDF or HTML or turning a notebook into slides.

 

While it is possible to install Jupyter as standalone software, it is also included as part of Anaconda which we will cover in this guide, since Anaconda is already available within the SAIL gateway and existing SAIL users have experience of configuring Jupyter Notebooks via this method.  

 

The Anaconda suite distributes R and Python packages built for data science and hosts a range of software including RStudio (R) and Spyder (Python). It also comes with conda, an environment and package manager.

 

Using an environment manager allows you to keep multiple separate working environments, which avoids conflicts between the packages you use for different analytical tasks. For example, when you install an update to a package (e.g. ggplot), conda checks whether the new version would conflict with anything else in the environment. It also allows you to clone an environment, so if someone else wants to run your script, cloning the environment ensures that you are both working from the same versions.



Basic Instructions

Initial setup

This is for people who have not previously configured Anaconda or Jupyter on their desktops.

  1. Log on to your SAIL desktop.

  2. First, create a folder on your P Drive where you will save your conda environments. 

    Go to P drive > yourusername

    Create new folder eg conda-envs 

    This folder will be the home for all your conda environments.

  3. Open Anaconda Prompt, a command line window which you will use to control your conda environments

    Click on the Start menu (the little Windows icon on the bottom left) → Anaconda3 (64-bit) → Anaconda Prompt (Anaconda3).
    This will open a command line window with a line of text like:
    1. (base) C:\Users\<your username will be here>


  4. Now we need to make an environment to use.

    1. Option 1: create a new (empty) environment and install packages/libraries you want to use

      In the Anaconda Prompt window, type the following and then hit 'Enter':
      1. conda create -p P:\<your username here>\<name of new folder>\<name of your new environment> --channel=anaconda --channel=conda-forge nb_conda_kernels pandas numpy jupyterlab

        For example, if my username is 'leal' and I want to create an environment called 'mynewenv' in a folder called 'conda-envs', the command I would run would be:

        conda create -p P:\leal\conda-envs\mynewenv --channel=anaconda --channel=conda-forge nb_conda_kernels pandas numpy jupyterlab



      2. Wait a little while until the window asks you whether to proceed - hit 'y' on your keyboard and then press 'Enter'.



      3. Wait while your new environment is created and all requested packages are installed.



      4. You might get a pop-up saying "this app has been blocked by your system administrator". This is fine and the installation has still worked. Just click 'Close' on the message.
         



    2. Option 2: clone the base environment and copy all the installed packages into your own environment

      In the Anaconda Prompt window, type the following and then hit 'Enter':
      1. conda create --prefix P:\<your username here>\<name of new folder>\<name of your new environment> --clone base

        For example, if my username is 'leal' and I want to create an environment called 'ranch' in a folder called 'myconda', the command I would run would be:

      2. conda create -p P:\leal\myconda\ranch --clone base


      3. You will see a few lines of code followed by the number of packages and files that are going to be copied (see screenshot below)

      4. Wait while your new environment is created and all requested packages are installed.



      5. This process may take a few hours to run, so we advise that you go and do something else and come back later!

      6. You might get a pop-up saying "this app has been blocked by your system administrator". This is fine and the installation has still worked. Just click 'Close' on the message.
         



      7. When it is finished you will see the message 'done', as seen in the screenshot below, along with some instructions for how to activate the environment you have just created (see Step 5).




  5. Next you need to activate your environment

    Type the following in the command line window:
    1. conda activate P:\<your username here>\<name of new folder>\<name of your new environment>
    2. So, if my username is 'leal' and I created an environment called 'mynewenv' in a folder called 'conda-envs', I would use the command:
      1. conda activate P:\leal\conda-envs\mynewenv


    3. You'll know when the environment is activated because the environment name and path will be in brackets as in the image above.

      You can also type
      conda env list and you will see an asterisk next to the active environment.


  6. Congratulations, you can now move on to the next part of the guide which will instruct you how to install the specific packages/libraries you want to use.
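
The path used in the create and activate commands above is always built from the same three parts: your username, the folder you made on the P Drive, and the environment name. As a minimal sketch (the helper names env_prefix and create_command are hypothetical and exist only for illustration), the following Python shows how those parts combine into the Option 1 command:

```python
from pathlib import PureWindowsPath


def env_prefix(username: str, folder: str, env_name: str) -> str:
    """Return the P Drive path of a conda environment (hypothetical helper)."""
    return str(PureWindowsPath("P:\\") / username / folder / env_name)


def create_command(username: str, folder: str, env_name: str) -> str:
    """Build the Option 1 'conda create' command as a single string."""
    packages = ["nb_conda_kernels", "pandas", "numpy", "jupyterlab"]
    return (
        f"conda create -p {env_prefix(username, folder, env_name)} "
        "--channel=anaconda --channel=conda-forge " + " ".join(packages)
    )


# Print the commands for the worked example used in this guide.
print(create_command("leal", "conda-envs", "mynewenv"))
print("conda activate " + env_prefix("leal", "conda-envs", "mynewenv"))
```

This only prints the commands; you would still type or paste them into the Anaconda Prompt window yourself.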



Installing Packages

General users

Jupyter notebooks are already installed with conda, but you will likely want to install some specific libraries or packages that you will use when you are programming code.

We strongly recommend users try installing from Anaconda sources for package installation. You should only use pip if a package isn't available from Anaconda channels.

The steps below assume you've completed the initial setup and have an environment saved on your P Drive (P:) which you have activated in an open command line window.

  1. First install the jupyter notebook packages
    conda install --channel=anaconda --channel=conda-forge --channel=r nb_conda_kernels jupyterlab

  2. To install additional packages when you are unsure of the command line syntax to use:
    go to Google (outside of your SAIL desktop) and search '<name of the package you want to install> anaconda'.
    1. For example, if I want to install the package 'recordlinkage', I would search on Google for 'recordlinkage anaconda'.

  3. Select the Google result from anaconda.org; this should take you directly to the Anaconda page for the package.

  4. On the page, a command will tell you how to install it. Sticking with the recordlinkage example, the webpage shows me that the command to install is:
    1. conda install -c conda-forge recordlinkage

  5. Return to your SAIL desktop and type this installation command into your Anaconda Prompt window, hitting 'Enter'.

  6. Wait for Anaconda to ask you if you want to proceed - hit 'y' on your keyboard and then press 'Enter'.

  7. Your package is installed! 
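
If you want to confirm that a Python package really is available, one option is to check from a notebook cell (or a Python session) running in your environment. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical, and it assumes the import name matches the package name, which is not true for every package:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Report on a few packages mentioned in this guide.
for name in ["pandas", "numpy", "recordlinkage"]:
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

If a package shows as not installed, check that the notebook is using the kernel for the environment you installed it into.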

Special steps for R users

If you want to use R in a Jupyter environment, then you will need to install some specific packages to enable R programming.

By following these instructions, your R packages will be successfully installed into your created Anaconda environment rather than in a shared location that RStudio uses, which can cause problems down the line.

The steps below assume you've completed the initial setup and have an environment saved on your P Drive (P:) which you have activated in an open command line window

  1. Install base R and an R kernel (the kernel is what allows you to use R within Jupyter), as well as a predefined collection of commonly used R libraries called r-essentials:
    conda install --channel=anaconda --channel=conda-forge --channel=r r-base=4.1.3 r-irkernel nb_conda_kernels jupyterlab r-essentials

    For example, if my username is 'leal' and I want to use R within my environment called 'renv' which has been created in a folder called 'conda-envs', the command I would run would be:
    1. conda install -p P:\leal\conda-envs\renv --channel=anaconda --channel=conda-forge --channel=r r-base=4.1.3 r-irkernel nb_conda_kernels jupyterlab r-essentials

  2. Wait a little while until the window asks you whether to proceed - hit 'y' on your keyboard and then press 'Enter'.


  3. Wait while all your requested packages are installed.


  4. Next, you need to install a package that will force R to look in your Anaconda environment for installed packages. To do this, in the same command line window we've been using, type the following and hit 'Enter':
    1. conda install -c conda-forge conda-ecosystem-user-package-isolation

    2. Type 'y' to agree to the installation and wait while it completes.

  5. Your Anaconda window will probably behave oddly after installing that last package, so just quit it (click the "X" in the top right) and then launch a new Anaconda Prompt window (as in step 3 of the initial setup).
    1. Remember to activate the environment when you open a new Anaconda Prompt window!
    2. You'll know when the environment is activated because the environment name and path will be in brackets, as shown in Step 5 of the basic instructions above.

  6. Install any other packages you want to use.

    Note that if you created your new environment by cloning the base environment, the most commonly used packages will already be installed in it.

    You can see which libraries are already installed in your active environment by typing 'conda list' into the command prompt window.
    Note that all R libraries are prefixed with 'r-' in the list.

    If you are unsure of the command line syntax to use: 
    go to Google (outside of your SAIL desktop) and search '<name of the package you want to install> anaconda'.

    For example, one of the first packages you will likely need to install is the RODBC package, as this will enable you to connect to DB2 from within your R session.
    If you search for 'RODBC anaconda', you should be able to easily locate the Anaconda URL:
    https://anaconda.org/conda-forge/r-rodbc 



    Also note that multiple packages can be installed with one command: just add a space and then the next package name. For example, to install the rodbc, broom and janitor packages:

    conda install -c conda-forge r-rodbc r-broom r-janitor 

  7. Once you have the package installed in your environment, you can load the library from within your R Jupyter notebook as usual (i.e. library(RODBC)). See below for creating a jupyter notebook.

  8. Congratulations! You can now move on to the "Starting Jupyter" section.
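
The 'conda list' output mentioned above can also be approximated from Python, which is handy for recording which R libraries an environment contains. This is a rough sketch that assumes conda's usual layout, in which each installed package has a JSON record (with "name" and "version" fields) in the conda-meta folder of the environment; the function names are hypothetical:

```python
import json
import sys
from pathlib import Path


def list_conda_packages(prefix: str) -> dict:
    """Map package name to version using the records in <prefix>/conda-meta."""
    packages = {}
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        with open(record, encoding="utf-8") as handle:
            meta = json.load(handle)
        # Skip anything that does not look like a package record.
        if "name" in meta and "version" in meta:
            packages[meta["name"]] = meta["version"]
    return packages


def r_packages(prefix: str) -> dict:
    """Keep only the packages whose conda name starts with 'r-'."""
    return {n: v for n, v in list_conda_packages(prefix).items() if n.startswith("r-")}


# sys.prefix is the environment of the running Python; outside conda this prints nothing.
for name, version in r_packages(sys.prefix).items():
    print(name, version)
```

Running 'conda list' in the Anaconda Prompt remains the simplest way to get the same information.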

Installing Packages 'on the fly'

Note that once Jupyter is running, your Anaconda Prompt window will be occupied, so it is sometimes useful to have an additional Anaconda Prompt window open alongside (for example, if you are installing a number of new packages). Don't forget to activate your environment in the new window before installing packages into it!




Starting Jupyter


This assumes you have done the initial setup and saved an environment on your P: (pronounced 'P drive').

If you've just completed the initial setup and still have your new environment activated in an open command line window, you can skip steps 1-3.

  1. Log on to your SAIL desktop.
  2. Click on the Start menu (the little Windows icon on the bottom left) → Anaconda3 (64-bit) → Anaconda Prompt (Anaconda3). This will open a command line window with a line of text like:
    1. (base) C:\Users\<your username will be here>
  3. Activate your conda environment by typing the following in the same command line window:
    1. conda activate P:\<your username here>\<name of new folder>\<name of your new environment>
    2. So, if my username is 'leal' and I created an environment called 'mynewenv' in a folder called 'conda-envs', I would use the command:
      1. conda activate P:\leal\conda-envs\mynewenv
    3. You will know when the environment is activated because the window will show a line of text like:
      1. (P:\<your username>\<name of new folder>\<your environment name>) C:\Users\<your username>

  4. VERY IMPORTANT: Before starting Jupyter, you first need to change the working directory so that you can either i) access previously saved notebooks, including notebooks created by your colleagues, or ii) create new notebooks that are saved in the desired place, e.g. in the SAIL project folder so you can easily share them with your collaborators.

    You will see the current working directory in the command prompt window after your conda environment name; it is likely a C Drive file path. If you launch Jupyter from the C Drive, you will not be able to see any of your other files (such as images or documents) in the Jupyter file browser.

    Furthermore, the notebooks will almost certainly contain outputs that are project specific, so we must adhere to SAIL policy and use the S Drive folder for project-specific outputs.
    Unlike pure SQL or R scripts, which aren't likely to contain results, data or outputs, Jupyter notebooks contain both code and outputs and therefore MUST NOT BE SAVED ON THE P DRIVE.

    1. To change the file path to a different drive, type the following:

      1. S: (Hit enter)
        or

        P: (Hit enter) 

        (Use S: for project-specific work, in line with SAIL policy.)

    2. You may also want to change your directory to a specific folder (though you can access any sub-directories within the drive you have specified above). Type the following and press Enter:
      1. cd <your username here>
        So if my username is 'leal', I would type:
        cd leal

    3. Optional: You might want to navigate to the specific folder in which you'll be working/saving this work, but that is out of the scope of this simple guide.

  5. We are now ready to start Jupyter. In the command line window, type and hit Enter with either of the following commands:
    1. jupyter lab
      1. This gives you a more modern Jupyter interface.
    2. jupyter notebook
      1. This will give you the 'classic' Jupyter interface.


  6. Jupyter will automatically open in a Microsoft Edge tab. You can navigate wherever you want to save your notebooks, create folders, make your notebooks, etc.
    1. You must leave the Anaconda Prompt window open while using Jupyter. You may not need to look at it, but it is helpful for checking the logs should you have any errors or issues.

  7. To ensure that you're using the correct environment kernel in Jupyter, you need to pay attention when creating notebooks.

    1. In the modern interface:
      1. Under the 'Notebook' heading in the launcher tab, select the one with the name
        Python \[conda env: <name of your conda env here>\]*


    2. In the 'classic' interface:
      1. Click on 'New' in the top right.
      2. In the drop-down window that opens, make sure you choose the option called 
        Python \[conda env: <name of your conda env here>\]*

  8. When you're done and want to exit Jupyter, click on the 'Anaconda Prompt' window on the taskbar, click somewhere in the window, and press Ctrl+C twice.
    Please wait a few seconds; Jupyter should shut down, making it safe to close your notebook.
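
To double-check that a Python notebook is running on the kernel from your own environment, you can inspect where its Python lives. This is a minimal sketch using only the standard library (the helper name kernel_environment is hypothetical); run it in a notebook cell:

```python
import sys


def kernel_environment() -> dict:
    """Report which Python installation the current kernel is running from."""
    return {"prefix": sys.prefix, "executable": sys.executable}


info = kernel_environment()
print("Environment folder:", info["prefix"])
print("Python executable: ", info["executable"])
```

If the correct kernel is selected, the environment folder should be the path of the environment you created on your P Drive rather than the base installation on the C Drive.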




Instructions with explanations

The reasoning for each step is shown in red text.

Initial setup

This is for people who have not previously configured Anaconda or Jupyter on their desktops.

  1. Log on to your SAIL desktop.
  2. Click on the Start menu (the little Windows icon on the bottom left) → Anaconda3 (64-bit) → Anaconda Prompt (Anaconda3). This will open a command line window with a line of text like:
    1. (base) C:\Users\<your username will be here>
  3. Now we need to make an environment to use. In this command line window, type the following and then hit 'Enter':
    1. conda create -p P:\<your username here>\<name of new folder>\<name of your new environment> --channel=anaconda --channel=conda-forge nb_conda_kernels pandas numpy jupyterlab
      -p is short for --prefix, which tells conda the full file path at which to create the environment.

    2. For example, if my username is 'leal' and I want to create an environment called 'mynewenv' in a folder called 'conda-envs', the command I would run would be:
      1. conda create -p P:\leal\conda-envs\mynewenv --channel=anaconda --channel=conda-forge nb_conda_kernels pandas numpy jupyterlab
    3. There is already a conda environment available to SAIL users but this base environment is saved on the C Drive (C:). Since your user account doesn't have permission to write to the C drive this will be a problem if you want to install new libraries and packages. 
      We therefore need to create an environment on our P Drive (P:) 
      because you cannot install packages to the base environment.

    4. We use the '-p' flag to tell Anaconda to put all your environment files and folders in a specific location and not in its default location on the C:.
    5. The environment path has to be somewhere that won't get cleared during the maintenance window, which rules out the C:, and also somewhere you have permission to write to. 
      This is why we use our P:\<username> folder.
    6. The --channel flags tell Anaconda where to look for the libraries that we want it to install in our new environment
    7. The libraries we want to be installed by default are:
      1. nb_conda_kernels - This extension enables a Jupyter Notebook or JupyterLab application in one conda environment to access kernels for Python, R, and other languages found in other environments. When a kernel from an external environment is selected, the kernel conda environment is automatically activated before the kernel is launched. This allows you to utilize different versions of Python, R, and other languages from a single Jupyter installation.
      2. pandas and numpy - very useful Python libraries for dealing with data.
      3. jupyterlab - the complete Jupyter distribution is needed so we can use Jupyter.
  4. Wait while your new environment is created and all requested packages are installed.
  5. You might get a pop-up saying "this app has been blocked by your system administrator". This is fine and the installation has still worked. Just click 'Close' on the message.
    1. This is related to python.exe permissions; it is a known issue but shouldn't cause any problems.
  6. After your environment is created, you need to activate it by typing the following in the same command line window:
    1. conda activate P:\<your username here>\<name of new folder>\<name of your new environment>
    2. So, if my username is 'leal' and I created an environment called 'mynewenv' in a folder called 'conda-envs', I would use the command:
      1. conda activate P:\leal\conda-envs\mynewenv
    3. You need to activate an environment every time you open a fresh Anaconda Prompt window; otherwise, Anaconda will try and use the base environment.
  7. Congratulations, you can now move on to the next part of the guide.

Starting Jupyter

This assumes that you have done the initial setup and have an environment saved on your P: (pronounced 'P drive').

If you've just completed the initial setup and still have your new environment activated in an open command line window, you can skip steps 1-3.

  1. Log on to your SAIL desktop.
  2. Click on the Start menu (the little Windows icon on the bottom left) → Anaconda3 (64-bit) → Anaconda Prompt (Anaconda3). This will open a command line window with a line of text like:
    1. (base) C:\Users\<your username will be here>
  3. Activate your conda environment by typing the following in the same command line window:
    1. conda activate P:\<your username here>\<name of new folder>\<name of your new environment>
    2. So, if my username is 'leal' and I created an environment called 'mynewenv' in a folder called 'conda-envs', I would use the command:
      1. conda activate P:\leal\conda-envs\mynewenv
    3. You will know when the environment is activated because the window will show a line of text like:
      1. (P:\<your username>\<name of new folder>\<your environment name>) C:\Users\<your username>
    4. You need to activate an environment every time you open a fresh Anaconda Prompt window; otherwise, Anaconda will try and use the base environment.
  4. VERY IMPORTANT: Before starting Jupyter, we must ensure we're on the P: in the command line window. To do this, type the following into the window and press Enter:
    1. P:
    2. You cannot change drive letters from within Jupyter. If you start Jupyter when you're on the C: it will only be able to save your notebooks on the C:. The C: gets cleared whenever
      your desktop restarts, so you'd lose all your notebooks if you saved them here.
  5. Then type the following and press Enter:
    1. cd <your username here>
    2. So if my username is 'leal', I would type:
      1. cd leal
    3. Jupyter needs to be able to write to wherever it starts, and making sure it starts in your user folder avoids several common "permission denied" problems.
  6. (Optional) You might want to navigate to the specific folder in which you'll be working/saving this work, but that is out of the scope of this simple guide.
  7. We are now ready to start Jupyter. In the command line window, type and hit Enter with either of the following commands:
    1. jupyter notebook
      1. This will give you the 'classic' Jupyter interface.
    2. jupyter lab
      1. This gives you a more modern Jupyter interface.
  8. Jupyter will automatically open in a Microsoft Edge tab. You can navigate wherever you want to save your notebooks, create folders, make your notebooks, etc.
    1. You need to leave the Anaconda Prompt window open while you're using Jupyter.
      1. If you close the Anaconda Prompt window, it shuts down the local server that Jupyter is using, crashing Jupyter.
  9. To ensure that you're using the correct environment kernel in Jupyter, you need to pay attention when creating notebooks.
    1. In the 'classic' interface:
      1. Click on 'New' in the top right.
      2. In the drop-down window that opens, make sure you choose the option called 
        1. Python \[conda env: <name of your conda env here>\]*
    2. In the modern interface:
      1. Under the 'Notebook' heading in the launcher tab, select the one with the name
        1. Python \[conda env: <name of your conda env here>\]*
    3. We want any notebook that we use to use the specific version of everything that we installed in our Anaconda Environment.
      Doing this means that you can use packages installed in your Anaconda Environment inside Jupyter too.
  10. When you're done and want to exit Jupyter, click on the 'Anaconda Prompt' window on the taskbar, click somewhere in the window, and press Ctrl+C twice.
  11. Please wait a few seconds; Jupyter should shut down, making it safe to close your notebook.
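
Because Jupyter cannot change drive once it has started, it can be worth confirming from inside a notebook which drive it was launched on. This is a minimal sketch using only the standard library; the helper names drive_of and is_on_c_drive are hypothetical:

```python
import os
from pathlib import PureWindowsPath


def drive_of(path: str) -> str:
    """Return the drive part of a Windows path in upper case, e.g. 'C:'."""
    return PureWindowsPath(path).drive.upper()


def is_on_c_drive(path: str) -> bool:
    """True when the path is on C:, which this guide says is cleared on restart."""
    return drive_of(path) == "C:"


# Check the folder that Jupyter was started from.
here = os.getcwd()
if is_on_c_drive(here):
    print(f"Warning: Jupyter was started in {here}; notebooks saved here may be lost.")
else:
    print(f"Working directory: {here}")
```

If the warning appears, shut Jupyter down, change drive in the Anaconda Prompt window and start Jupyter again.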

Installing packages

This assumes that you've completed the initial setup and have an environment saved on your P: .

If you've just completed the initial setup and still have your new environment activated in an open command line window, you can skip steps 1-3.

We strongly recommend users try installing from Anaconda sources for package installation. You should only use pip if a package isn't available from Anaconda channels.

  1. Log on to your SAIL desktop.
  2. Click on the Start menu (the little Windows icon on the bottom left) → Anaconda3 (64-bit) → Anaconda Prompt (Anaconda3). This will open a command line window with a line of text like:
    1. (base) C:\Users\<your username will be here>
  3. Activate your conda environment by typing the following in the same command line window:
    1. conda activate P:\<your username here>\<name of new folder>\<name of your new environment>
    2. So, if my username is 'leal' and I created an environment called 'mynewenv' in a folder called 'conda-envs', I would use the command:
      1. conda activate P:\leal\conda-envs\mynewenv
    3. You will know when the environment is activated because the window will show a line of text like:
      1. (P:\<your username>\<name of new folder>\<your environment name>) C:\Users\<your username>
  4. Outside your SAIL desktop, go to Google and search '<name of the package you want to install> anaconda'.
    1. For example, if I want to install the package 'recordlinkage', I would search on Google for 'recordlinkage anaconda'.
  5. Select the Google result from anaconda.org; this should take you directly to the Anaconda page for the package.
  6. On the page, there will be a command that tells you how to install it. Sticking with the recordlinkage example, the webpage shows me that the command to install is:
    1. conda install -c conda-forge recordlinkage
  7. Go back to your SAIL desktop and type this installation command into your Anaconda Prompt window, hitting 'Enter'.
  8. Wait for Anaconda to ask you if you want to proceed - hit 'y' on your keyboard and then press 'Enter'.
  9. Your package is installed! 




FAQs


Can I install R packages from CRAN?

Yes, you can install packages in R using install.packages().

But we strongly recommend users try installing from Anaconda sources for package installation.
You should only use CRAN if a package isn't available from Anaconda channels.

These instructions assume you have completed the initial setup and have an environment saved on your P: .

    1. Complete steps 1-9 in the "Starting Jupyter" section above.
    2. Open a notebook by clicking on the launcher card with the title:
      1. R \[conda env: <name of your conda env here>\]*
    3. Type the following in a cell and execute it:
      1. install.packages("<name of package to install>", repos="http://cran.rstudio.com")
      2. For example, to install the "rmarkdown" package, you would do
        1. install.packages("rmarkdown", repos="http://cran.rstudio.com")
    4. Your package is installed!

Can I save my notebook as HTML?

Yes, you can export your notebook in a number of different formats such as HTML, PDF or LaTeX.

From within the command prompt window, type:

jupyter nbconvert --to html my_notebook_name.ipynb

 

It will save the output in the active directory location (tip – review the file path in the command prompt window). 

 

For more help on using nbconvert: 

https://nbconvert.readthedocs.io/en/latest/usage.html 
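
If you export notebooks regularly, the same nbconvert command can be assembled and run from a short Python script. This is a minimal sketch, not an official recipe: the function names are hypothetical, the list of formats is limited to the three named in this FAQ, and running the export assumes the jupyter command is available, i.e. your environment is activated:

```python
import subprocess


def nbconvert_command(notebook: str, fmt: str = "html") -> list:
    """Build the argument list for 'jupyter nbconvert --to <fmt> <notebook>'."""
    allowed = {"html", "pdf", "latex"}  # the formats named in this FAQ
    if fmt not in allowed:
        raise ValueError(f"unsupported format: {fmt}")
    return ["jupyter", "nbconvert", "--to", fmt, notebook]


def export_notebook(notebook: str, fmt: str = "html") -> None:
    """Run the conversion; raises an error if the jupyter command fails."""
    subprocess.run(nbconvert_command(notebook, fmt), check=True)


# Show the command that would be run for the example notebook.
print(" ".join(nbconvert_command("my_notebook_name.ipynb")))
```

Calling export_notebook("my_notebook_name.ipynb") would then write the output next to the notebook, just as the command line version does.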

 

 

How do I view the objects and variables I’ve created during my session?


There is a nice Jupyter extension tool called variable inspector. 

https://github.com/zhangfeiran/jupyterlab-variableInspector 

 

From within your command prompt window, type:

pip install frz-jupyterlab-variableinspector

You will probably need to restart your Jupyter session, and possibly your browser.


Once installed, you can right click anywhere within a notebook and you should see the option to ‘open variable inspector’ as seen below 

 

 

 

I found it was a little delayed to activate the first time I used it but afterwards was responsive in near real time. 

 

You may also wish to view the variable inspector window side-by-side with your notebook. To do that, drag the variable inspector tab to either side of the open notebook tab.

 

 

 

 

What is an environment exactly and why do I need one? Do I need one for each project?

 

The concept of an environment can be quite abstract but essentially it is a directory which contains a specific set of packages you have installed.

For example, you might require a package that only works with R version 3.6. Rather than having to reinstall all your packages and their dependencies so that they are compatible, you can create an entirely separate environment with R version 3.6 while keeping your existing version of R 4.0 and all its dependencies in its own environment. This helps to avoid conflicts between multiple tools that you may use.

In reality, you likely wouldn’t have a different environment for each research project, but if you have one project for which you are using say ML packages you may have a separate environment to one which is just for visualisations. It’s really a matter of preference for organising your workbenches. If you do pretty much the same analysis across all projects, you will likely just have one environment.

The most common case for a new environment is when there is a major change to a core language (like R) which you want to switch to while retaining environments with older versions of libraries for running legacy code. 

