NIH RADx Data Hub - Workbench Tutorial

General
Create and Launch a JupyterLab Space
Upload and Download Files
Clone a Git Repository
Create a Persistent Conda Environment
Access Public Data
Change Environment
File Sync

General

Figure 1: Labeled Jupyter Notebook Interface

The main elements of the JupyterLab editor are:

Notebook: A document containing analysis code, outputs, and any additional markdown or text.
Cell: A single section of a notebook in which to enter code, Markdown, or text.
Toolbar: Buttons for the most common notebook actions, including:
- Save
- Insert cell below
- Cut selected cell
- Copy selected cell
- Paste from clipboard
- Run selected cell
- Interrupt the kernel
- Restart the kernel
- Restart the kernel and run all cells
- Change cell type (Code, Markdown, or Raw)
- Launch terminal
Environment: Displays the current notebook kernel type.
File Browser: Displays lists of folders, notebooks, and other files.
- The Personal Studio environment is a private, personal Amazon EFS directory.
Left sidebar: Contains tabs to access the following functionalities.
- File Browser: Displays lists of folders, notebooks, and other files.
- Running Terminals and Kernels: View the kernels and terminals currently running in JupyterLab. Optionally shut down all resources or only selected ones (i.e., notebooks, terminals, kernels, apps, and instances).
- Git: Connects to a Git repository for Git tool and operation access.
- Table of Contents: Automatically generated for each open notebook, Markdown file, or Python file; its clickable entries navigate the document’s structure.
- Extension Manager: Enables and manages third-party JupyterLab extensions.
- Jupyter AI: A JupyterLab tool to explore generative AI models and integrate them into notebooks.

Create and Launch a JupyterLab Space

The default workspace environment is an ml.t3.medium (2 vCPU, 4 GiB memory) instance type.

To create a new JupyterLab space:

When the Workbench is launched, select “JupyterLab” from the “Overview” section, or select “JupyterLab” from “Applications” in the left panel (Figure 2).

Figure 2: Workbench Applications Highlighting JupyterLab

Select “+ Create JupyterLab space” in the upper right corner of the JupyterLab page.
- In the “Create JupyterLab space” dialog, specify a name for the space in the “Name” field. To finish, click “Create space.”
- Note: Because the platform is shared, each workspace must have a unique name. If the workspace name already exists, the following error appears at the bottom of the page (Figure 3).
Figure 3: Error Message for Workspace Name Exists

To launch a JupyterLab space:

From the Workbench Home page, select “JupyterLab” from the Overview section, or select “JupyterLab” from “Applications” in the left panel (Figure 2).
Select “Run” in the Action column of the JupyterLab space to start the workspace (Figure 4). This may take up to a minute to start.

Figure 4: Start Running JupyterLab Space

Once the status changes to “Running”, select the “Open” icon in the Action column to launch JupyterLab in a new tab (Figure 5).

Figure 5: Open JupyterLab Space

To create a new notebook:

From the landing page, select “File,” “New,” and “Notebook” (Figure 6).
- In the “Select Kernel” dialog, select a kernel from the dropdown menu. To finish, click “Select,” which launches the notebook.

Figure 6: Launch Notebook from File Menu

Alternatively, from the Launcher page, click the preferred kernel in the Notebook section (Figure 7).

Figure 7: Launch Notebook Using Launcher

Upload and Download Files

To upload files from a local machine into a JupyterLab space:

In the left sidebar, choose the “File Browser” icon.
In the File Browser, choose the “Upload Files” icon.
Select the files to upload and choose “Open.”
Once the file appears in the home folder, double-click the file to open it in a new tab.

To download a file locally:

In the left sidebar, choose the “File Browser” icon.
Right-click the file and select “Download.”

To download an entire folder locally:

From the menu, choose “File,” “New,” and “Terminal,” which launches a terminal in a new JupyterLab tab.
Type the following command, replacing folder_name with the desired archive name and /path/to/folder with the path of the folder to download:
zip -r -X folder_name.zip /path/to/folder
Once the folder is zipped and the .zip file appears in the File Browser, right-click the .zip file and select “Download.”
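
The zip command above stores each file under the path it was given (minus the leading slash), so the archive unpacks into nested directories such as path/to/folder. A minimal sketch of a terminal helper that zips from the parent directory instead, so the archive contains only the folder itself; the function name zip_folder is hypothetical and not part of the Workbench:

```shell
# Hypothetical helper: zip a folder so the archive holds only "<folder>/...",
# not the absolute path leading up to it.
# Usage: zip_folder /path/to/folder [output.zip]
zip_folder() {
  local src="${1%/}"                      # drop a trailing slash, if present
  local parent name out
  parent="$(dirname "$src")"
  name="$(basename "$src")"
  out="${2:-$name.zip}"                   # default archive name: <folder>.zip
  # Resolve a relative output path against the current directory,
  # because the zip command below runs from the parent directory.
  case "$out" in
    /*) ;;
    *) out="$PWD/$out" ;;
  esac
  # -r: recurse into subdirectories; -X: omit extra file attributes.
  (cd "$parent" && zip -r -X "$out" "$name")
}

# Example: creates folder_name.zip in the current directory.
# zip_folder /path/to/folder_name
```

The resulting .zip file can then be downloaded from the File Browser as described above.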

Clone a Git Repository

Git repositories can be cloned into the JupyterLab home folder using the following steps:

Select the Git icon in the left sidebar.
Choose “Clone a Repository.”
In the Clone Git Repository window, enter the Git URL (for example, https://github.com/aws/amazon-sagemaker-examples.git).
Under “Project directory to clone into,” enter the path to the local directory where the cloned repository should be placed. If this is left blank, Studio clones the repository into the home directory.
Choose “Clone,” which automatically opens a new terminal window and clones the repository. This may take up to a minute, depending on the repository size.
If the repository requires credentials, a prompt appears to enter a username and a GitHub personal access token.
When complete, the File Browser will open, displaying the cloned repository.
Choose the Git icon to view the Git user interface, which tracks the repository.
To track a different repository, open the repository in the file browser and click the Git icon.
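
The same clone can also be run from a JupyterLab terminal with git clone, which is convenient for scripting. A minimal sketch; the helper name clone_repo is hypothetical, and its default destination (a folder in the home directory named after the repository) is an assumption chosen to match the home-directory default described above:

```shell
# Hypothetical helper: clone a repository from a JupyterLab terminal.
# Usage: clone_repo <git-url> [destination]
clone_repo() {
  local url="$1"
  # Default destination: ~/<repository name>, e.g. ~/amazon-sagemaker-examples
  local dest="${2:-$HOME/$(basename "$url" .git)}"
  git clone "$url" "$dest"
}

# Example, using the repository from the steps above:
# clone_repo https://github.com/aws/amazon-sagemaker-examples.git
```

For private repositories over HTTPS, git prompts for a username and a personal access token, as in the Clone dialog. For large repositories, git clone --depth 1 fetches only the latest commit, which can shorten the clone.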

Create a Persistent Conda Environment

Environments can be customized by installing and removing extensions and packages as needed. Extensions and packages installed in the environment persist. To create a persistent conda environment in the JupyterLab application, use the following steps:

Open a JupyterLab space.
From the landing page, select “File,” “New,” and “Terminal”.
Within the terminal, create a new conda environment, replacing myenv with the desired environment name:
conda create -n myenv
Activate the environment:
conda activate myenv
Install any necessary packages for the environment, for example:
conda install numpy pandas
Install ipykernel so the environment can be offered as a kernel option. This step can be skipped if ipykernel is already installed in the environment:
conda install ipykernel
Add the new conda environment to the Jupyter kernel, changing the --display-name option as preferred:
python -m ipykernel install --user --name myenv --display-name "MyEnvironment"
Verify installation of the kernel:
jupyter kernelspec list
When a notebook is launched, the new kernel should appear. If the kernel is not listed, close the tab and reopen the JupyterLab space.
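
The steps above can be combined into a single terminal function. A minimal sketch, assuming conda and jupyter are on the terminal's PATH; the name create_kernel_env is hypothetical. It uses conda run rather than conda activate, because conda activate may fail inside a script that has not initialized conda for the shell:

```shell
# Hypothetical helper bundling the conda environment and kernel steps.
# Usage: create_kernel_env <env_name> "<Display Name>" [packages...]
create_kernel_env() {
  local name="$1" display="$2"
  shift 2                                 # remaining arguments are extra packages
  # Stop early if conda is not available in this terminal.
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH" >&2
    return 1
  fi
  # Create the environment with ipykernel plus any requested packages.
  conda create -y -n "$name" ipykernel "$@" || return 1
  # Register the environment as a Jupyter kernel for the current user;
  # 'conda run' executes the command inside the environment.
  conda run -n "$name" python -m ipykernel install --user \
    --name "$name" --display-name "$display" || return 1
  # Confirm that the kernel is now registered.
  jupyter kernelspec list
}

# Example, matching the steps above:
# create_kernel_env myenv "MyEnvironment" numpy pandas
```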

Access Public Data

To access curated public and synthetic datasets on the RADx Data Hub’s Data Access page, follow the Public Data Tutorial.

Datasets from the AWS Registry of Open Data, an AWS-hosted repository of more than 400 publicly available datasets, can be copied into a JupyterLab environment using the following steps:

Identify a dataset of interest and find the associated Amazon Resource Name (ARN).
- For example: NIH NCBI Sequence Read Archive (SRA)
- ARN: arn:aws:s3:::sra-pub-src-1
- The bucket name is sra-pub-src-1
From the JupyterLab landing page, select “File,” “New,” then “Terminal.”
Enter the following command, which copies the bucket's contents into the current directory (.):
aws s3 sync s3://sra-pub-src-1 .
Replace sra-pub-src-1 with the bucket name of the selected dataset.
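
Because public buckets can be very large, it can help to preview a copy before running it. A minimal sketch that derives the s3:// URI from the ARN and runs aws s3 sync with its --dryrun option, which lists the planned copies without downloading anything. The function names and the optional prefix argument are hypothetical, and --no-sign-request (anonymous access) is an assumption that applies only to buckets that allow anonymous reads:

```shell
# Hypothetical helper: turn an S3 bucket ARN into an s3:// URI.
# arn:aws:s3:::sra-pub-src-1  ->  s3://sra-pub-src-1
arn_to_s3_uri() {
  case "$1" in
    arn:aws:s3:::?*) printf 's3://%s\n' "${1#arn:aws:s3:::}" ;;
    *) echo "not an S3 bucket ARN: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: show what would be copied, without copying.
# Usage: preview_public_sync <bucket-arn> [prefix/] [local-dir]
preview_public_sync() {
  local uri
  uri="$(arn_to_s3_uri "$1")" || return 1
  # --dryrun prints the planned operations only;
  # --no-sign-request reads the bucket anonymously (public buckets only).
  aws s3 sync "$uri/${2:-}" "${3:-.}" --dryrun --no-sign-request
}

# Example (some/prefix/ is a placeholder for a folder within the bucket):
# preview_public_sync arn:aws:s3:::sra-pub-src-1 some/prefix/ ./sra-data
```

When the preview looks right, rerun the same aws s3 sync command without --dryrun. Limiting the copy to a prefix keeps both the listing and the download smaller than syncing a whole bucket.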

Change Environment

Notebooks launch with the minimum available instance type by default. The minimum instance type is appropriate for most tasks; however, a larger instance can be requested by submitting a Support Request. Follow the instructions in the User Support Requests Tutorial and select “Workbench Support” when choosing a Request Type. Please provide as much detail as possible in the request so the support team can determine the most suitable environment. For more detailed information about available instance types and their performance capabilities, see Available Studio Instance Types.
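
Before requesting a larger instance, it can help to confirm what the current space provides. A minimal sketch to run in a JupyterLab terminal; it reads the vCPU count with nproc and total memory from /proc/meminfo (Linux). MemTotal is usually somewhat below the instance's nominal memory, so an ml.t3.medium may report slightly less than 4 GiB:

```shell
# Report the vCPUs and memory visible to this JupyterLab space.
cpus="$(nproc)"
# MemTotal is reported in KiB; convert to GiB with one decimal place.
mem_gib="$(awk '/^MemTotal:/ { printf "%.1f", $2 / 1048576 }' /proc/meminfo)"
echo "vCPUs:  $cpus"
echo "Memory: $mem_gib GiB"
```

Including these values in a Workbench Support request may help the support team judge whether a larger instance is needed.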

File Sync

If an added Workbench file does not appear in the File Browser of JupyterLab, the workspace should be resynced. Close the JupyterLab tab and refresh the My Approved Data page. Then, follow the steps above to relaunch the JupyterLab space. If the files still do not appear, the workspace may need to be manually synced with the following steps:

From the menu, select “File,” “New,” and “Terminal.”
Enter the following command in the terminal:
./s3sync.sh
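
A slightly more defensive version of the manual sync, assuming (as the relative path ./s3sync.sh suggests) that the script sits in the home directory where new terminals open; the helper name run_s3sync is hypothetical:

```shell
# Hypothetical helper: run the workspace sync script only if it is present.
# Usage: run_s3sync [directory]   (defaults to the home directory)
run_s3sync() {
  local dir="${1:-$HOME}"
  if [ -x "$dir/s3sync.sh" ]; then
    # Run from that directory, matching the documented ./s3sync.sh invocation.
    (cd "$dir" && ./s3sync.sh)
  else
    echo "s3sync.sh not found or not executable in $dir" >&2
    return 1
  fi
}
```

If the helper reports that the script is missing, that detail is worth including in a Support Request.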

If the files in a workspace are still missing, please submit a Support Request.
