Make binderhub-ui hub#
We can support users who want to build, push, and launch user images from open source GitHub repositories, using a UI similar to mybinder.org.
We call this a binderhub-ui style hub, and the primary features offered are:
(Optional) User authentication
NO Persistent storage
BinderHub style UI
Two separate domains, one for the binderhub UI and one for JupyterHub
Keeping them separate helps produce clean and correct sharing URLs that are not based on the hub/services/:name path.
A logged-out homepage and a logged-in homepage
See 2i2c-org/infrastructure#4168 for context on this decision.
Generate a sample initial hub configuration#
Run the following deployer command directly, follow the steps in the Initial Hub setup guide to get a very basic hub set up, or copy-paste the configuration of a similar hub. Then follow the steps below.
deployer generate hub-asset binderhub-ui-values-file
Double-check the generated config#
The sample config that has been generated by the deployer command needs to be checked to make sure that everything is as expected and nothing is missing.
Follow the checklist below before committing the hub values file to the infrastructure repository.
I. General configuration#
The following configuration applies to both authenticated and unauthenticated binderhubs.
1. Check that some of the inherited configuration is emptied#
Some of the configuration that gets inherited, either from the basehub defaults or from the cluster’s common values file (if one exists), needs to be cleared out because it is not relevant for a binderhub style hub.
Disable jupyterhub.custom.jupyterhubConfigurator
jupyterhub:
  custom:
    jupyterhubConfigurator:
      enabled: false
Disable jupyterhub.custom.singleuserAdmin.extraVolumeMounts (there is no persistent storage, so these don’t make sense)
jupyterhub:
  custom:
    singleuserAdmin:
      extraVolumeMounts: []
On the singleuser server, disable storage, init containers, and profile lists
There will be no persistent storage, so disable it. Because of this, singleuser.storage.extraVolumeMounts and singleuser.initContainers should also be emptied.
Also make sure that the profileList is disabled in case it gets set in the common values file, as keeping it will make binderhub fail to launch a server.
jupyterhub:
  singleuser:
    storage:
      type: none
      extraVolumeMounts: []
    initContainers: []
    profileList: []
2. Check jupyterhub and binderhub domains setup#
Having separate domains for jupyterhub and binderhub helps produce clean and correct sharing URLs that are not based on the hub/services/:name path.
Make sure ingress is set up correctly for both jupyterhub and binderhub, and make sure that the two ingress.tls.secretName values differ. Both secrets live in the same namespace, so giving them the same name will break the setup of one of them.
jupyterhub:
  ingress:
    hosts: [{{ jupyterhub_domain }}]
    tls:
      - hosts: [{{ jupyterhub_domain }}]
        secretName: https-auto-tls
  custom:
    binderhubUI:
      enabled: true
binderhub-service:
  ingress:
    enabled: true
    hosts: [{{ binderhub_domain }}]
    tls:
      - hosts: [{{ binderhub_domain }}]
        secretName: binder-https-auto-tls
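The requirement that the two TLS secret names differ can be checked mechanically before committing. The following is a minimal sketch, assuming the hub values file has already been loaded into a Python dict (for example with a YAML parser). The helper names and the example domains are hypothetical and not part of the deployer.

```python
def tls_secret_names(values):
    """Collect ingress TLS secret names from both chart sections."""
    names = []
    for chart in ("jupyterhub", "binderhub-service"):
        ingress = values.get(chart, {}).get("ingress", {})
        for entry in ingress.get("tls", []):
            names.append(entry["secretName"])
    return names


def has_unique_tls_secrets(values):
    """True when no TLS secret name is used more than once."""
    names = tls_secret_names(values)
    return len(names) == len(set(names))


# Hypothetical values mirroring the structure of the config above
values = {
    "jupyterhub": {
        "ingress": {
            "hosts": ["hub.example.org"],
            "tls": [{"hosts": ["hub.example.org"], "secretName": "https-auto-tls"}],
        }
    },
    "binderhub-service": {
        "ingress": {
            "enabled": True,
            "hosts": ["binder.example.org"],
            "tls": [{"hosts": ["binder.example.org"], "secretName": "binder-https-auto-tls"}],
        }
    },
}

if not has_unique_tls_secrets(values):
    raise SystemExit("jupyterhub and binderhub-service share a TLS secretName")
```

A check like this could run as part of reviewing the generated values file, since a duplicated secret name is easy to miss when copy-pasting from another hub.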
3. Check that binderhubUI is enabled#
Enable jupyterhub.custom.binderhubUI, which in turn enables the hub to use BinderSpawnerMixin, which allows converting JupyterHub container spawners into BinderHub spawners.
jupyterhub:
  custom:
    binderhubUI:
      enabled: true
4. Check that the binderhub-service chart and network policy are enabled#
We will use the binderhub-service Helm chart to run BinderHub, the Python software, as a standalone service next to JupyterHub to build and push images with repo2docker, so we need to enable it.
binderhub-service:
  enabled: true
  networkPolicy:
    enabled: true
5. Check that BinderHub is configured correctly#
We need to configure BinderHub so that:
it’s not running in API-only mode
it knows where the hub is running
binderhub-service:
  config:
    BinderHub:
      base_url: /
      hub_url: https://<jupyterhub-public-url>.2i2c.cloud
      badge_base_url: https://<binderhub-public-url>.2i2c.cloud
      enable_api_only_mode: false
6. Check that the builder docker api and user pods are scheduled on the smallest available instance or on the hub dedicated instance type#
In general, they should run on n2-highmem-4 machines on GCP and on r5.xlarge machines on AWS. But it’s best to double-check the cluster’s terraform or eksctl configuration files to make sure this is the smallest instance type and not another one.
If you are creating a separate nodepool for the binderhub, then you can set the instance type to be the one that is used for the hub of the desired size and type.
binderhub-service:
  dockerApi:
    nodeSelector:
      # Schedule dockerApi pods to run on the smallest user nodes only
      # https://github.com/2i2c-org/infrastructure/issues/4241
      2i2c/hub-name: binder
      node.kubernetes.io/instance-type: n2-highmem-4
  config:
    KubernetesBuildExecutor:
      node_selector:
        # Schedule builder pods to run on the smallest user nodes only
        # https://github.com/2i2c-org/infrastructure/issues/4241
        2i2c/hub-name: binder
        node.kubernetes.io/instance-type: n2-highmem-4
jupyterhub:
  singleuser:
    nodeSelector:
      # Schedule users on the smallest instance
      # https://github.com/2i2c-org/infrastructure/issues/4241
      2i2c/hub-name: binder
      node.kubernetes.io/instance-type: n2-highmem-4
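Because the same selector is repeated in three places, it is easy for one copy to drift. The sketch below compares them, assuming (as in the example above) that all three selectors are meant to be identical and that the values file has been loaded into a Python dict. The function names are hypothetical.

```python
def node_selectors(values):
    """Return the selectors for dockerApi pods, builder pods, and user pods."""
    binder = values["binderhub-service"]
    return [
        binder["dockerApi"]["nodeSelector"],
        binder["config"]["KubernetesBuildExecutor"]["node_selector"],
        values["jupyterhub"]["singleuser"]["nodeSelector"],
    ]


def selectors_agree(values):
    """True when all three selectors are identical (an assumption of this sketch)."""
    first, *rest = node_selectors(values)
    return all(selector == first for selector in rest)


# Example selector copied from the GCP configuration above
selector = {
    "2i2c/hub-name": "binder",
    "node.kubernetes.io/instance-type": "n2-highmem-4",
}
values = {
    "binderhub-service": {
        "dockerApi": {"nodeSelector": dict(selector)},
        "config": {"KubernetesBuildExecutor": {"node_selector": dict(selector)}},
    },
    "jupyterhub": {"singleuser": {"nodeSelector": dict(selector)}},
}

if not selectors_agree(values):
    raise SystemExit("dockerApi, builder, and user pod node selectors differ")
```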
7. Check the binderhub extra env variables#
These are needed by the JupyterHub components that the BinderHub software relies on.
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_API_TOKEN
      valueFrom:
        # Any JupyterHub Services api_tokens are exposed in this k8s Secret
        secretKeyRef:
          name: hub
          key: hub.services.binder.apiToken
    - name: JUPYTERHUB_CLIENT_ID
      value: "service-binder"
    - name: JUPYTERHUB_API_URL
      value: "https://<hub-public-url>.2i2c.cloud/hub/api"
    # Without this, the redirect URL to /hub/api/... gets
    # appended to binderhub's URL instead of the hub's
    - name: JUPYTERHUB_BASE_URL
      value: "https://<hub-public-url>.2i2c.cloud/"
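The two URL variables above are both derived from the hub's public URL, so a typo in one of them is a plausible source of redirect problems. As an illustration only, the sketch below derives both from a single input. The function name and the example URL `https://hub.example.org` are hypothetical.

```python
def hub_env_urls(hub_public_url):
    """Derive the hub base URL and API URL from one public hub URL."""
    # Normalise to exactly one trailing slash, as in the example config
    base = hub_public_url.rstrip("/") + "/"
    return {
        "JUPYTERHUB_BASE_URL": base,
        "JUPYTERHUB_API_URL": base + "hub/api",
    }


urls = hub_env_urls("https://hub.example.org")
for name, value in urls.items():
    print(f"{name}={value}")
```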
8. Set up logging of launch events to 2i2c#
We send logs of launch events to a 2i2c-managed GCP project so that we can produce usage reports in the future.
This requires an explicit opt-in in the deployment’s chart config and credentials for the 2i2c-managed GCP project.
To opt in, configure:
binderhub-service:
  custom:
    sendLogsOfLaunchEventsTo2i2c: true
To set up credentials, we can reuse a single GCP service account’s key that is already encrypted for other BinderHub UI enabled hubs. You can use sops to read it, and then to write it.
# read from existing hub
sops config/clusters/2i2c/enc-binderhub-ui-demo.secret.values.yaml
# copy a section looking like this under binderhub-service
#
# extraCredentials:
#   googleServiceAccountKey: |
#     ...
#     ...
#     ...
#
# write it to new hub by pasting it under binderhub-service
sops config/clusters/<cluster-name>/enc-<hubname>.secret.values.yaml
II. Configuration specific to authenticated hubs#
1. Check that the simpler landing page is used#
If accessing binderhub requires users to log in first, then the login page, i.e. the page where users land to log into the hub before seeing the binderhub UI, must be updated to use a simpler version.
This is done by having the hub track the no-homepage-subsection branch of the default homepage repo.
jupyterhub:
  custom:
    homepage:
      gitRepoBranch: "no-homepage-subsection"
2. Make sure we don’t redirect to singleuser server after login#
After users log in, don’t redirect them to their server, as we want them to go to the binderhub launch page to configure their image before launching it.
jupyterhub:
  hub:
    redirectToServer: false
3. Check the binder hub service#
Set up binder as an externally managed JupyterHub service, making sure that redirection happens correctly after authentication with the OAuth provider and that users are not presented with extra login prompts.
jupyterhub:
  hub:
    services:
      binder:
        oauth_redirect_uri: https://<binderhub-public-url>/oauth_callback
4. Check the roles#
Set up a binder role and a user role, and make sure the correct permissions are assigned both to the new service and to the users so that they can access the service.
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and read (but not modify) any user’s model
      binder:
        services:
          - binder
        scopes:
          - servers
          - read:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
5. Make sure servers are spawned only for authenticated hub users#
Enable authenticated binderhub spawner setup via hub.config.BinderSpawnerMixin.auth_enabled
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: true
6. Check the binderhub extra env variables#
There is one extra env var that needs to be set if the hub is authenticated:
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: "https://{{ binderhub_domain }}/oauth_callback"
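In the examples in this section, this callback URL and the oauth_redirect_uri of the binder service both point at /oauth_callback on the binderhub domain. The sketch below checks that the two agree, assuming the values file has been loaded into a Python dict. Treating equality of the two as required is an assumption of this sketch, and the helper names and example domain are hypothetical.

```python
def oauth_callback_url(binderhub_domain):
    """Build the callback URL in the shape used by the examples above."""
    return f"https://{binderhub_domain}/oauth_callback"


def callback_urls_agree(values):
    """Compare the service's redirect URI with the env var passed to binderhub."""
    redirect_uri = values["jupyterhub"]["hub"]["services"]["binder"]["oauth_redirect_uri"]
    env = {
        item["name"]: item.get("value")
        for item in values["binderhub-service"]["extraEnv"]
    }
    return redirect_uri == env.get("JUPYTERHUB_OAUTH_CALLBACK_URL")


# Hypothetical domain used only for illustration
callback = oauth_callback_url("binder.example.org")
values = {
    "jupyterhub": {"hub": {"services": {"binder": {"oauth_redirect_uri": callback}}}},
    "binderhub-service": {
        "extraEnv": [{"name": "JUPYTERHUB_OAUTH_CALLBACK_URL", "value": callback}]
    },
}

if not callback_urls_agree(values):
    raise SystemExit("oauth_redirect_uri and JUPYTERHUB_OAUTH_CALLBACK_URL differ")
```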
III. Configuration specific to non-authenticated hubs#
1. Check that the NullAuthenticator is used#
This will disable the hub login page and allow binderhub to generate random usernames for user servers.
This also means that any configuration of the homepage (gitRepoBranch or templateVars) will be ignored.
However, you need to disable the templateVars configuration in order to pass the validation step.
jupyterhub:
  custom:
    homepage:
      templateVars:
        enabled: false
  hub:
    config:
      JupyterHub:
        authenticator_class: "null"
2. Check admins are disabled#
No authentication, so no admins:
jupyterhub:
  custom:
    2i2c:
      add_staff_user_ids_to_admin_users: false
3. Check the roles#
Set up a binder role and a user role, and make sure the correct permissions are assigned both to the new service and to the users so that they can access the service.
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and to manage users (admin:users), since without a login step
      # binderhub generates the randomly named users itself
      binder:
        services:
          - binder
        scopes:
          - servers
          - admin:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
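Compared with the roles for authenticated hubs, the only difference here is the second scope of the binder role: read:users there, admin:users here. The sketch below makes that difference explicit by generating both variants from one flag. The function name is hypothetical, and the dict simply mirrors the loadRoles structure shown in this guide.

```python
def binder_load_roles(authenticated):
    """Return a loadRoles mapping for an authenticated or unauthenticated hub."""
    # Scope taken from the two role examples in this guide
    user_scope = "read:users" if authenticated else "admin:users"
    return {
        "binder": {
            "services": ["binder"],
            "scopes": ["servers", user_scope],
        },
        "user": {
            "scopes": ["self", "access:services!service=binder"],
        },
    }


for flag in (True, False):
    print(flag, binder_load_roles(flag)["binder"]["scopes"])
```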
5. Make sure servers aren’t spawned just for authenticated hub users#
Disable authenticated binderhub spawner setup via hub.config.BinderSpawnerMixin.auth_enabled
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: false
6. Check the singleuser cmd that is used#
If the binderhub will be unauthenticated, then we need to replace the default jupyterhub-singleuser command, set through jupyterhub.singleuser.cmd, with jupyter-lab if available, or jupyter-notebook otherwise.
Otherwise the requests to authorize the user server will get redirected to /hub/login, which always returns a 403 HTTP response code when using the null authenticator.
jupyterhub:
  singleuser:
    cmd:
      - python3
      - "-c"
      - |
        import os
        import sys
        try:
            import jupyterlab
            import jupyterlab.labapp
            major = int(jupyterlab.__version__.split(".", 1)[0])
        except Exception as e:
            print(f"Failed to import jupyterlab: {e}", file=sys.stderr)
            have_lab = False
        else:
            have_lab = major >= 3
        if have_lab:
            # technically, we could accept another jupyter-server-based frontend
            print("Launching jupyter-lab", file=sys.stderr)
            exe = "jupyter-lab"
        else:
            print("jupyter-lab not found, launching jupyter-notebook", file=sys.stderr)
            exe = "jupyter-notebook"
        # launch the notebook server
        os.execvp(exe, sys.argv)
7. Restrict the repositories that can be built#
When deploying an unauthenticated binderhub, it’s useful to restrict what repositories can be built to avoid abuse. This can be achieved by setting:
binderhub-service:
  config:
    GitHubRepoProvider:
      allowed_specs:
        - <some regex>
        - <another regex>
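It can help to try candidate patterns locally before deploying them. The sketch below is not BinderHub's own code. It assumes that each pattern is matched case-insensitively against the start of a spec of the form org/repo/ref, and that an empty list allows everything, which should be verified against the BinderHub version in use. The patterns and repository names are hypothetical examples.

```python
import re


def is_allowed(spec, allowed_specs):
    """Return True if spec matches at least one allowed pattern."""
    if not allowed_specs:
        # Assumption: no patterns configured means no restriction
        return True
    return any(re.match(pattern, spec, re.IGNORECASE) for pattern in allowed_specs)


# Hypothetical patterns: one whole organisation, plus one specific repository
allowed_specs = [
    r"^2i2c-org/.*",
    r"^binder-examples/requirements/.*",
]

for spec in (
    "2i2c-org/infrastructure/main",
    "binder-examples/requirements/HEAD",
    "someone-else/repo/main",
):
    print(spec, is_allowed(spec, allowed_specs))
```

Anchoring each pattern with `^` and ending an organisation prefix with `/` avoids accidentally allowing lookalike names such as `2i2c-org-fake/...`.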
8. Enabling CORS (optional)#
You can give another resource, such as a MyST website with live computation, access to the binder service.
This requires enabling Cross-Origin Resource Sharing (CORS) on both JupyterHub and BinderHub. You can restrict access to certain domains or allow access from any domain with the '*' wildcard.
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        cors_allow_origin: '*'
binderhub-service:
  config:
    BinderHub:
      cors_allow_origin: '*'
Manually handle registry creation and login#
Configuration for the image registry is not yet generated by the deployer, so the steps below need to be followed to set it up.
Following the steps below will require adding additional configuration to the sample file generated by the deployer.
Important
For clusters running on AWS, you can use the encrypted file located at config/clusters/template/aws/enc-binder.secret.values.yaml and follow the manual steps below to double-check that everything is there and correctly set up for your use case (it should be).
1. Setup the image registry#
Follow the guide at Setup the image registry of the imagebuilding hub.
2. Further configure the binderhub-service chart#
Some more configuration of the binderhub-service chart is required by following the guide at Setup the binderhub-service chart.
Specifically, we need to:
Configure binderhub-service.config.BinderHub.image_prefix so that BinderHub knows under which prefix to push the images to the registry
Set up the credentials used by the build pods to push images to the registry under binderhub-service.buildPodsRegistryCredentials
Set up imagePullSecret for pulling images from the registry if using quay.io
3. Setup the credentials needed to check for and pull existing images in the registry by the BinderHub software#
Without these credentials, images will be rebuilt unnecessarily since the BinderHub software does not have the appropriate permissions to check if an image exists in the registry.
We must pass information and credentials through DockerRegistry so that the BinderHub software can read from the registry.
You should have the username and password for the registry from a previous step, and the password should be stored in the enc-<hub>.secret.values.yaml file.
binderhub-service:
  config:
    DockerRegistry:
      # registry url address like https://quay.io or https://us-central1-docker.pkg.dev
      url: <url-address>
      username: <username>
      password: <password>
4. Sops-encrypt any credentials added to the enc-<hub>.secret.values.yaml file#
This ensures they are not leaked.
See our sops documentation for more info.
Tip
In setting up this config, we have repeated the username and password for the registry in a few places. You can use YAML anchors to avoid this repetition like the example config below. Anchors work for individual values as well as maps, and are preserved when sops-encrypted too!
jupyterhub:
  imagePullSecret:
    password: &password <password>
binderhub-service:
  buildPodsRegistryCredentials:
    password: *password
  config:
    DockerRegistry:
      password: *password