Make binderhub-ui hub

We can support users who want to build, push and launch user images from open source GitHub repositories, through a UI similar to mybinder.org.

We call this a binderhub-ui style hub. Its primary features are:

(Optional) User authentication
No persistent storage
BinderHub style UI
Two separate domains, one for the BinderHub UI and one for JupyterHub. Keeping them separate gives clean and correct sharing URLs that are not based on the hub/services/:name path.
A logged-out homepage and a logged-in homepage

See 2i2c-org/infrastructure#4168 for context on this decision.

Generate a sample initial hub configuration

To get started, either run the following deployer command directly, follow the steps in the Initial Hub setup guide to create a very basic hub setup, or copy-paste the configuration of a similar hub. Then follow the steps below.

```
deployer generate hub-asset binderhub-ui-values-file
```

Double-check the generated config

Check the sample config generated by the deployer command to make sure everything is as expected and nothing is missing.

Follow the checklist below before committing the hub values file to the infrastructure repository.

I. General configuration

The following configuration applies to both authenticated and unauthenticated BinderHubs.

1. Check that some of the inherited configuration is emptied

Some of the configuration inherited from the basehub defaults, or from the cluster’s common values file (if one exists), is not relevant for a binderhub style hub and needs to be cleared out.

Disable jupyterhub-home-nfs (there is no persistent storage):
```
jupyterhub-home-nfs:
  enabled: false
```
Disable jupyterhub.custom.singleuserAdmin.extraVolumeMounts (there is no persistent storage, so these mounts don’t make sense):
```
jupyterhub:
  custom:
    singleuserAdmin:
      extraVolumeMounts:
```
On the singleuser server, disable storage, init containers and profile lists.

There will be no persistent storage, so disable it. Because of this, singleuser.storage.extraVolumeMounts, singleuser.storage.extraVolumes and singleuser.initContainers should also be emptied.

Also make sure profileList is emptied in case it is set in the common values file, because keeping it will make BinderHub fail to launch a server.
```
jupyterhub:
  singleuser:
    storage:
      type: none
      extraVolumeMounts:
      extraVolumes:
    initContainers: []
    profileList: []
```
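Because these keys are inherited, leaving one out of the hub values file does not empty it. It has to be present with an empty value. The sketch below shows one way to check this before committing. It assumes the hub values file has already been parsed into a nested dict (for example with a YAML loader, where a bare `extraVolumeMounts:` becomes `None`). `REQUIRED_OVERRIDES` and `missing_overrides` are hypothetical names, not part of the deployer.

```python
from typing import Any

# Hypothetical list of keys a binderhub-ui hub values file must override.
# expected=None means "present but empty" (null or []), so the key replaces
# whatever basehub or the cluster's common values file would otherwise provide.
REQUIRED_OVERRIDES = [
    (("jupyterhub-home-nfs", "enabled"), False),
    (("jupyterhub", "custom", "singleuserAdmin", "extraVolumeMounts"), None),
    (("jupyterhub", "singleuser", "storage", "type"), "none"),
    (("jupyterhub", "singleuser", "storage", "extraVolumeMounts"), None),
    (("jupyterhub", "singleuser", "storage", "extraVolumes"), None),
    (("jupyterhub", "singleuser", "initContainers"), None),
    (("jupyterhub", "singleuser", "profileList"), None),
]

_MISSING = object()


def lookup(values: dict, path: tuple) -> Any:
    """Walk nested dicts, returning _MISSING if any key on the path is absent."""
    node: Any = values
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def missing_overrides(values: dict) -> list:
    """Return the dotted paths that are not overridden as required."""
    problems = []
    for path, expected in REQUIRED_OVERRIDES:
        actual = lookup(values, path)
        if expected is None:
            # Must be explicitly present (to override) and empty
            ok = actual is not _MISSING and not actual
        else:
            ok = actual is not _MISSING and actual == expected
        if not ok:
            problems.append(".".join(path))
    return problems


hub_values = {
    "jupyterhub-home-nfs": {"enabled": False},
    "jupyterhub": {
        "custom": {"singleuserAdmin": {"extraVolumeMounts": None}},
        "singleuser": {
            "storage": {"type": "none", "extraVolumeMounts": None, "extraVolumes": None},
            "initContainers": [],
            "profileList": [],
        },
    },
}
print(missing_overrides(hub_values))  # []
```

Checking for explicit presence, rather than just "falsy or absent", is what catches a forgotten key that would silently keep the inherited value.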

2. Check jupyterhub and binderhub domains setup

Having separate domains for JupyterHub and BinderHub gives clean and correct sharing URLs that are not based on the hub/services/:name path.

Make sure ingress is set up correctly for both JupyterHub and BinderHub, and that their ingress.tls.secretName values differ. Both live in the same namespace, so if they share a secret name, the TLS setup of one of them will fail.

```
jupyterhub:
  ingress:
    hosts: [{{ jupyterhub_domain }}]
    tls:
      - hosts: [{{ jupyterhub_domain }}]
        secretName: https-auto-tls
  custom:
    binderhubUI:
      enabled: true

binderhub-service:
  ingress:
    enabled: true
    hosts: [{{ binderhub_domain }}]
    tls:
      - hosts: [{{ binderhub_domain }}]
        secretName: binder-https-auto-tls
```
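The two rules above are separate domains and distinct TLS secret names in the shared namespace. The sketch below checks both against the rendered values, which are assumed to have already been loaded into a dict. The helper `ingress_problems` and the `example.org` domains are hypothetical stand-ins for `{{ jupyterhub_domain }}` and `{{ binderhub_domain }}`.

```python
def ingress_problems(values: dict) -> list:
    """Return problems with the separate JupyterHub / BinderHub ingress setup."""
    hub = values["jupyterhub"]["ingress"]
    binder = values["binderhub-service"]["ingress"]
    problems = []
    if binder.get("enabled") is not True:
        problems.append("binderhub-service.ingress.enabled must be true")
    # The two charts must not serve the same domain
    if set(hub["hosts"]) & set(binder["hosts"]):
        problems.append("JupyterHub and BinderHub must use different domains")
    # Both TLS secrets live in one namespace, so their names must not collide
    hub_secrets = {entry["secretName"] for entry in hub["tls"]}
    binder_secrets = {entry["secretName"] for entry in binder["tls"]}
    if hub_secrets & binder_secrets:
        problems.append("ingress.tls.secretName must differ between the two charts")
    return problems


values = {
    "jupyterhub": {
        "ingress": {
            "hosts": ["hub.example.org"],
            "tls": [{"hosts": ["hub.example.org"], "secretName": "https-auto-tls"}],
        }
    },
    "binderhub-service": {
        "ingress": {
            "enabled": True,
            "hosts": ["binder.example.org"],
            "tls": [{"hosts": ["binder.example.org"], "secretName": "binder-https-auto-tls"}],
        }
    },
}
print(ingress_problems(values))  # []
```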

3. Check that binderhubUI is enabled

Enable jupyterhub.custom.binderhubUI. This in turn makes the hub use BinderSpawnerMixin, which allows converting JupyterHub container spawners into BinderHub spawners.

```
jupyterhub:
  custom:
    binderhubUI:
      enabled: true
```

4. Check that the binderhub-service chart and network policy are enabled

We use the binderhub-service Helm chart to run BinderHub, the Python software, as a standalone service next to JupyterHub that builds and pushes images with repo2docker. The chart therefore needs to be enabled, together with its network policy.

```
binderhub-service:
  enabled: true
  networkPolicy:
    enabled: true
```

5. Check that BinderHub is configured correctly

We need to configure BinderHub so that:

it’s not running in API-only mode
it knows where the hub is running

```
binderhub-service:
  config:
    BinderHub:
      base_url: /
      hub_url: https://<jupyterhub-public-url>.2i2c.cloud
      badge_base_url: https://<binderhub-public-url>.2i2c.cloud
      enable_api_only_mode: false
```
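A common slip is a `hub_url` or `badge_base_url` that does not match the ingress hosts from step 2. The sketch below cross-checks them using only the standard library's `urllib.parse.urlparse`. It also requires `enable_api_only_mode` to be set explicitly to `false`, rather than relying on whatever default the chart ships. `binderhub_url_problems` and the `example.org` hosts are hypothetical.

```python
from urllib.parse import urlparse


def binderhub_url_problems(values: dict) -> list:
    """Cross-check BinderHub's URLs against the two ingress host lists."""
    cfg = values["binderhub-service"]["config"]["BinderHub"]
    hub_hosts = values["jupyterhub"]["ingress"]["hosts"]
    binder_hosts = values["binderhub-service"]["ingress"]["hosts"]
    problems = []
    # The UI only works when API-only mode is explicitly turned off
    if cfg.get("enable_api_only_mode") is not False:
        problems.append("enable_api_only_mode must be explicitly false")
    if cfg.get("base_url") != "/":
        problems.append("base_url should be /")
    if urlparse(cfg["hub_url"]).hostname not in hub_hosts:
        problems.append("hub_url does not point at a JupyterHub ingress host")
    if urlparse(cfg["badge_base_url"]).hostname not in binder_hosts:
        problems.append("badge_base_url does not point at a BinderHub ingress host")
    return problems


values = {
    "jupyterhub": {"ingress": {"hosts": ["hub.example.org"]}},
    "binderhub-service": {
        "ingress": {"hosts": ["binder.example.org"]},
        "config": {
            "BinderHub": {
                "base_url": "/",
                "hub_url": "https://hub.example.org",
                "badge_base_url": "https://binder.example.org",
                "enable_api_only_mode": False,
            }
        },
    },
}
print(binderhub_url_problems(values))  # []
```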

6. Check that the builder, Docker API and user pods are scheduled on the smallest available instance or on the hub's dedicated instance type

In general, on GCP they should run on n2-highmem-4 machines and on AWS on r5.xlarge machines. Still, double-check the cluster’s terraform or eksctl configuration files to confirm that this really is the smallest instance type.

If you are creating a separate nodepool for the BinderHub, set the instance type to the one used for the hub of the desired size and type.

```
binderhub-service:
  dockerApi:
    nodeSelector:
      # Schedule dockerApi pods to run on the smallest user nodes only
      # https://github.com/2i2c-org/infrastructure/issues/4241
      node.kubernetes.io/instance-type: n2-highmem-4
  config:
    KubernetesBuildExecutor:
      node_selector:
        # Schedule builder pods to run on the smallest user nodes only
        # https://github.com/2i2c-org/infrastructure/issues/4241
        node.kubernetes.io/instance-type: n2-highmem-4
jupyterhub:
  singleuser:
    nodeSelector:
      # Schedule users on the smallest instance
      # https://github.com/2i2c-org/infrastructure/issues/4241
      node.kubernetes.io/instance-type: n2-highmem-4
```
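The instance type is set in three separate places, and they are easy to let drift apart. This sketch, which assumes the values have already been loaded into a dict, lists which workloads are not pinned to the expected instance type. `instance_types` and `mismatched` are hypothetical helpers. The `r5.xlarge` value below is deliberately wrong, to show a mismatch.

```python
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"


def instance_types(values: dict) -> dict:
    """Map each workload to the instance type its node selector pins it to."""
    binder = values["binderhub-service"]
    builder = binder["config"]["KubernetesBuildExecutor"]["node_selector"]
    return {
        "dockerApi": binder["dockerApi"]["nodeSelector"].get(INSTANCE_TYPE_LABEL),
        "builder pods": builder.get(INSTANCE_TYPE_LABEL),
        "user pods": values["jupyterhub"]["singleuser"]["nodeSelector"].get(INSTANCE_TYPE_LABEL),
    }


def mismatched(values: dict, expected: str) -> list:
    """Return the workloads not scheduled on the expected instance type."""
    return [name for name, itype in instance_types(values).items() if itype != expected]


values = {
    "binderhub-service": {
        "dockerApi": {"nodeSelector": {INSTANCE_TYPE_LABEL: "n2-highmem-4"}},
        "config": {"KubernetesBuildExecutor": {"node_selector": {INSTANCE_TYPE_LABEL: "n2-highmem-4"}}},
    },
    "jupyterhub": {"singleuser": {"nodeSelector": {INSTANCE_TYPE_LABEL: "r5.xlarge"}}},
}
print(mismatched(values, "n2-highmem-4"))  # ['user pods']
```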

7. Check the binderhub extra env variables

These environment variables are needed by the JupyterHub components that the BinderHub software uses.

```
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_API_TOKEN
      valueFrom:
        # Any JupyterHub Services api_tokens are exposed in this k8s Secret
        secretKeyRef:
          name: hub
          key: hub.services.binder.apiToken
    - name: JUPYTERHUB_CLIENT_ID
      value: "service-binder"
    - name: JUPYTERHUB_API_URL
      value: "https://<jupyterhub-public-url>.2i2c.cloud/hub/api"
    # Without this, the redirect URL to /hub/api/... gets
    # appended to binderhub's URL instead of the hub's
    - name: JUPYTERHUB_BASE_URL
      value: "https://<jupyterhub-public-url>.2i2c.cloud/"
```
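Both URL variables are derived from the same public hub URL: the API URL adds `/hub/api`, and the base URL is the hub root with a trailing slash. The sketch below builds the list from that one input so the two cannot disagree. `hub_env` is a hypothetical helper, and `hub.example.org` stands in for the real hub domain.

```python
def hub_env(hub_url: str) -> list:
    """Build the binderhub-service extraEnv entries for a public hub URL."""
    base = hub_url.rstrip("/")
    return [
        {
            "name": "JUPYTERHUB_API_TOKEN",
            # Service API tokens are exposed in the "hub" k8s Secret
            "valueFrom": {"secretKeyRef": {"name": "hub", "key": "hub.services.binder.apiToken"}},
        },
        {"name": "JUPYTERHUB_CLIENT_ID", "value": "service-binder"},
        {"name": "JUPYTERHUB_API_URL", "value": f"{base}/hub/api"},
        # Must be the hub's URL, not BinderHub's, or /hub/api redirects break
        {"name": "JUPYTERHUB_BASE_URL", "value": f"{base}/"},
    ]


env = {e["name"]: e.get("value") for e in hub_env("https://hub.example.org/")}
print(env["JUPYTERHUB_API_URL"])   # https://hub.example.org/hub/api
print(env["JUPYTERHUB_BASE_URL"])  # https://hub.example.org/
```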

8. Set up logging of launch events to 2i2c

We send logs of launch events to a 2i2c-managed GCP project so that we can produce usage reports in the future.

This requires an explicit opt-in in the deployment's chart config, plus credentials for the 2i2c-managed GCP project.

To opt in, configure:

```
binderhub-service:
  custom:
    sendLogsOfLaunchEventsTo2i2c: true
```

To set up credentials, reuse the GCP service account key that is already encrypted for other BinderHub UI enabled hubs. Use sops to read it from an existing hub, and then to write it into the new hub's secret values file.

```
# read from existing hub
sops config/clusters/2i2c/enc-binderhub-ui-demo.secret.values.yaml

# copy a section looking like this under binderhub-service
#
#   extraCredentials:
#       googleServiceAccountKey: |
#         ...
#         ...
#         ...
#

# write it to new hub by pasting it under binderhub-service
sops config/clusters/<cluster-name>/enc-<hubname>.secret.values.yaml
```

II. Configuration specific to authenticated hubs

1. Check that the simpler landing page is used

If users must log in before accessing BinderHub, the login page (the page where users land to log into the hub before seeing the BinderHub UI) must be updated to use a simpler version.

To do this, have the hub track the no-homepage-subsection branch of the default homepage repo:

```
jupyterhub:
  custom:
    homepage:
      gitRepoBranch: "no-homepage-subsection"
```

2. Check the binder hub service

Set up binder as an externally managed JupyterHub service. Make sure that users are redirected correctly after authenticating with the OAuth provider and are not shown extra login prompts.

```
jupyterhub:
  hub:
    services:
      binder:
        oauth_redirect_uri: https://<binderhub-public-url>/oauth_callback
```

3. Check the roles

Set up a binder role and a user role. Make sure the correct permissions are assigned both to this new service and to users, so that users can access the service.

```
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and read (but not modify) any user’s model
      binder:
        services:
          - binder
        scopes:
          - servers
          - read:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
```
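The `access:services!service=binder` entry uses JupyterHub's scope filter syntax, `<scope>!<filter>=<value>`. Here it narrows `access:services` to the single service named `binder`. The parser below exists only to make that structure explicit. It is a hypothetical illustration, not JupyterHub's own implementation. It accepts the filter kinds `user`, `server`, `group` and `service`.

```python
# Filter kinds accepted after "!" in a JupyterHub scope
KNOWN_FILTERS = {"user", "server", "group", "service"}


def parse_scope(scope: str) -> tuple:
    """Split 'base!filter=value' into its parts; unfiltered scopes give (base, None, None)."""
    base, sep, rest = scope.partition("!")
    if not sep:
        return base, None, None
    key, eq, value = rest.partition("=")
    if not eq or key not in KNOWN_FILTERS or not value:
        raise ValueError(f"malformed scope filter: {scope!r}")
    return base, key, value


for scope in ["self", "access:services!service=binder"]:
    print(parse_scope(scope))
# ('self', None, None)
# ('access:services', 'service', 'binder')
```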

4. Make sure servers are spawned only for authenticated hub users

Enable the authenticated BinderHub spawner setup via hub.config.BinderSpawnerMixin.auth_enabled:

```
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: true
```

5. Check the binderhub extra env variables

If the hub is authenticated, one more env var needs to be added to the same binderhub-service.extraEnv list:

```
binderhub-service:
  extraEnv:
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: "https://{{ binderhub_domain }}/oauth_callback"
```
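The callback URL that BinderHub advertises and the `oauth_redirect_uri` registered for the binder service must be the same URL. If they differ, the OAuth redirect after login fails. The sketch below compares the two in the parsed values. `callback_url_problems` and the `example.org` domains are hypothetical.

```python
def callback_url_problems(values: dict) -> list:
    """Compare BinderHub's OAuth callback env var with the hub's service config."""
    redirect = values["jupyterhub"]["hub"]["services"]["binder"]["oauth_redirect_uri"]
    env = {e["name"]: e.get("value") for e in values["binderhub-service"]["extraEnv"]}
    problems = []
    if env.get("JUPYTERHUB_OAUTH_CALLBACK_URL") != redirect:
        problems.append("JUPYTERHUB_OAUTH_CALLBACK_URL differs from oauth_redirect_uri")
    if not redirect.endswith("/oauth_callback"):
        problems.append("oauth_redirect_uri should end with /oauth_callback")
    return problems


values = {
    "jupyterhub": {"hub": {"services": {"binder": {
        "oauth_redirect_uri": "https://binder.example.org/oauth_callback",
    }}}},
    "binderhub-service": {"extraEnv": [
        {"name": "JUPYTERHUB_CLIENT_ID", "value": "service-binder"},
        {"name": "JUPYTERHUB_OAUTH_CALLBACK_URL", "value": "https://binder.example.org/oauth_callback"},
    ]},
}
print(callback_url_problems(values))  # []
```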

III. Configuration specific to non-authenticated hubs

1. Check that the NullAuthenticator is used

This will disable the hub login page and allow binderhub to generate random usernames for user servers.

This also means that any configuration of the homepage (gitRepoBranch or templateVars) will just be ignored. However, you need to disable templateVars configuration in order to pass the validation step.

```
jupyterhub:
  custom:
    homepage:
      templateVars:
        enabled: false
  hub:
    config:
      JupyterHub:
        authenticator_class: "null"
```

2. Check admins are disabled

No authentication, so no admins:

```
jupyterhub:
  custom:
    2i2c:
      add_staff_user_ids_to_admin_users: false
```

3. Check the roles

Set up a binder role and a user role. Make sure the correct permissions are assigned both to this new service and to users, so that users can access the service.

```
jupyterhub:
  hub:
    loadRoles:
      # The role binder allows service binder to start and stop servers
      # and to create and manage users (admin:users), since without
      # authentication BinderHub creates the randomly named users itself
      binder:
        services:
          - binder
        scopes:
          - servers
          - admin:users
      # The role user allows access to the user’s own resources and to access
      # only the binder service
      user:
        scopes:
          - self
          # Admin users will by default have access:services, so this is only
          # observed to be required for non-admin users.
          - access:services!service=binder
```

4. Make sure servers aren’t spawned only for authenticated hub users

Disable the authenticated BinderHub spawner setup via hub.config.BinderSpawnerMixin.auth_enabled:

```
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        auth_enabled: false
```
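The authenticator and `BinderSpawnerMixin.auth_enabled` must agree. The `"null"` authenticator goes with `auth_enabled: false`, and a real authenticator goes with `auth_enabled: true`. The sketch below flags a mismatch in the parsed values. `auth_settings_consistent` is a hypothetical helper. It assumes that if `authenticator_class` is absent from the hub file, a real authenticator is inherited from elsewhere.

```python
def auth_settings_consistent(values: dict) -> bool:
    """True if BinderSpawnerMixin.auth_enabled matches the authenticator in use."""
    hub_config = values["jupyterhub"]["hub"]["config"]
    authenticator = hub_config.get("JupyterHub", {}).get("authenticator_class")
    auth_enabled = hub_config["BinderSpawnerMixin"]["auth_enabled"]
    # "null" pairs with auth_enabled: false; any other authenticator
    # (including one inherited from another values file) pairs with true
    return auth_enabled == (authenticator != "null")


unauthenticated = {
    "jupyterhub": {"hub": {"config": {
        "JupyterHub": {"authenticator_class": "null"},
        "BinderSpawnerMixin": {"auth_enabled": False},
    }}}
}
print(auth_settings_consistent(unauthenticated))  # True
```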

5. Check the singleuser cmd that is used

If the BinderHub will be unauthenticated, override jupyterhub.singleuser.cmd so that it launches jupyter-lab (if available) or jupyter-notebook instead of jupyterhub-singleuser.

Otherwise, requests to authorize the user server get redirected to /hub/login, which always returns a 403 HTTP response code when using the null authenticator.

```
jupyterhub:
  singleuser:
    cmd:
      - python3
      - "-c"
      - |
        import os
        import sys

        try:
            import jupyterlab
            import jupyterlab.labapp
            major = int(jupyterlab.__version__.split(".", 1)[0])
        except Exception as e:
            print(f"Failed to import jupyterlab: {e}", file=sys.stderr)
            have_lab = False
        else:
            have_lab = major >= 3

        if have_lab:
            # technically, we could accept another jupyter-server-based frontend
            print("Launching jupyter-lab", file=sys.stderr)
            exe = "jupyter-lab"
        else:
            print("jupyter-lab not found, launching jupyter-notebook", file=sys.stderr)
            exe = "jupyter-notebook"

        # launch the notebook server
        os.execvp(exe, sys.argv)
```
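The decision in that script depends only on the installed JupyterLab version. Pulling it out into a pure function makes the rule easy to test: JupyterLab 3 or later launches `jupyter-lab`, and anything else, including a failed import or an unparsable version, falls back to `jupyter-notebook`. `choose_frontend` is a hypothetical refactoring for illustration. The deployed cmd above does not call it.

```python
from typing import Optional


def choose_frontend(jupyterlab_version: Optional[str]) -> str:
    """Return the executable the singleuser cmd would exec.

    jupyterlab_version is jupyterlab.__version__, or None if importing
    jupyterlab failed.
    """
    if jupyterlab_version is None:
        return "jupyter-notebook"
    try:
        major = int(jupyterlab_version.split(".", 1)[0])
    except ValueError:
        # The original script treats any failure here as "no usable JupyterLab"
        return "jupyter-notebook"
    return "jupyter-lab" if major >= 3 else "jupyter-notebook"


for version in ["4.2.5", "3.0.0", "2.3.2", None]:
    print(version, "->", choose_frontend(version))
```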

6. Restrict the repositories that can be built

When deploying an unauthenticated BinderHub, it’s useful to restrict which repositories can be built, to avoid abuse. This can be achieved by setting:

```
binderhub-service:
  config:
    GitHubRepoProvider:
      allowed_specs:
        - <some regex>
        - <another regex>
```
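Before deploying, it helps to test candidate patterns against specs in the `org/repo/ref` form used for GitHub repositories. The sketch below assumes that BinderHub's repo provider applies each pattern with `re.match` (anchored at the start only) and `re.IGNORECASE`, and that an empty list means no restriction. Check this against the BinderHub version you deploy. `is_spec_allowed` and the `2i2c-org/binder-demo` repository are hypothetical.

```python
import re


def is_spec_allowed(spec: str, allowed_specs: list) -> bool:
    """Approximate an allowed_specs check for a spec like org/repo/ref."""
    if not allowed_specs:
        # An empty list places no restriction
        return True
    # re.match anchors only at the start of the spec, so end a pattern with
    # "/" (or "$") when a repository name must match exactly
    return any(re.match(pattern, spec, re.IGNORECASE) for pattern in allowed_specs)


allowed = [r"^2i2c-org/binder-demo/"]
print(is_spec_allowed("2i2c-org/binder-demo/HEAD", allowed))       # True
print(is_spec_allowed("2i2c-org/binder-demo-fork/HEAD", allowed))  # False
# Without the trailing "/", a look-alike repository slips through
print(is_spec_allowed("2i2c-org/binder-demo-fork/HEAD", [r"^2i2c-org/binder-demo"]))  # True
```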

7. Enable CORS (optional)

CORS lets another resource access the binder service, for example a MyST website that uses live computation.

This requires enabling Cross-Origin Resource Sharing (CORS) on both JupyterHub and BinderHub. You can restrict access to specific domains, or allow access from any domain with the '*' wildcard.

```
jupyterhub:
  hub:
    config:
      BinderSpawnerMixin:
        cors_allow_origin: '*'

binderhub-service:
  config:
    BinderHub:
      cors_allow_origin: '*'
```

Manually handle registry creation and login

Image registry configuration is not yet generated by the deployer, so follow the steps below to set it up.

These steps require adding more configuration to the sample file generated by the deployer.

Important

For clusters running on AWS, you can use the encrypted file located at config/clusters/template/aws/enc-binder.secret.values.yaml. Still follow the manual steps below to double-check that everything is there and correctly set up for your use case (it should be).

1. Set up the image registry

Follow the guide at Setup the image registry of the imagebuilding hub.

2. Further configure the `binderhub-service` chart

The binderhub-service chart needs some more configuration; follow the guide at Setup the binderhub-service chart. Specifically, we need to:

Configure `binderhub-service.config.BinderHub.image_prefix` so that BinderHub knows which prefix to push images under in the registry
Set up the credentials that the build pods use to push images to the registry, under binderhub-service.buildPodsRegistryCredentials
Set up imagePullSecret for pulling images from the registry if using quay.io

3. Set up the credentials the BinderHub software needs to check for and pull existing images in the registry

Without these credentials, images will be rebuilt unnecessarily, because the BinderHub software lacks the permissions to check whether an image already exists in the registry.

Pass the registry information and credentials through DockerRegistry so that the BinderHub software can read from the registry. You should have the registry username and password from a previous step. Store the password in the enc-<hub>.secret.values.yaml file.

```
binderhub-service:
  config:
    DockerRegistry:
      # registry url address like https://quay.io or https://us-central1-docker.pkg.dev
      url: <url-address>
      username: <username>
      password: <password>
```
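To make "check whether an image exists" concrete, the sketch below builds, but does not send, a Docker Registry HTTP API v2 request of the form `HEAD /v2/<name>/manifests/<reference>`. A registry answers this with 200 when the tag exists. This illustrates the kind of lookup the DockerRegistry credentials enable. It is not BinderHub's actual code. It also simplifies authentication: registries such as quay.io and Artifact Registry usually expect a bearer-token exchange rather than plain Basic auth. `manifest_head_request`, the image name and the credentials are hypothetical.

```python
import base64
import urllib.request


def manifest_head_request(registry_url: str, image: str, tag: str,
                          username: str, password: str) -> urllib.request.Request:
    """Build a Registry API v2 request asking whether image:tag exists (not sent)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{registry_url.rstrip('/')}/v2/{image}/manifests/{tag}",
        method="HEAD",
        headers={
            # Simplified: many registries require a bearer token instead
            "Authorization": f"Basic {token}",
            "Accept": "application/vnd.docker.distribution.manifest.v2+json",
        },
    )


req = manifest_head_request("https://quay.io/", "example-org/binder-image", "abc123", "user", "pass")
print(req.get_method(), req.full_url)
# HEAD https://quay.io/v2/example-org/binder-image/manifests/abc123
```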

4. Sops-encrypt any credentials added to the `enc-<hub>.secret.values.yaml` file

This ensures they are not leaked. See our sops documentation for more info.

Tip

In setting up this config, we have repeated the username and password for the registry in a few places. You can use YAML anchors to avoid this repetition like the example config below. Anchors work for individual values as well as maps, and are preserved when sops-encrypted too!

```
jupyterhub:
  imagePullSecret:
    password: &password <password>
binderhub-service:
  buildPodsRegistryCredentials:
    password: *password
  config:
    DockerRegistry:
      password: *password
```