Installing Semantic Treehouse

Introduction

This documentation page describes how to install and deploy a Semantic Treehouse environment. It might seem daunting, but we hope this document clarifies how to get STH up and running. As this is a work in progress, we will improve this guide in the coming period and are open to suggestions.

Semantic Treehouse is a web application with many functionalities, components and customizations. We use Helm for the installation process. Helm is the package manager for Kubernetes which facilitates deploying applications to a cloud infrastructure.

This Helm chart is an almost-complete configuration package to get a production-ready Semantic Treehouse application up and running, from the application itself to the database migration system to optionally an ingress or load balancer. The following is not included however:

- A MariaDB database cluster. You need to provide the STH app access to a pre-existing database. This guide does show you what the application requires from this cluster.
- Additional services for Semantic Treehouse. These are not yet publicly available, and are not required for a functional STH application. We have the following side services:
  - XML Validator - for validating XML message instances
  - JSON Validator - for validating JSON message instances
  - WebVOWL - for visualizing ontologies
  - A JSON schema preprocessor

info

For this installation procedure, a working knowledge of Kubernetes, Helm and the Linux command line is required.

Preliminaries

Preparing your Kubernetes cluster

For the installation of Semantic Treehouse, we need access to a Kubernetes cluster.

First and foremost, the cluster must be up and running. This can be a cluster running in the cloud through Google Cloud, AWS or Azure, or you can create a local (development/testing) cluster with tools like Minikube, Kind, MicroK8s or Docker Desktop.
- Recommended minimum amount of resources for an STH web application:
  - CPU: 2 vCPUs is more than enough
  - RAM: 256 MB is OK, but 512 MB is better
  - Disk: defaults to 2 GB, but this can be customized
You should be able to access this cluster through the command-line management tool kubectl. You can find information on how to authenticate this tool with your cluster in the documentation of the Kubernetes cluster vendor of your choice.
Finally, you will need the Helm CLI tool. Helm reads the same cluster configuration as kubectl, so it re-uses the authentication you've set up in the previous step. Now is also a good time to add the STH repository to Helm:
```
helm repo add sth https://charts.semantic-treehouse.nl
```
This will give us access to the STH Helm-chart later down the line.

It is good practice to separate different applications running in your Kubernetes cluster through namespaces. We can set the namespace for this Semantic Treehouse installation with the following commands:

kubectl create namespace sth
kubectl config set-context --current --namespace=sth

Preparing your database

The Semantic Treehouse Helm chart expects a MariaDB database to be up and running, and available within the Kubernetes cluster through a service. For Semantic Treehouse to function properly we need some specific MariaDB configuration settings that are non-default, so be sure to set them in your MariaDB server configuration:

[mysqld]
# Added for Semantic Treehouse
lower-case-table-names=1
sql_mode='ANSI,TRADITIONAL,ANSI_QUOTES'
character-set-server=utf8mb4
collation-server=utf8mb4_nopad_bin
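
As a quick sanity check, the following sketch verifies that a MariaDB option file mentions all four settings listed above. The function name `check_mariadb_cnf` is hypothetical, and the location of the option file depends on your MariaDB installation. The check only looks for the presence of each option, not its value.

```shell
# Report which of the four required options are absent from an option file.
# Returns 0 when all are present, 1 otherwise.
check_mariadb_cnf() {
  cnf=$1
  missing=0
  for key in lower_case_table_names sql_mode character_set_server collation_server; do
    # MariaDB accepts dashes and underscores interchangeably in option names,
    # so normalise every option name to underscores before comparing.
    if ! sed 's/[[:space:]]*=.*//; s/-/_/g' "$cnf" | grep -qx "[[:space:]]*$key"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as `check_mariadb_cnf path/to/your.cnf`, where the path is whatever option file your server actually reads. Commented-out settings are not counted as present.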

The STH chart further expects the following to be present inside the MariaDB database:

A schema, which will be used by the Semantic Treehouse application;
A migration user, that (only) has privileges to define data within this schema (i.e. creating, editing and dropping tables);
A "regular" user, that (only) has privileges to manipulate data on all tables within this schema (i.e. adding, updating and deleting data within present tables).

info

Inside MariaDB a schema is also known as a database, but to prevent confusion between this and a database cluster, we use the term schema in this guide.

caution

The STH application assumes MariaDB v10.6. Newer versions might work, but this is not guaranteed and might cause some issues. Don't use older versions.

Grant yourself access to a MariaDB console, with at least the privileges to create schemas and users and to add grants. You can then use the following SQL script to create the schema and both users inside the database.

CREATE SCHEMA mysth
    DEFAULT CHARACTER SET = 'utf8mb4' 
    COLLATE = 'utf8mb4_nopad_bin';

CREATE USER 'sth-migration-user' IDENTIFIED BY 'password1';
GRANT ALL PRIVILEGES ON mysth.* to 'sth-migration-user';

CREATE USER 'sth-user' IDENTIFIED BY 'password2';
GRANT SELECT, INDEX, UPDATE, CREATE VIEW, SHOW VIEW, DELETE, INSERT ON mysth.* TO 'sth-user';

Be sure to replace 'password1' and 'password2' with secure passwords (e.g. random alphanumeric strings). Remember them, as we will need them later. Further notes:

"mysth" is the name of the schema for this specific Semantic Treehouse environment. Changing this allows you to have multiple environments using the same DB cluster whilst keeping their data completely separate.
'sth-user' and 'sth-migration-user' are the names for the regular and migration users in the schema. You can change them if you wish, but remember them for later.
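
One possible way to produce such random passwords is sketched below. The helper name `random_password` is hypothetical. It emits lowercase hexadecimal strings, which are a subset of alphanumeric, and assumes `/dev/urandom` is available on your system.

```shell
# Print a random hexadecimal string built from the given number of random
# bytes (default 24 bytes, i.e. 48 characters).
random_password() {
  bytes=${1:-24}
  od -An -N "$bytes" -tx1 /dev/urandom | tr -d ' \n'
  echo
}

migration_password=$(random_password)
sth_password=$(random_password)
# Print only the lengths, so the passwords do not end up in terminal scrollback.
printf 'generated passwords of %s and %s characters\n' \
  "${#migration_password}" "${#sth_password}"
```

The two variables can then be substituted for 'password1' and 'password2' in the SQL script.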

Preparing required secrets

The database is now properly configured for a Semantic Treehouse environment, but we need a way to give the application access to the database users and their respective passwords. We do that through Kubernetes secrets. The names of the secrets and the usernames stored in them are important, because we'll refer to them in the Helm values file.

 kubectl create secret generic mysth-migration-db-userpass \
    --from-literal username='sth-migration-user' \
    --from-literal password='password1'

 kubectl create secret generic mysth-sth-db-userpass \
    --from-literal username='sth-user' \
    --from-literal password='password2'

Of course, "sth-migration-user", "sth-user", "password1" and "password2" must match the values that you have entered in the database preparation step.

tip

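Note that we've added a space character in front of the two commands above. This is intentional: in Bash it excludes the command from the shell history, provided HISTCONTROL contains ignorespace or ignoreboth (in zsh, the HIST_IGNORE_SPACE option does the same). Useful when dealing with passwords in plain text!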
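
If you would rather keep the password off the command line altogether, one alternative is to render the Secret manifest yourself and read the password from standard input. The sketch below is an illustration, not part of the STH chart: the helper names `b64` and `render_secret` are hypothetical, and it assumes a `base64` tool is installed.

```shell
# Base64-encode a string without a trailing newline or line wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Render a Kubernetes Secret manifest with "username" and "password" keys.
# The password is read from stdin, so it never appears in the command itself.
render_secret() {
  name=$1
  user=$2
  IFS= read -r pass
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $name
type: Opaque
data:
  username: $(b64 "$user")
  password: $(b64 "$pass")
EOF
}
```

For example, `render_secret mysth-sth-db-userpass sth-user < sth-password.txt | kubectl apply -f -` would create the same secret as the second command above, where `sth-password.txt` is an assumed file holding the password on a single line.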

Preparing the Helm deployment

We can configure the STH deployment through a Helm values file: a YAML file that sets the configuration options of a Helm chart. You can get all possible configuration options, and their default values, for the STH chart by running:

helm show values sth/semantic-treehouse

But only a couple of these configuration options are required to create a functioning Semantic Treehouse application. A minimal configuration example looks as follows.

minimal.yaml
sth:
  image:
    tag: v3.3
  initialAdminEmail: [email-address]
  config: |-
    settings:
      oauthlogin.identityProviders:
        google:
          class: SemanticTreehouse\SIAM\IDP\Google
          handler: google
          name: Google
          logoUrl: assets/images/oauth/logo-google.svg
          authBase: https://accounts.google.com/o/oauth2/auth
          clientId: [google-oauth-client-id]
          clientSecret: [google-oauth-client-secret]
          tokenUrl: https://accounts.google.com/o/oauth2/token
          apiUrl: https://www.googleapis.com/userinfo/v2/me
          scopes:
            - https://www.googleapis.com/auth/userinfo.email

database:
    existingService: [name-of-mariadb-service]
    schemaName: mysth
    migrationUser:
      existingSecret: mysth-migration-db-userpass
    sthUser:
      existingSecret: mysth-sth-db-userpass

flyway:
  image:
    tag: v3.3

All lines containing a placeholder in square brackets must be filled in by you. Let's walk through the relevant options step by step.

initialAdminEmail — The deployment has an install script that initializes the database. It will also add an initial admin account linked to the e-mail address that is provided through this config option. If it's not set, it's not possible to log into your new STH app.
oauthlogin.identityProviders — Semantic Treehouse handles user authentication through OAuth, with several third party identity providers. You should provide at least one, and you should be able to login through that party with the e-mail address that you've specified in the initialAdminEmail field.
- The example above shows how to configure your STH app to use the Google OAuth provider (setting up Google OAuth). This is one of the free options (for less than 50 thousand monthly active users). On this page you can find all other OAuth providers that are supported out of the box.
database.existingService — As mentioned before, the MariaDB database should be accessible in the cluster through a service. You should enter the name of that service here, so the STH app can find the database.
Within database, the fields schemaName, sthUser.existingSecret and migrationUser.existingSecret refer to the database schema name and the names of the DB user secrets you've created before for the regular and migration user, respectively. If you've deviated from the default naming, then you should also add the correct names in this config.
[flyway|sth].image.tag — Be sure that flyway and sth always have matching image tags, as they depend on each other. You can leave them out too, defaulting to the application version that is specified in the Helm chart.
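
Before installing, it can help to check the values file for the two mistakes described above: placeholders that were left unfilled, and image tags that do not match. The sketch below does this with plain `grep` and `awk`, so it needs no YAML tooling. All function names are hypothetical, and it assumes the layout of the minimal example (top-level keys in the first column, one `tag:` line per image).

```shell
# List lines that still contain a [placeholder] in square brackets.
unfilled_placeholders() {
  grep -nE '\[[a-z][a-z-]*\]' "$1"
}

# Print "<top-level key> <tag>" for every "tag:" line in the values file.
image_tags() {
  awk '
    /^[A-Za-z]/  { top = $1; sub(/:.*/, "", top) }
    $1 == "tag:" { print top, $2 }
  ' "$1"
}

# Succeed only if no placeholders remain and all image tags are identical.
check_values() {
  if unfilled_placeholders "$1"; then
    echo "error: the placeholders listed above are not filled in" >&2
    return 1
  fi
  distinct=$(image_tags "$1" | awk '{ print $2 }' | sort -u | wc -l)
  if [ "$((distinct))" -gt 1 ]; then
    echo "error: sth and flyway image tags differ" >&2
    return 1
  fi
}
```

Running `check_values minimal.yaml` prints any offending lines and returns a non-zero status if something needs attention. A file without any `tag:` lines passes, since the chart defaults then apply. Note that the placeholder pattern would also flag a genuine single-item YAML list such as `[a]`.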

note

Using this values file (with the placeholders correctly filled in) will create a minimal working Semantic Treehouse application. But:

There is no ingress configured, so you can only access the app through port forwarding;
There are no resource limits in place that would prevent the STH application from claiming more than its fair share of the available resources in the cluster.

The page Advanced configuration goes into detail about advanced configuration options for the STH chart. Amongst other things, it will explain how to properly enable and configure the points above. You can also look at the documentation contained within the default values file (helm show values sth/semantic-treehouse) for additional information.

Installing your Semantic Treehouse environment

Congratulations, you have prepared everything that's necessary to deploy a Semantic Treehouse application. Now comes the easy part!

Installing a fresh environment

With Helm, we can easily install a fresh environment with the following command:

helm install \
  mysth \
  sth/semantic-treehouse \
  -f minimal.yaml \
  --atomic

The release name (mysth) and the values file (-f minimal.yaml) may need to be changed. The release name is the name of the Helm deployment, which will also be used to name the Kubernetes resources. The -f option should point to the values YAML file that you created in the previous step.

The command will wait until the application is ready for use, or will roll back all changes it made if something went wrong. It will also set up the database schema, which includes creating the first admin user with the e-mail address you've provided in the values file. When the command finishes, you can access the STH application through port forwarding:

kubectl port-forward svc/mysth 8080:80

Congratulations, you have set up a working Semantic Treehouse application! You can now log in with your admin account and start using the app. Once you've verified the app is in working condition, you can restart it in production mode:

helm upgrade mysth sth/semantic-treehouse -f minimal.yaml --set sth.productionMode=true --atomic

Production mode removes access to some debug-only API endpoints. More information can be found in the page Advanced configuration.

danger

Make sure that production mode is enabled if you expose your application to the Internet! Leaving production mode off leaves your application vulnerable to outsider access.

Upgrading an existing Helm-environment

After you've made changes to your values file, you need to re-apply it to the deployment. We do that through the helm upgrade command:

helm upgrade mysth sth/semantic-treehouse -f minimal.yaml --atomic

If the upgrade changes the Semantic Treehouse version, there may be pending database changes. These are applied automatically as well. Your application should remain accessible during the upgrade process, allowing for seamless upgrades.

Uninstalling an environment

danger

This will delete the entire Semantic Treehouse application. Note that this command leaves the database intact; you can re-attach an STH app to it if you're deleting the application temporarily, or you have to separately purge the corresponding database schema if you're not planning to re-use this data.

helm delete mysth
