Dev workflows

Account and workspace creation

  1. Go to
  2. Go through the Email sign up flow (not Google/GitHub/OAuth)
  3. Go to MailSlurper, hosted at http://localhost:4436, to verify your email
  4. Create a workspace - this automatically creates:
    1. A Workspace in the Orchestration Service
    2. A Database in the Database Service
    3. A Repo in the Logic Service
    4. A History Database in the History Service

Loading test data

To load TPC-H tables into a Dataland database:

just db-ls

# replace {{database_uuid}} with a real database uuid from the `just db-ls` output
just tpch-run-pipeline {{database_uuid}}

# if you've already run `tpch-run-pipeline` before, you don't need to re-generate the data
just tpch-import {{database_uuid}}

To get rid of the TPC-H tables:

just tpch-drop {{database_uuid}}
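The {{database_uuid}} plumbing above can be scripted. Here is a minimal sketch, assuming `just db-ls` prints each database's uuid somewhere in its output (the exact output format is an assumption — adjust the pattern if it differs):

```shell
# Hypothetical helper: extract the first UUID from whatever is piped in.
# Assumes standard lowercase 8-4-4-4-12 hex UUIDs.
first_uuid() {
  grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | head -n 1
}

# Usage:
#   db=$(just db-ls | first_uuid)
#   just tpch-run-pipeline "$db"
```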

Frontend development

Usual iteration workflow:

cd web/dataland-app
npm run dev

This will run webpack-dev-server on

For a full production/release build, run `npm run build`, which runs webpack and emits the output files to web/dataland-app/build. The nginx container defined in docker-compose.yaml will serve the release assets on

If you make changes to any of the libraries (dataland-api, dataland-sdk, dataland-wasm) you need to explicitly rebuild them in order for webpack (i.e. npm run dev) to incorporate the changes into the bundle:

# Maybe replace with `dataland-sdk`, etc.
cd web/dataland-api
npm run build

# Or if you want to just use one command that always works but is slower
just build-web
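If you've touched more than one library, the per-library rebuilds can be wrapped in a small loop. A sketch, with library names taken from the list above and paths assumed to mirror web/dataland-api; `BUILD_CMD` is a hypothetical override (defaulting to `npm run build`) so the loop can be dry-run:

```shell
# Rebuild each frontend library so webpack picks up the changes.
# BUILD_CMD is an assumed escape hatch for testing; it defaults to `npm run build`.
rebuild_libs() {
  local lib
  for lib in dataland-api dataland-sdk dataland-wasm; do
    (cd "web/$lib" && ${BUILD_CMD:-npm run build}) || return 1
  done
}

# Usage (from the repo root):
#   rebuild_libs
```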

If you make changes to any of the protobuf definitions in the top-level proto directory, you need to explicitly run the codegen process to update the dataland-api library:

just build-api

# Or if you want to directly propagate the changes all the way through in one go
just build-api build-web

Big tables are slow to load?

In development builds, we activate several middlewares that inspect the entire Redux state tree after every dispatched action. These middlewares are very helpful for catching bugs.

But for big tables like TPC-H customer or part (150K rows and 200K rows respectively), the table data itself is part of the Redux state tree, so iterating through the entire state tree on every action becomes very expensive.

In production builds, webpack strips out the code that adds these middlewares, so production should remain fast. You can always verify that this remains true by checking

Backend development

// TODO(hzuo)