# Dev workflows

## Account and workspace creation
- Go to https://app.dataland.land/signup
- Go through the email sign-up flow (not Google/GitHub/OAuth)
- Go to MailSlurper, hosted at http://localhost:4436, to verify your email
- Create a workspace - this automatically creates:
  - A Workspace in the Orchestration Service
  - A Database in the Database Service
  - A Repo in the Logic Service
  - A History Database in the History Service
## Loading test data
To load TPC-H tables into a Dataland database:

```shell
just db-ls
# replace with a real database uuid
just tpch-run-pipeline {{database_uuid}}
# if you've already run `tpch-run-pipeline` before, you don't need to re-generate the data
just tpch-import {{database_uuid}}
```

To get rid of the TPC-H tables:

```shell
just tpch-drop {{database_uuid}}
```
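The load steps above can be scripted. Here is a minimal sketch (a hypothetical helper, not part of the repo) that assumes `just` is on your PATH and that you have already copied a database UUID out of the `just db-ls` output yourself; it validates the UUID shape before invoking the recipes:

```typescript
// Hypothetical wrapper around the `just` recipes above -- not part of the repo.
import { execFileSync } from "node:child_process";

// Canonical 8-4-4-4-12 hex UUID shape, so a bad paste fails fast
// instead of reaching the pipeline.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function loadTpch(databaseUuid: string): void {
  if (!UUID_RE.test(databaseUuid)) {
    throw new Error(`not a database uuid: ${databaseUuid}`);
  }
  // Per the note above, the pipeline step can be skipped if the
  // TPC-H data has already been generated.
  execFileSync("just", ["tpch-run-pipeline", databaseUuid], { stdio: "inherit" });
  execFileSync("just", ["tpch-import", databaseUuid], { stdio: "inherit" });
}
```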
## Frontend development

Usual iteration workflow:

```shell
cd web/dataland-app
npm run dev
```

This will run `webpack-dev-server` on https://dev-app.dataland.land.
For a full production/release build, you can run `npm run build`, which will run webpack and emit the output files to `web/dataland-app/build`. The nginx container defined in `docker-compose.yaml` will serve the release assets on https://app.dataland.land.
If you make changes to any of the libraries (`dataland-api`, `dataland-sdk`, `dataland-wasm`), you need to explicitly rebuild them in order for webpack (i.e. `npm run dev`) to incorporate the changes into the bundle:
```shell
# Maybe replace with `dataland-sdk`, etc.
cd web/dataland-api
npm run build
# Or if you want a single command that always works but is slower
just build-web
```
If you make changes to any of the protobuf definitions in the top-level `proto` directory, you need to explicitly run the codegen process to update the `dataland-api` library:
```shell
just build-api
# Or if you want to directly propagate the changes all the way through in one go
just build-api build-web
```
## Big tables are slow to load?
In the development builds, we activate several middlewares which will inspect the entire Redux state tree after every action that gets dispatched. These middlewares are very helpful for catching bugs.
But for big tables like TPC-H `customer` or `part` (150K rows and 200K rows respectively), the data itself is part of the Redux state tree, so iterating through the entire state tree becomes very expensive.
In production builds, webpack strips out the code which adds these middlewares, so production should continue to be fast. You can always verify that this remains true by checking https://app.dataland.land.
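To make the dev/prod split concrete, here is a minimal sketch of the usual pattern; the names (`deepInvariantCheck`, `inspectorsFor`) are hypothetical and not the actual dataland-app wiring. Because webpack statically replaces `process.env.NODE_ENV` at build time, the dev-only branch becomes constant-false in a production bundle and is eliminated as dead code:

```typescript
// Sketch of the dev-only state-inspection pattern; names are hypothetical.
type Action = { type: string };
// A hook that runs after every dispatched action.
type StateInspector = (action: Action, state: unknown) => void;

const deepInvariantCheck: StateInspector = (_action, state) => {
  // Stand-in for walking the whole tree -- O(state size) on every dispatch,
  // which is why 150K-row tables make dev builds feel slow.
  JSON.stringify(state);
};

function inspectorsFor(nodeEnv: string): StateInspector[] {
  // In the real bundle this condition is `process.env.NODE_ENV`, which
  // webpack inlines, so the production branch statically excludes the check.
  return nodeEnv === "production" ? [] : [deepInvariantCheck];
}
```

In the real app this selection happens at store-creation time; the point is only that the expensive check is guarded by a condition that is a compile-time constant in release builds.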
## Backend development
// TODO(hzuo)