This week in Databend #25

Databend aims to be an open source, elastic, and reliable cloud warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Build/Test/CI

Bug fixes

Experimental

A series of refactorings will be carried out on the datavalues-dev branch to complete the migration. See RFC - new datavalues system design.

Tips

Let's learn a weekly tip from Databend.

Build/Test Databend with Dev container

In #3853, we introduced a development container to make it easy for contributors to build and test Databend.

build binary artifacts

./scripts/setup/run_docker.sh make build

run test

./scripts/setup/run_docker.sh make test

debug or get into dev container

./scripts/setup/run_docker.sh /bin/bash

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #24

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

New datatype system design

We need to redesign the datatype system because the current implementation has some shortcomings.

Currently, DataType is an enum type, which causes several problems:

  • We must switch to a specific type after matching.
  • We can't use it as a generic argument.
  • It may involve nested datatypes.
  • It is hard to attach attributes to it.
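
The limitations listed above can be made concrete with a minimal Rust sketch. The DataType enum, the PrimitiveType trait, and all function names below are hypothetical illustrations; they are neither Databend's actual definitions nor the design proposed in the RFC.

```rust
// Hypothetical, simplified enum; not Databend's real DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Float64,
    // Nested types force recursion through Box.
    List(Box<DataType>),
}

// Every consumer has to match on the variant and then switch to a concrete
// Rust type by hand; an enum value cannot be used as a generic argument.
fn size_of_value(ty: &DataType) -> Option<usize> {
    match ty {
        DataType::Int32 => Some(std::mem::size_of::<i32>()),
        DataType::Float64 => Some(std::mem::size_of::<f64>()),
        DataType::List(_) => None, // variable length
    }
}

// One possible alternative (an assumption for illustration only): describe
// each type with a trait, so it can be passed as a generic parameter and
// carry attributes as associated items.
trait PrimitiveType {
    type Native;
    const NAME: &'static str;
}

struct Int32Type;

impl PrimitiveType for Int32Type {
    type Native = i32;
    const NAME: &'static str = "Int32";
}

fn describe<T: PrimitiveType>() -> String {
    format!("{} ({} bytes)", T::NAME, std::mem::size_of::<T::Native>())
}

fn main() {
    let nested = DataType::List(Box::new(DataType::Int32));
    println!("{:?}", size_of_value(&DataType::Float64));
    println!("{:?}", size_of_value(&nested));
    println!("{}", describe::<Int32Type>());
}
```

With the trait-based form, generic code such as describe::<Int32Type>() is resolved at compile time, whereas the enum form needs a runtime match at every call site.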

Want to find out how we will improve the DataType system? Please check the RFC - New datatype system design.

Want to jump straight to the implementation and track progress? Databend#3794 might be for you.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #23

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Performance Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

Tracing in Databend

Databend uses tokio-tracing, Rust's tracing ecosystem, for logging and profiling.

Distributed tracing with Jaeger

Jaeger is a distributed tracing platform. It can be used for monitoring microservices-based distributed systems.

  • 4 steps to enable Jaeger monitoring

    • build databend-query: cargo build --bin databend-query
    • run with DEBUG log level: LOG_LEVEL=DEBUG ./databend-query
    • start jaeger: docker run -d -p6831:6831/udp -p6832:6832/udp -p16686:16686 jaegertracing/all-in-one:latest
    • Open http://127.0.0.1:16686/
  • Jaeger Tracing Show

jaeger-tracing-show

  • Read More: https://databend.rs/dev/development/tracing#distributed-tracing-with-jaeger

Explore and diagnose with tokio-console

tokio-console is a diagnostics and debugging tool for asynchronous Rust programs.

  • 3 steps to enable the console subscriber

    • build databend-query with rustflags & features: RUSTFLAGS="--cfg tokio_unstable" cargo build --bin databend-query --features tokio-console
    • run with the log level of TRACE: LOG_LEVEL=TRACE databend-query
    • run tokio-console
  • Run tokio-console to explore databend-query

query-console

  • Read More: https://databend.rs/dev/development/tracing#explore-and-diagnose-with-tokio-console

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #22

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Performance Improvement

  • improve performance of arithmetic plus functions by @zhyass (#3615)

Bug fixes

Tips

Let's learn a weekly tip from Databend.

Have fun with Databend UDF (User Defined Function)

Last week we introduced an experimental UDF engine, thanks to @lianghanzhen. Let's try it out together.

Create a UDF

Databend supports the use of expressions as user defined functions. We can easily create a user-defined function in lambda-like style with CREATE FUNCTION <fn-name> AS (<fn-param0>, ...) -> <fn-expr>. Let's create a custom function to calculate the mean of two numbers together.

mysql> CREATE FUNCTION mean2number AS (x, y) -> (x + y) / 2;
Query OK, 0 rows affected (0.05 sec)
Read 0 rows, 0 B in 0.037 sec., 0 rows/sec., 0 B/sec.

Call a UDF

Calling a UDF is the same as calling any other function. In the following example we calculate the mean of 150 and 250.

mysql> SELECT mean2number(150, 250);
+-------------------+
| ((150 + 250) / 2) |
+-------------------+
|               200 |
+-------------------+
1 row in set (0.02 sec)
Read 1 rows, 1 B in 0.018 sec., 55.59 rows/sec., 55.59 B/sec.
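
The column header in the output above, ((150 + 250) / 2), suggests that the call is expanded into its defining expression with the arguments substituted for the parameters. The Rust sketch below illustrates that idea with a tiny expression tree. It is an assumption-laden illustration, not Databend's UDF engine: the Expr type and the substitute, eval, render, mean2number_body, and call_mean2number helpers are all hypothetical.

```rust
use std::collections::HashMap;

// Tiny expression tree, just enough for `(x + y) / 2`.
#[derive(Debug, Clone)]
enum Expr {
    Num(f64),
    Param(String),
    Add(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
}

// Replace each bound parameter with its argument; unbound ones are kept.
fn substitute(expr: &Expr, args: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Num(n) => Expr::Num(*n),
        Expr::Param(name) => args.get(name).cloned().unwrap_or_else(|| expr.clone()),
        Expr::Add(a, b) => Expr::Add(Box::new(substitute(a, args)), Box::new(substitute(b, args))),
        Expr::Div(a, b) => Expr::Div(Box::new(substitute(a, args)), Box::new(substitute(b, args))),
    }
}

// Evaluate an expression; returns None if an unbound parameter remains.
fn eval(expr: &Expr) -> Option<f64> {
    match expr {
        Expr::Num(n) => Some(*n),
        Expr::Param(_) => None,
        Expr::Add(a, b) => Some(eval(a)? + eval(b)?),
        Expr::Div(a, b) => Some(eval(a)? / eval(b)?),
    }
}

// Render the expression in the parenthesised style of the column header.
fn render(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => format!("{}", n),
        Expr::Param(name) => name.clone(),
        Expr::Add(a, b) => format!("({} + {})", render(a), render(b)),
        Expr::Div(a, b) => format!("({} / {})", render(a), render(b)),
    }
}

// Body of the function from the example: (x + y) / 2.
fn mean2number_body() -> Expr {
    Expr::Div(
        Box::new(Expr::Add(
            Box::new(Expr::Param("x".to_string())),
            Box::new(Expr::Param("y".to_string())),
        )),
        Box::new(Expr::Num(2.0)),
    )
}

// Bind the two call arguments to the parameters x and y.
fn call_mean2number(x: f64, y: f64) -> Expr {
    let mut args = HashMap::new();
    args.insert("x".to_string(), Expr::Num(x));
    args.insert("y".to_string(), Expr::Num(y));
    substitute(&mean2number_body(), &args)
}

fn main() {
    let expanded = call_mean2number(150.0, 250.0);
    println!("{} = {:?}", render(&expanded), eval(&expanded));
}
```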

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

  • datafuselabs/openraft: An implementation of the Raft distributed consensus protocol using the Tokio framework. The async-raft fork, maintained by the databend team, fixes serious bugs.

This week in Databend #21

This week, the migration of the Databend website to Docusaurus was completed; it is now hosted on a service sponsored by Vercel. Please enjoy it.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Performance Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

What's happening in Databend SQL layer

With #2983 ready for review, Databend's SQL layer will see the arrival of a new planner framework.

In fact, we have a number of plans for refactoring the SQL layer, which may even include a rewritten parser.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #20

This week, Databend releases v0.6.0-nightly and begins a new six-week iteration. To learn about the main changes in v0.6, please see Checklist proposal: Nightly v0.6.

Want a sneak peek at the goals of v0.7? Check out Checklist proposal: Nightly v0.7.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Build / Test / CI

Bug fixes

Tips

Let's learn a weekly tip from Databend.

Discuss proposals for Databend

We currently have some proposal discussions on GitHub that may help you understand the mechanics of the work or get involved.

  • Query Cache

    Clever use of caching can effectively accelerate Databend. We are implementing a single-node, two-level (memory and disk) cache, and may move towards distributed caching soon. More discussion on design and implementation is welcome; let's hear your thoughts. Related discussion #3478.

  • Re-organise our building systems

    Our current build / test system is quite complex. We have a Makefile, Dockerfile(s), and a lot of shell / Python scripts. We want to build a Rust-style build/test/benchmark system, so if you have any good ideas, please feel free to share them with us. Related discussion #3419.

  • Refactor CI pipeline into stages

    If you're interested in GitHub workflows, then take a look at this proposal. By refactoring CI, we have effectively reduced blocking and been able to maintain it better. Related discussion #3415.
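
To give a feel for the memory-disk cache mentioned under Query Cache, here is a rough Rust sketch of a two-level lookup. It is only an illustration under stated assumptions: the TwoLevelCache type and its methods are hypothetical, the disk level is simulated with a second in-memory map, and no eviction policy is modeled. It does not reflect the design discussed in #3478.

```rust
use std::collections::HashMap;

// Hypothetical two-level cache: a small memory level in front of a larger
// "disk" level. The disk level is simulated with a HashMap for brevity.
struct TwoLevelCache {
    memory: HashMap<String, Vec<u8>>,
    memory_capacity: usize,
    disk: HashMap<String, Vec<u8>>,
}

impl TwoLevelCache {
    fn new(memory_capacity: usize) -> Self {
        TwoLevelCache {
            memory: HashMap::new(),
            memory_capacity,
            disk: HashMap::new(),
        }
    }

    // Always write to the disk level; keep a copy in memory if there is room
    // or if the key is already resident there.
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.disk.insert(key.to_string(), value.clone());
        if self.memory.len() < self.memory_capacity || self.memory.contains_key(key) {
            self.memory.insert(key.to_string(), value);
        }
    }

    // Look in memory first, then on disk. Returns the value together with
    // the level that served it.
    fn get(&mut self, key: &str) -> Option<(Vec<u8>, &'static str)> {
        if let Some(v) = self.memory.get(key) {
            return Some((v.clone(), "memory"));
        }
        let v = self.disk.get(key)?.clone();
        // Promote to memory only while there is spare capacity.
        if self.memory.len() < self.memory_capacity {
            self.memory.insert(key.to_string(), v.clone());
        }
        Some((v, "disk"))
    }
}

fn main() {
    let mut cache = TwoLevelCache::new(1);
    cache.put("q1", b"result-1".to_vec());
    cache.put("q2", b"result-2".to_vec());
    println!("{:?}", cache.get("q1").map(|(_, level)| level));
    println!("{:?}", cache.get("q2").map(|(_, level)| level));
    println!("{:?}", cache.get("q3").map(|(_, level)| level));
}
```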

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #19

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Performance Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

How to explore GitHub repos via Databend

Databend now supports GitHub as a data source, and you can read the relevant code at storages/github.

create github-engine based database

Before running Databend, set your GitHub access token.

export GITHUB_TOKEN=<your_token>;

Create a GitHub-powered database.

databend :) create database datafuselabs engine=github;

0 rows in set. Elapsed: 2.611 sec. 

show all tables

Show all tables in this database, which are currently flattened. This means that repos, issues, and PRs are all exposed as tables.

databend :) use datafuselabs;

0 rows in set. Elapsed: 0.013 sec.

databend :) show tables;

+---------------------------------+
| name                            |
+---------------------------------+
| .github                         |
| .github_comments                |
| .github_issues                  |
| .github_prs                     |
| databend                        |
| databend-playground             |
| databend-playground_comments    |
| databend-playground_issues      |
| databend-playground_prs         |
| databend_comments               |
| databend_issues                 |
| databend_prs                    |
| ...                             |
+---------------------------------+

36 rows in set. Elapsed: 0.053 sec. 

View basic information about a repo

databend :) select * from databend;

+------------+----------+------------+------------+-------------+----------------+-------------------+-------------------+
| reposiroty | language | license    | star_count | forks_count | watchers_count | open_issues_count | subscribers_count |
+------------+----------+------------+------------+-------------+----------------+-------------------+-------------------+
| databend   | Rust     | apache-2.0 |       2661 |         252 |           2661 |               349 |                63 |
+------------+----------+------------+------------+-------------+----------------+-------------------+-------------------+

1 rows in set. Elapsed: 1.368 sec. 

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #18

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Performance Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

How to load data into Databend

Databend now supports the loading of data via the following methods:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #17

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Bug fixes

Tips

Let's learn a weekly tip from Databend.

What are scalar functions

Scalar functions (sometimes referred to as User-Defined Functions / UDFs) return a single value for each row rather than a result set, and can be used in most places within a query or SET statement, except for the FROM clause.
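
The "one value per row" property can be sketched in a few lines of Rust. This is a generic illustration, not Databend's function framework: apply_scalar, upper, and total_len are hypothetical names, and a plain slice stands in for a column.

```rust
// A scalar function maps each input row to exactly one output value,
// so the output column has the same number of rows as the input.
fn apply_scalar<T, U>(column: &[T], f: impl Fn(&T) -> U) -> Vec<U> {
    column.iter().map(f).collect()
}

// Hypothetical scalar function working on strings: upper-case one value.
fn upper(s: &String) -> String {
    s.to_uppercase()
}

// For contrast, an aggregate collapses the whole column into one value.
fn total_len(column: &[String]) -> usize {
    column.iter().map(|s| s.len()).sum()
}

fn main() {
    let names = vec!["databend".to_string(), "rust".to_string()];
    println!("{:?}", apply_scalar(&names, upper));
    println!("{}", total_len(&names));
}
```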

Want to learn how to add scalar functions to Databend? We have a document discussing this topic. Learn more: How to write scalar functions.

Good First Issues

If you are interested in scalar functions that work with strings, you can check #3004. Comment /assignme on a subtask issue to have that subtask assigned to you.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #16

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

Bug fixes

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.
