This week in Databend #43

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • add system$clustering_information function (#5426)
  • add statistics to TableMeta (#5476)
  • add metasrv time travel functions (#5468 & #5566)
  • snapshot timestamp & navigation (#5535)
  • undrop table & show history (#5562)

Migrate to new planner

  • translate subquery into apply operator (#5510)
  • common tree structure formatter for plan display (#5512)
  • support TRIM function in new planner (#5541)
  • support array literal in new planner (#5551)

New functions

  • support object_keys function (#5461)
  • support compare variant with other data types (#5463)
  • support variant max/min functions (#5525)
  • support variant as function (#5442)
  • add user function (#5584)

Improvement

  • replace todos in datavalues with ErrorCode (#5475)
  • add stage quota for tenant (#5575)
  • simplify catalog and meta (#5560)

hash join performance improvement

  • optimize hash join, ~6x performance improvement (#5497)
  • use DataBlock::gather_blocks in hash join, ~2x performance improvement in some cases (#5534)

Build / Testing / CI

  • fix bugs of logic test scripts and some cases (#5578)

Documentation

  • add i18n support with crowdin (#5545),

Bug fixes

  • fix retention aggregation coredump bug (#5450)
  • fix server hang when parallel execute query (#5482)
  • fix wrong output in hash join (#5538 & #5539)

Tips

Let's learn a weekly tip from Databend.

Do Conversion Funnel Analysis With Databend

Funnel analysis measures the number of unique users who has performed a set of actions, and we use it to see drop-off and conversion in multi-step processes.

Create a Table

CREATE TABLE events(user_id BIGINT, event_name VARCHAR, event_timestamp TIMESTAMP);

Now we have a table with the following fields:

  • user_id - a unique identifier for user
  • event_name - type of the event, like: login, visit, cart and purchase
  • event_timestamp - timestamp which event occurred

Funnel Analysis

It's easy and performance to use Databend WINDOW_FUNNEL Function to find out how far the user user_id could get through the chain in an hour window slides.

SELECT
    level,
    count() AS count
FROM
(
    SELECT
        user_id,
        window_funnel(3600000000)(event_timestamp, 
        event_name = 'login', event_name = 'visit', 
        event_name = 'cart', event_name = 'purchase') AS level
    FROM events
    GROUP BY user_id
)
GROUP BY level ORDER BY level ASC;

Learn more:

Changlogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevilb41shBohuTANGChasen-Zhangdantengskydrmingdrmer
ariesdevilb41shBohuTANGChasen-Zhangdantengskydrmingdrmer
everpcpcfkunerflaneur2020hantmachanyisongjunnplus
everpcpcfkunerflaneur2020hantmachanyisongjunnplus
Kikkonleiyskylichuangmergify[bot]soyeric128sundy-li
Kikkonleiyskylichuangmergify[bot]soyeric128sundy-li
TCeasonXuanwoxudong963ygf11youngsofunZeaLoVe
TCeasonXuanwoxudong963ygf11youngsofunZeaLoVe
zhang2014ZhiHanZzhyassZuoFuhong
zhang2014ZhiHanZzhyassZuoFuhong

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #42

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • support date_add for new parser (#5419)
  • add metrics in metasrv (#5208)

Migrate to new planner

  • support limit for new planner (#5301)
  • support TRY_CAST, EXTRACT and INTERVAL (#5362)
  • enhance GROUP BY semantic check (#5431)

Work around JOIN

  • support using and natural (#5423)

RFC

Improvement

  • avoid ub and delele unnecessary unsafe (#5338)
  • fix get function unnecessary double column loop (#5349)
  • display friendly error if not started with valid flags (#5443)

Build / Testing / CI

  • Sql logic test framework (#5416)

Bug fixes

  • fix some cases in aggregator plan (#5307)
  • fix wrong result of memory table engine (#5364)
  • fix server hang when processor panic (#5394)

Tips

Let's learn a weekly tip from Databend.

Analyze User Retention With Databend

User retention helps you analyze how many users return to your product or service. Let's go through an example and see how to analyze it in Databend.

Create a Table

CREATE TABLE events(`user_id` INT, `login_date` DATE);

Now we have a table with the following fields:

  • user_id - a unique identifier for user
  • login_date - user login date

User Retention Analysis

It's easy and performance to use Databend Retention Function to do the user retention analysis.

SELECT
    sum(r[0]) AS r1,
    sum(r[1]) AS r2,
    sum(r[2]) AS r3
FROM
(
    SELECT
        user_id,
        retention(login_date = '2022-05-15', login_date = '2022-05-16', login_date = '2022-05-17') AS r
    FROM events
    GROUP BY user_id
);

Learn more:

Changlogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilb41shBohuTANGdrmingdrmereverpcpc
andylokandyariesdevilb41shBohuTANGdrmingdrmereverpcpc
fkunerjunnplusleiyskylichuangmergify[bot]RinChanNOWWW
fkunerjunnplusleiyskylichuangmergify[bot]RinChanNOWWW
sundy-liTCeasonusamoiweakishwubxXuanwo
sundy-liTCeasonusamoiweakishwubxXuanwo
xudong963ygf11youngsofunZeaLoVezhang2014
xudong963ygf11youngsofunZeaLoVezhang2014

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #41

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Migrate to new planner

Work around Array

Improvement

  • databend-query: support two-level hashmap by @fkuner, (#5075)
  • databend-query: implement format trait, load CSV/Parquet will be a streaming read by @zhang2014, (#5167 & #5271)
  • databend-query: make expression serialized to raw sql by @sundy-li, (#5260)
  • databend-meta: store endpoints to metasrv and use balance endpoints grpc connection channel by @ariesdevil, (#4987)

Bug fixes

  • databend-query(pipeline): fix state machine of hash join by @leiysky, (#5242)
  • databend-query(parser): show alternative tokens even if the branch is optional by @andylokandy, (#5230)

Tips

Let's learn a weekly tip from Databend.

Start a Databend Cluster

Databend has been designed from day one to be a cloud-native and distributed data warehouse.

The new databend-query node only needs to register itself to the databend-meta with the same cluster_id, they will autodiscovery and formed into a cluster.

Cluster Arch

Read the following article to start your first Databend cluster:

Note

  • Databend Cluster mode only works on shared storage(AWS S3 or MinIO s3-like storage).
  • This cluster mainly used for testing purpose, it is not targeted for production use.

Changlogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #40

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(common): introduce a helper ExpressionEvaluator to simplify expression evaluation by @leiysky, (#5108)
  • databend-query(parser): support more statements by @andylokandy, (#5089)
  • databend-query: aggregator in new planner by @xudong963, (#5027 & #5115)
  • common-functions: add scalar function humanize by @cadl, (#5073)

Improvement

  • databend-query(processor): refactor insert into query for fuse engine by @zhang2014, (#5139)
  • datablocks: add gather kernels for datablock by @sundy-li, (#5114)
  • common-meta: CreateDatabaseReq/DropDatabaseReq use DatabaseNameIdent to specify to db to create/delete by @drmingdrmer, (#5102 & #5104)
  • common-functions: specialize CastFunction with from type by @leiysky, (#5124)
  • datavalues: remove useless precision convert & remove tz in type timestamp by @Veeupup, (#5084)
  • datavalues: using enum_dispatch to represent data_type by @PsiACE, (#5063)

Documentation

Bug fixes

  • databend-query: fix trim function by @jiahui-97, (#5136)
  • databend-query: manually drop the aggregate states to avoid memory leak by @sundy-li, (#5056)
  • databend-query: fix clickhouse worker hang when interpreter fail to execute by @chowc, (#5091)

Tips

Let's learn a weekly tip from Databend.

RFC | New SQL Logic Test Framework

The Stateless/Stateful test suite is used in Databend to ensure that the query results are correct. However, it has some shortages in current logic test which should be improved.

  • Comparing output from binary to a result file cannot be extended to other protocols.
  • The result file cannot show the result for each statement.
  • Currently, we do not provide error handling for sql logic tests
  • We could not extend sql logic statement with sorting, retry and other logics.

So, we plan to refactor this testing solution to introduce an extended version of sqllogictest. The statement spec could be categorized to following fields:

  • statement ok: the sql statement is correct and the output is expected success.
  • statement error <error regex>: the sql statement output is expected error.
  • statement query <desired_query_schema_type> <options> <labels>: the sql statement output is expected success with desired result.

You can check out RFC | New SQL Logic Test Framework to learn more.

Changlogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #39

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • mem-allocator&databend-query: support memory profiling by @dantengsky, (#5050)
  • databend-meta: transaction api by @lichuang, (#5030)
  • databend-query(parser): switch to the new sql parser by @andylokandy, (#4983)
  • databend-query(planners): add switch to enable new planner by @leiysky, (#4989)
  • databend-query: implement azblob support by @Xuanwo, (#5025)
  • common-functions: support semi-structured function json_extract_path_tex by @b41sh, (#4992)
  • common-functions: support aggregate function retention by @fkuner, (#4970)
  • common-functions: support REGEXP_REPLACE function by @nange, (#4944)

Improvement

  • databend-query: rewrite config parser to respect the order by @Xuanwo, (#4971)
  • databend-query(interpreter): introduce new processor for insert interpreter by @sundy-li, (#4862)
  • datavalues: add TypeSerializerImpl to relpace Box<dyn TypeDeserializer> by @sundy-li, (#5040)
  • databend-query: date/timestamp bound check by @Veeupup, (#5054)
  • datavalues: TimeStamp as basic and alias datetime to timestamp. by @Veeupup, (#5031)

Performance Improvement

  • row granularity stream values insert by @ygf11, (#4764)

Bug fixes

  • databend-query(parser): allow to omit semicolon by @andylokandy, (#5058)
  • databend-query: report error when try to use an empty database by @chowc, (#4939)

Tips

Let's learn a weekly tip from Databend.

Load Data from MySQL

Databend now supports loading data into Databend from MySQL using mysqldump.

Before you start, you need to install the MySQL client and mysqldump. Of course, Databend also needs to be deployed and connected via the MySQL client, see How to Deploy Databend.

Dump MySQL table schema and data to file

mysqldump --single-transaction --compact -uroot -proot book_db books > dumpbooks.sql

Note that here we need to use the --single-transaction and --compact options of mysqldump

Load Data into Databend from the sql File

mysql -uroot -h127.0.0.1 -proot -P3307 < dumpbook.sql

All tables and data from users will now be loaded into Databend.

Verify the Loaded Data

SELECT * FROM books;
+------------------------------+----------------------+-------+
| title                        | author               | date  |
+------------------------------+----------------------+-------+
| Transaction Processing       |  Jim Gray            |  1992 |
| Readings in Database Systems |  Michael Stonebraker |  2004 |
+------------------------------+----------------------+-------+

Congratulations, now you can use Databend to analyze data.

You can check out https://databend.rs/doc/load-data/mysql to learn more.

Changlogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.