This week in Databend #42

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • support date_add for new parser (#5419)
  • add metrics in metasrv (#5208)

Migrate to new planner

  • support limit for new planner (#5301)
  • support TRY_CAST, EXTRACT and INTERVAL (#5362)
  • enhance GROUP BY semantic check (#5431)

Work around JOIN

  • support using and natural (#5423)

RFC

Improvement

  • avoid ub and delete unnecessary unsafe (#5338)
  • fix get function unnecessary double column loop (#5349)
  • display friendly error if not started with valid flags (#5443)

Build / Testing / CI

  • Sql logic test framework (#5416)

Bug fixes

  • fix some cases in aggregator plan (#5307)
  • fix wrong result of memory table engine (#5364)
  • fix server hang when processor panic (#5394)

Tips

Let's learn a weekly tip from Databend.

Analyze User Retention With Databend

User retention helps you analyze how many users return to your product or service. Let's go through an example and see how to analyze it in Databend.

Create a Table

CREATE TABLE events(`user_id` INT, `login_date` DATE);

Now we have a table with the following fields:

  • user_id - a unique identifier for user
  • login_date - user login date

User Retention Analysis

It's easy and performance to use Databend Retention Function to do the user retention analysis.

SELECT
    sum(r[0]) AS r1,
    sum(r[1]) AS r2,
    sum(r[2]) AS r3
FROM
(
    SELECT
        user_id,
        retention(login_date = '2022-05-15', login_date = '2022-05-16', login_date = '2022-05-17') AS r
    FROM events
    GROUP BY user_id
);

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilb41shBohuTANGdrmingdrmereverpcpc
andylokandyariesdevilb41shBohuTANGdrmingdrmereverpcpc
fkunerjunnplusleiyskylichuangmergify[bot]RinChanNOWWW
fkunerjunnplusleiyskylichuangmergify[bot]RinChanNOWWW
sundy-liTCeasonusamoiweakishwubxXuanwo
sundy-liTCeasonusamoiweakishwubxXuanwo
xudong963ygf11youngsofunZeaLoVezhang2014
xudong963ygf11youngsofunZeaLoVezhang2014

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #41

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Migrate to new planner

Work around Array

Improvement

  • databend-query: support two-level hashmap by @fkuner, (#5075)
  • databend-query: implement format trait, load CSV/Parquet will be a streaming read by @zhang2014, (#5167 & #5271)
  • databend-query: make expression serialized to raw sql by @sundy-li, (#5260)
  • databend-meta: store endpoints to metasrv and use balance endpoints grpc connection channel by @ariesdevil, (#4987)

Bug fixes

  • databend-query(pipeline): fix state machine of hash join by @leiysky, (#5242)
  • databend-query(parser): show alternative tokens even if the branch is optional by @andylokandy, (#5230)

Tips

Let's learn a weekly tip from Databend.

Start a Databend Cluster

Databend has been designed from day one to be a cloud-native and distributed data warehouse.

The new databend-query node only needs to register itself to the databend-meta with the same cluster_id, they will autodiscovery and formed into a cluster.

Cluster Arch

Read the following article to start your first Databend cluster:

Note

  • Databend Cluster mode only works on shared storage(AWS S3 or MinIO s3-like storage).
  • This cluster mainly used for testing purpose, it is not targeted for production use.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #40

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(common): introduce a helper ExpressionEvaluator to simplify expression evaluation by @leiysky, (#5108)
  • databend-query(parser): support more statements by @andylokandy, (#5089)
  • databend-query: aggregator in new planner by @xudong963, (#5027 & #5115)
  • common-functions: add scalar function humanize by @cadl, (#5073)

Improvement

  • databend-query(processor): refactor insert into query for fuse engine by @zhang2014, (#5139)
  • datablocks: add gather kernels for datablock by @sundy-li, (#5114)
  • common-meta: CreateDatabaseReq/DropDatabaseReq use DatabaseNameIdent to specify to db to create/delete by @drmingdrmer, (#5102 & #5104)
  • common-functions: specialize CastFunction with from type by @leiysky, (#5124)
  • datavalues: remove useless precision convert & remove tz in type timestamp by @Veeupup, (#5084)
  • datavalues: using enum_dispatch to represent data_type by @PsiACE, (#5063)

Documentation

Bug fixes

  • databend-query: fix trim function by @jiahui-97, (#5136)
  • databend-query: manually drop the aggregate states to avoid memory leak by @sundy-li, (#5056)
  • databend-query: fix clickhouse worker hang when interpreter fail to execute by @chowc, (#5091)

Tips

Let's learn a weekly tip from Databend.

RFC | New SQL Logic Test Framework

The Stateless/Stateful test suite is used in Databend to ensure that the query results are correct. However, it has some shortages in current logic test which should be improved.

  • Comparing output from binary to a result file cannot be extended to other protocols.
  • The result file cannot show the result for each statement.
  • Currently, we do not provide error handling for sql logic tests
  • We could not extend sql logic statement with sorting, retry and other logics.

So, we plan to refactor this testing solution to introduce an extended version of sqllogictest. The statement spec could be categorized to following fields:

  • statement ok: the sql statement is correct and the output is expected success.
  • statement error <error regex>: the sql statement output is expected error.
  • statement query <desired_query_schema_type> <options> <labels>: the sql statement output is expected success with desired result.

You can check out RFC | New SQL Logic Test Framework to learn more.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #39

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • mem-allocator&databend-query: support memory profiling by @dantengsky, (#5050)
  • databend-meta: transaction api by @lichuang, (#5030)
  • databend-query(parser): switch to the new sql parser by @andylokandy, (#4983)
  • databend-query(planners): add switch to enable new planner by @leiysky, (#4989)
  • databend-query: implement azblob support by @Xuanwo, (#5025)
  • common-functions: support semi-structured function json_extract_path_tex by @b41sh, (#4992)
  • common-functions: support aggregate function retention by @fkuner, (#4970)
  • common-functions: support REGEXP_REPLACE function by @nange, (#4944)

Improvement

  • databend-query: rewrite config parser to respect the order by @Xuanwo, (#4971)
  • databend-query(interpreter): introduce new processor for insert interpreter by @sundy-li, (#4862)
  • datavalues: add TypeSerializerImpl to relpace Box<dyn TypeDeserializer> by @sundy-li, (#5040)
  • databend-query: date/timestamp bound check by @Veeupup, (#5054)
  • datavalues: TimeStamp as basic and alias datetime to timestamp. by @Veeupup, (#5031)

Performance Improvement

  • row granularity stream values insert by @ygf11, (#4764)

Bug fixes

  • databend-query(parser): allow to omit semicolon by @andylokandy, (#5058)
  • databend-query: report error when try to use an empty database by @chowc, (#4939)

Tips

Let's learn a weekly tip from Databend.

Load Data from MySQL

Databend now supports loading data into Databend from MySQL using mysqldump.

Before you start, you need to install the MySQL client and mysqldump. Of course, Databend also needs to be deployed and connected via the MySQL client, see How to Deploy Databend.

Dump MySQL table schema and data to file

mysqldump --single-transaction --compact -uroot -proot book_db books > dumpbooks.sql

Note that here we need to use the --single-transaction and --compact options of mysqldump

Load Data into Databend from the sql File

mysql -uroot -h127.0.0.1 -proot -P3307 < dumpbook.sql

All tables and data from users will now be loaded into Databend.

Verify the Loaded Data

SELECT * FROM books;
+------------------------------+----------------------+-------+
| title                        | author               | date  |
+------------------------------+----------------------+-------+
| Transaction Processing       |  Jim Gray            |  1992 |
| Readings in Database Systems |  Michael Stonebraker |  2004 |
+------------------------------+----------------------+-------+

Congratulations, now you can use Databend to analyze data.

You can check out https://databend.rs/doc/load-data/mysql to learn more.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #38

Databend is an open source elastic and reliable Modern Cloud Data Warehouse, it offers blazing fast query and combines elasticity, simplicity, low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(parser): add select statement by @andylokandy, (#4941)
  • databend-query: support mysqldump dump schema by @BohuTANG, (#4972)
  • databend-query: refine new planner framework: use trait object to represent plans to make it more extensible. by @leiysky, (#4895)
  • databend-query: create user if not exists on JWT authenticate by @Junnplus, (#4924)
  • databend-meta: support watch api by @lichuang, (#4779)
  • databend-query(parser): support keyword DATABASE synonym SCHEMA by @TCeason, (#4855)

Improvement

  • databend-query: reconstruct type: date/datetime to simplify date type by @Veeupup, (#4921)
  • common-functions: refine the functions name from xY to x_y by @BohuTANG, (#4915, #4906 and #4884)
  • common-meta: metasrv has to be compatible with 20220413-34e89c9 by @drmingdrmer, (#4901)
  • databend-query: compatible with mysql insert and select by @TCeason, (#4883)
  • common-functions: replace FactoryCreator with FactoryCreatorWithTypes for functions by @zhyass, (#4688)

Build / Testing / CI

Performance Improvement

  • databend-query(processor): replace global mutex with atomic by @zhang2014, (#4905)

Bug fixes

  • common-functions(cast): fix the behavior of null to boolean by @sundy-li, (#4911)
  • databend-query(group_by): fix group by with negative value by @zhang2014, (#4902)
  • databend-query(transform_limit): fixes limit and offset with one block by @zhang2014, (#4907)
  • databend-query(interpreters): fix empty query by @cadl, (#4894)
  • *: fix show grants from inherited role by @Junnplus, (#4873)

Tips

Let's learn a weekly tip from Databend.

Visualization Databend data in Jupyter Notebook

The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.

Recently, we have worked on improving Databend's compatibility with the MySQL/Clickhouse ecosystem to provide a better experience. Thanks to improved support for sqlalchemy, we can now interact with data in Databend in Jupyter Notebook.

To experience it, there are only three steps:

Databend with Jupyter Notebook

You can check out https://databend.rs/doc/integrations/gui-tool/jupyter to learn more.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.