This week in Databend #48

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, all built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • support abort pipeline (#6174)
  • integration with Sentry (#6226)
  • rewrite predicates and accelerate TPC-H Q19 (#6301)

databend meta

  • support leaving a cluster with databend-meta --leave (#6181)
  • add import init cluster support (#6280)

statements

  • support exists statement (#6166)
  • statement delete from... (#5691)
  • support db.table.col in ORDER BY substatement (#6191)

new planner

  • introduce serializable physical plan (#6191)
  • support non-equi conditions in hash join (#6145)
  • decorrelate EXISTS subquery with non-equi condition (#6232)
  • migrate CREATE (#5905) / ALTER (#6319) / DROP (#6327) UDF statements

Improvement

  • improve compatibility with clickhouse http handler (#6148)
  • limit push down for table fuse_snapshot & proc system$fuse_snapshot (#6167)
  • split ast statements into multiple mods (#6176)
  • store gRPC address in node info and auto-refresh backend addresses for the gRPC client (#5495)

Join Performance Improvements

  • improve left/semi/anti join performance [~80x] (#6241)
  • improve gathering of join results [~7x] (#6228)
  • improve semi/anti join with other conjuncts [~17x] (#6366)

Build/Testing/CI

  • add tpch stateless-test (#6225)
  • add async insert test (#5964)

Bug fixes

  • fix MySQL session destroyed by data type mismatch (#6150)
  • fix node id truncation when cluster id is escaped (#6193)
  • fix aggregate count incorrect state place (#6218)
  • fix grouping check (#6219)
  • fix output of to_datetime() (#6252)
  • fix MySQL connections stuck in CLOSE_WAIT or FIN_WAIT_2 (#6341)

Tips

Let's learn a weekly tip from Databend.

DELETE in Databend

The DELETE statement can delete one or more rows from a table.

Syntax

Databend now supports the following syntax:

DELETE FROM table_name
[WHERE search_condition]
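
The WHERE clause is optional; if you omit it, the statement deletes every row in the table:

DELETE FROM table_name;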

Example

Suppose that the bookstore table currently contains the following data:

| bookId | bookName                    |
| ------ | --------------------------- |
| 101    | After the death of Don Juan |
| 102    | Grown ups                   |
| 103    | The long answer             |
| 104    | Wartime friends             |
| 105    | Deconstructed               |

Now let's delete the book with id = 103:

DELETE FROM bookstore WHERE bookId = 103;

After deletion, the data in the bookstore table is shown as follows:

| bookId | bookName                    |
| ------ | --------------------------- |
| 101    | After the death of Don Juan |
| 102    | Grown ups                   |
| 104    | Wartime friends             |
| 105    | Deconstructed               |

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, ClSlaid, cuichenli, dantengsky, drmingdrmer, everpcpc, fkuner, leiysky, lichuang, mergify[bot], PsiACE, soyeric128, sundy-li, TCeason, TennyZhuang, Xuanwo, xudong963, youngsofun, zhang2014, ZhiHanZ

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We look forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience when you bring Databend into your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #47

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, all built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • transient fuse table (#5968)
  • support field comment (#5952)
  • stop parsing at insert statement (#6048)
  • configurable repr of float denormals in output formats (#6065)
  • integrate cluster select query with new processor (#4544)

optimizer

  • decorrelate EXISTS subquery (#6073)
  • support push down filter through cross apply (#6079)

functions

  • support ifnull (#5921)
  • support multi_if (#6039)
  • support date_sub (#6050)
  • support to_nullable and assume_not_null (#6055)
  • support coalesce (#5922)

clickhouse handle

  • support settings (#5945)
  • improve output format (#6027)
  • support set database (#6097)

new planner

  • show statements (#6013)
  • show users/roles statement (#6016)
  • migrate grant (#6049)
  • migrate revoke (#6066)
  • migrate copy (#6074)
  • migrate insert statement (#5897)
  • support query log for new planner (#6053)
  • support SELECT ... FROM ... { AT TIMESTAMP } (#6056)

Improvement

  • stop supporting multiple statements (#6052)
  • make uuid return uuid values for each row (#6114)
  • add more meta cluster status (#6083)
  • catch panics of http handlers (#6090)
  • adjust table snapshot timestamp precision to microseconds (#6144)

Bug fixes

  • fix CSV deserialization with escaped quotes (#6008)
  • consider NULL for binary op in type checker (#6043)
  • fix error of EXISTS subquery (#6073)
  • transient object storage IO operation fault handling (#6045)
  • remove a wrong cut parser in subquery (#6111)

Tips

Let's learn a weekly tip from Databend.

Cluster key

A cluster key is intended to improve query performance by physically clustering related data together.

For example, when you set a column as the cluster key for a table, the table data will be physically sorted by that column. This maximizes query performance when most of your queries filter on that column.

Sets a cluster key when creating a table

CREATE TABLE <name> ... CLUSTER BY ( <expr1> [ , <expr2> ... ] )

Changes the cluster key for a table

ALTER TABLE <name> CLUSTER BY ( <expr1> [ , <expr2> ... ] )

Deletes the cluster key for a table

ALTER TABLE <name> DROP CLUSTER KEY
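
Putting these together, here is a minimal sketch with a hypothetical events table (the table and column names are illustrative), created with one cluster key and later re-clustered:

CREATE TABLE events (
    event_date DATE,
    user_id INT,
    payload VARCHAR
) CLUSTER BY (event_date);

ALTER TABLE events CLUSTER BY (user_id);

ALTER TABLE events DROP CLUSTER KEY;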

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, AngleNet, b41sh, BohuTANG, cadl, dantengsky, drmingdrmer, everpcpc, fkuner, hantmac, junnplus, leiysky, lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, TennyZhuang, TianLangStudio, Veeupup, wfxr, wubx, Xuanwo, xudong963, youngsofun, yuuch, zhang2014, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We look forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience when you bring Databend into your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #46

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, all built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • WINDOW function (#5401)
  • new table engine: RANDOM (#5896)
  • clickhouse http handler support TsvWithNamesAndTypes (#5898)
  • add benchmark scripts of metasrv (#5865)
  • support remove stage files (#5788)
  • support struct data type (#5940)

optimizer

  • support some common optimizer rules to refine optimizer framework (#5877)
  • support predicate push down through join (#5914)
  • support constant folding (#5924)

new planner

  • support left outer join and right outer join (#5972)
  • support semi and anti join in new planner (#5869)
  • support set operators in parser and planner (#5833)
  • support table statements in new planner (#5907)
  • add DDL STAGE for new planner framework (#5821)
  • support rename database in new planner (#5887)
  • support alter/drop view in new planner (#5862 & #5920)
  • migrate CREATE/DROP ROLE statement to new planner (#5935)

Improvement

  • row-based serializer (#5791)
  • renaming for better readability (#5879)
  • list/remove stage files from meta api (#5857)
  • use unboxed_simple to simplify GATs in async-trait (#5999)
  • add meta grpc client network metrics (#5978)
  • improve clickhouse http handler (#5933)
  • use micromarshal to make references clearer (#5974)

Build / Testing / CI

  • use fuse engine instead of memory engine in test (#5530)

Bug fixes

  • fix read quoted string (#5870)
  • correctly handle catalog in statements (#5909)
  • fix correlated subquery with joins (#5947)
  • fix server panic when max active sessions is exceeded (#5928)
  • fix status causing MySQL client hang (#5961)
  • fix ProcessorExecutorStream losing data (#5983)
  • fix incorrect return data type of function if (#5980)

Tips

Let's learn a weekly tip from Databend.

Using Join with Databend

As you can see, one of Databend's major recent efforts has been the development of and migration to the new planner. We have implemented support for JOIN in the new planner to better accommodate multi-table queries.

To use JOIN, you must first enable the new Databend planner. To do so, run the following command in the SQL client:

set enable_planner_v2=1;

A join allows you to combine columns from two or more tables into a single result set. Databend supports the following join types:

  • Inner Join
  • Natural Join
  • Cross Join
  • Left Join
  • Right Join

Let's see a CROSS JOIN example

A cross join returns a result set that includes each row from the first table joined with each row from the second table.

SELECT select_list
FROM table_a
	CROSS JOIN table_b

Imagine we have the following tables:

Table "vip_info": This table stores the VIP client information.

| Client_ID | Region    |
| --------- | --------- |
| 101       | Toronto   |
| 102       | Quebec    |
| 103       | Vancouver |

Table "gift": This table lists the gift options for the VIP clients.

| Gift      |
| --------- |
| Croissant |
| Donut     |
| Coffee    |
| Soda      |

The following command returns a result set that assigns each gift option to each VIP client:

select * from vip_info cross join gift;

Output:

| Client_ID | Region    | Gift      |
| --------- | --------- | --------- |
| 101       | Toronto   | Croissant |
| 101       | Toronto   | Donut     |
| 101       | Toronto   | Coffee    |
| 101       | Toronto   | Soda      |
| 102       | Quebec    | Croissant |
| 102       | Quebec    | Donut     |
| 102       | Quebec    | Coffee    |
| 102       | Quebec    | Soda      |
| 103       | Vancouver | Croissant |
| 103       | Vancouver | Donut     |
| 103       | Vancouver | Coffee    |
| 103       | Vancouver | Soda      |
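
The other join types follow the same pattern with an ON condition. As a minimal sketch, assuming a hypothetical gift_order table that records which gift each client actually picked, an inner join returns only the matching rows:

select v.Client_ID, v.Region, o.Gift
from vip_info v
    inner join gift_order o on v.Client_ID = o.Client_ID;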

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, dantengsky, doki23, everpcpc, fkuner, hantmac, junnplus, Kikkon, leiysky, lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, ygf11, youngsofun, ZeaLoVe, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We look forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience when you bring Databend into your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #45

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, all built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • undrop database (#5770)
  • support async insert mode to improve throughput (#5567)
  • support alter table cluster key (#5718)

meta

  • add schema API and unit tests for GC of data dropped beyond the retention time (#5746)
  • protobuf message has to persist MIN_COMPATIBLE_VER (#5785)

functions

  • support function timezone() (#5840)
  • add function NULLIF (#5772)

new planner

  • introduce InterpreterFactoryV2 for new planner (#5729)
  • support udf (#5751)
  • CREATE VIEW (#5816)
  • CREATE DATABASE (#5804) / DROP DATABASE (#5846)
  • CREATE USER (#5802) / ALTER USER (#5823) / DROP USER (#5813)

Improvement

  • refactor location parse logic (#5790)
  • show tenant quota for current tenant (#5750)
  • support errorcode hint in new planner (#5756)
  • don't load credential while reading stage (#5783)
  • move Clickhouse HTTP handler to its own port (#5797)

new parser

  • support cross join (#5730)
  • flatten join lists (#5742)

Build / Testing / CI

  • enable logic test (#5836)

Bug fixes

  • fix compressed buf not consumed correctly (#5727)
  • support abort for pipeline executor stream (#5803)
  • fix lz4raw compression of zero-length buffers (#5806)
  • fix server hang when sync work panics (#5814)

Tips

Let's learn a weekly tip from Databend.

Databend Supports Async Insert Mode

When thousands of clients concurrently insert small batches of data, each insert is executed as follows:

Parser -> Planner -> Interpreter -> Pipeline

This is inefficient due to deep I/O paths and poor cache locality.

To solve this problem, we buffer small inserts into batches on the server, sacrificing a tiny amount of latency in exchange for better insert throughput, a smaller block count, and larger DataBlocks in storage.

With this change, inserts into the same table are still parsed and planned individually. The insert data is converted to DataBlocks and buffered; when certain conditions are triggered, the buffered DataBlocks are interpreted in one go, which also benefits the pipelines.
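
A minimal sketch of enabling it in a session (the setting names below are assumptions based on the async insert PR, not confirmed here):

-- assumed setting names; check the PR and docs for the exact ones
set enable_async_insert = 1;
-- optionally make the client wait until the buffered data is actually written
set wait_for_async_insert = 1;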

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, dantengsky, devillove084, drmingdrmer, everpcpc, fkuner, flaneur2020, hantmac, junnplus, leiysky, lichuang, LiuYuHui, mergify[bot], RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, ygf11, youngsofun, ZeaLoVe, zhang2014, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We look forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience when you bring Databend into your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #44

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, all built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • support correlated subquery (#5593)
  • support select ... at (#5617 & #5628)
  • exchange protocol version with client (#5645)
  • add decompress support for COPY INTO and streaming loading (#5655)

new parser

  • implement string unescape (#5638)
  • allow mysql-style hex number and single-item array (#5654)

stage related

  • copy into stage support (#5579)
  • add system.stages table and show stages (#5581)

call functions

  • add call stats functions (#5646)
  • add call function system$search_tables (#5643)

new planner

  • support explain for new planner (#5587)
  • support tuple in new planner (#5640)
  • support cross join (#5715)

Variant related

  • support variant group by (#5694)
  • support variant order by (#5668)

Improvement

  • refine table optimize (#5589)
  • parse integer to u64 (#5692)
  • support FixedKey u128, u256, u512 in group query (#5678)
  • introduce ScalarEvaluator to evaluate Scalar (#5689)
  • utilize HashMethodFixedKeys in hash join (#5693)
  • record count of tables for a tenant in KV space (#5708)

Build / Testing / CI

  • add mock module and a test for data beyond the retention time (#5707)

Bug fixes

  • retry on errors during load_credential (#5590)
  • deny root login from other hosts (#5588)

Tips

Let's learn a weekly tip from Databend.

COPY INTO Databend with COMPRESSION Option

After PR 5655, Databend supports decompression reads in COPY INTO and streaming loading.

Loading Compressed Files from Amazon S3

Try loading data from a gzip-compressed CSV file and inserting it into mytable:

COPY INTO mytable
  FROM s3://mybucket/data.csv.gz
  credentials=(aws_key_id='<AWS_ACCESS_KEY_ID>' aws_secret_key='<AWS_SECRET_ACCESS_KEY>')
  FILE_FORMAT = (type = "CSV" field_delimiter = ',' record_delimiter = '\n' 
  skip_header = 1 compression = GZIP) size_limit=10;

COMPRESSION Option

The COMPRESSION option is a string that represents the compression algorithm.

| Values        | Notes                                                           |
| ------------- | --------------------------------------------------------------- |
| `AUTO`        | Auto detect compression via file extensions                     |
| `GZIP`        |                                                                 |
| `BZ2`         |                                                                 |
| `BROTLI`      | Must be specified if loading/unloading Brotli-compressed files. |
| `ZSTD`        | Zstandard v0.8 (and higher) is supported.                       |
| `DEFLATE`     | Deflate-compressed files (with zlib header, RFC1950).           |
| `RAW_DEFLATE` | Deflate-compressed files (without any header, RFC1951).         |
| `NONE`        | Indicates that the files have not been compressed.              |
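
With `AUTO`, you can let Databend infer the algorithm from the file extension instead of naming it. A minimal sketch, reusing the hypothetical bucket from the example above:

COPY INTO mytable
  FROM s3://mybucket/data.csv.gz
  credentials=(aws_key_id='<AWS_ACCESS_KEY_ID>' aws_secret_key='<AWS_SECRET_ACCESS_KEY>')
  FILE_FORMAT = (type = "CSV" compression = AUTO);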

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, Chasen-Zhang, dantengsky, devillove084, drmingdrmer, everpcpc, flaneur2020, hanyisong, junnplus, leiysky, lichuang, mergify[bot], PsiACE, RinChanNOWWW, ryanrussell, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, ZeaLoVe, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We look forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience when you bring Databend into your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.