This week in Databend #52

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

logging

  • implement RFC The New Logging (#6845)

meta

  • add grant and revoke object API in ShareApi (#6724)
  • show share api (#6790)
  • add get_share_grant_objects API in ShareApi (#6798)

http handler

  • http handler return session state (#6846)

processor

  • implement explain fragments (#6851)
  • support distributed subquery in new cluster framework (#6666)

new planner

  • support order by expression (#6725)
  • enable delete stmt (#6768)
  • implement distributed query (#6440)
  • support push down predicates to storage (#6842)

storage

  • add support for COPY from https (#6691)
  • construct leaf column statistics (#6731)
  • support read nested columns (#6612)

new expression

  • support float32, float64 and Map(T) datatype (#6711 & #6838)
  • add serializable expression (#6712)
  • support user-defined CAST and TRY_CAST (#6663)
  • migrate Boolean functions to new expression framework (#6763)
  • migrate some String functions to new expression framework (progress of migration #6766)

Improvement

  • purge mapping data in DB/table GC (#6753)
  • fuzz with afl (#6793)
  • make auto-nullable and auto-vectorization independent (#6797)
  • refactor pipeline builder (#6820)

new planner

  • make PRESIGN work on old planner by forwarding (#6713)
  • forward COPY and STAGE to new planner entirely (#6853)
  • migrate more new planners to be enabled (#6716)

Build/Testing/CI

Bug fixes

  • fix uncorrelated scalar subquery returns error results (#6720)
  • fix bug in FileSplitter skip header (#6732)
  • fix oom when returning large results in clickhouse tcp handler (#6789)
  • Any/Exists subquery in projection (#6809)

Tips

Let's learn a weekly tip from Databend.

COPY INTO <table> FROM REMOTE FILES

Now that #6691 has been merged, Databend supports loading data into a table from one or more remote files by their URL.

Syntax

COPY INTO [<database>.]<table_name>
FROM 'https://<site>/<directory>/<filename>'
[ FILE_FORMAT = ( TYPE = { CSV | JSON | PARQUET } [ formatTypeOptions ] ) ]

Example

This example loads data into the table ontime200 from the remote files ontime_2006_200.csv, ontime_2007_200.csv, and ontime_2008_200.csv:

copy into ontime200 from 'https://repo.databend.rs/dataset/stateful/ontime_200{6,7,8}_200.csv' FILE_FORMAT = (type = 'CSV' field_delimiter = ','  record_delimiter = '\n' skip_header = 1)

Of course, this example could also be written in the following form:

copy into ontime200 from 'https://repo.databend.rs/dataset/stateful/ontime_200[6-8]_200.csv' FILE_FORMAT = (type = 'CSV' field_delimiter = ','  record_delimiter = '\n' skip_header = 1)
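
To make the two URL patterns above concrete, here is a small Python sketch that expands a `{a,b,c}` alternation or a single-character `[x-y]` range into explicit URLs. It is not Databend's implementation; the helper name `expand_pattern` and its matching rules are assumptions made only to illustrate which files the example says both patterns name.

```python
import re
from itertools import product
from typing import List

# Hypothetical helper, not part of Databend: handles only {a,b,c}
# alternations and single-character [x-y] ranges.
_TOKEN = re.compile(r"\{([^{}]*)\}|\[(\w)-(\w)\]")


def expand_pattern(pattern: str) -> List[str]:
    """Return every string the pattern stands for, in left-to-right order."""
    parts = []  # each entry is a list of alternatives for one segment
    pos = 0
    for match in _TOKEN.finditer(pattern):
        parts.append([pattern[pos:match.start()]])  # literal text before the token
        if match.group(1) is not None:
            parts.append(match.group(1).split(","))  # {6,7,8} -> ["6", "7", "8"]
        else:
            low, high = ord(match.group(2)), ord(match.group(3))
            parts.append([chr(c) for c in range(low, high + 1)])  # [6-8] -> ["6", "7", "8"]
        pos = match.end()
    parts.append([pattern[pos:]])  # trailing literal text
    return ["".join(combo) for combo in product(*parts)]


if __name__ == "__main__":
    base = "https://repo.databend.rs/dataset/stateful/"
    for url in expand_pattern(base + "ontime_200{6,7,8}_200.csv"):
        print(url)
```

Under these assumptions, both forms of the pattern expand to the same three files, which is why the two COPY statements are interchangeable.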

Learn More

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, aseaday, b41sh, BohuTANG, ClSlaid,
dantengsky, leiysky, lichuang, mergify[bot], PsiACE, soyeric128,
sundy-li, TCeason, TianLangStudio, Xuanwo, xudong963, ygf11,
youngsofun, ZeaLoVe, zhang2014, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #51

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • add StageFileFormatType::Tsv (#6651)

meta

  • add share metasrv ShareApi(create_share,drop_share) (#6582)
  • add share metasrv ShareApi {add|remove}_share_account (#6656)
  • add share id to share name map, add share test suites (#6670)
  • adds cli command to send RPC to a running meta cluster (#6559)

hive catalog

  • support read boolean, float, double, date, array columns (#6629)

new planner

  • support create table as select (#6618)
  • optimize correlated subquery by decorrelation (#6632)

new expression

  • Implement domain calculation (#6649)
  • implement error report (#6661)
  • allow function to return runtime error (#6662)
  • support UInt32, UInt64, Int32, Int64 (#6660)
  • support conversion between arrow (#6674)

Improvement

  • support insert zero date and zero datetime (#6592)
  • Stage Copy use internal InputFormat (#6638)
  • decouple Table from QueryContext (#6665)
  • refactor pipeline builder (#6695)

new planner

  • stage/tables/databases DDL statements default to use new planner (#6648)
  • users/roles/grants DDL statements default to use new planner (#6687)

Build/Testing/CI

  • add ydb test cases (#6681)

Bug fixes

  • fix range delete panic and incorrect statistics (of in_memory_size) (#6609)
  • disable null values in join (#6616)
  • COPY should be able to run under new planner (#6624)
  • fix InSubquery returns error result (#6641)
  • fix variant map access filter (#6645)
  • adhoc fix session leak (#6672)
  • support read i96 timestamp from parquet file (#6668)
  • check parquet schema mismatch (#6690)

Tips

Let's learn a weekly tip from Databend.

Send & Receive gRPC Metadata

Databend allows you to send and receive gRPC (gRPC Remote Procedure Calls) metadata (key-value pairs) to and from a running meta service cluster with command-line interface (CLI) commands.

Update or Create a Key-Value Pair

./databend-meta --grpc-api-address "<grpc-api-address>" --cmd kvapi::upsert --key <key> --value <value>

Get Value by a Key

./databend-meta --grpc-api-address "<grpc-api-address>" --cmd kvapi::get --key <key>

Get Values by Multiple Keys

./databend-meta --grpc-api-address "<grpc-api-address>" --cmd kvapi::mget --key <key1> <key2> ...

List Key-Value Pairs by a Prefix

./databend-meta --grpc-api-address "<grpc-api-address>" --cmd kvapi::list --prefix <prefix>
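
If you want to script these commands, the following Python sketch assembles the argument list for each `kvapi` subcommand shown above. The helper name `kvapi_argv` and the address `127.0.0.1:9191` are assumptions for illustration; only the flags that appear in the commands above are used, and nothing is executed unless you pass the result to `subprocess.run` yourself.

```python
import shlex
from typing import List, Optional, Sequence


def kvapi_argv(
    grpc_api_address: str,
    cmd: str,
    keys: Sequence[str] = (),
    value: Optional[str] = None,
    prefix: Optional[str] = None,
    binary: str = "./databend-meta",
) -> List[str]:
    """Build the argv for one kvapi command (upsert, get, mget, or list)."""
    argv = [binary, "--grpc-api-address", grpc_api_address, "--cmd", f"kvapi::{cmd}"]
    if keys:
        argv += ["--key", *keys]  # mget takes several keys after a single --key
    if value is not None:
        argv += ["--value", value]
    if prefix is not None:
        argv += ["--prefix", prefix]
    return argv


if __name__ == "__main__":
    # Hypothetical address and key/value; print the command instead of running it.
    print(shlex.join(kvapi_argv("127.0.0.1:9191", "upsert", keys=["foo"], value="bar")))
    # To actually run it against a live cluster:
    #   import subprocess; subprocess.run(argv, check=True)
```

Building a list rather than a shell string avoids quoting problems when keys or values contain spaces.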

Learn More

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, dependabot[bot],
drmingdrmer, everpcpc, jiaoew1991, lichuang, mergify[bot], PsiACE,
RinChanNOWWW, sandflee, soyeric128, sundy-li, Xuanwo, xudong963,
youngsofun, yuuch, ZeaLoVe, zhang2014

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #50

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • migrate window function to new pipeline (#6500)
  • add format diagnostic (#6530)
  • add date_trunc function (#6540)
  • support global setting (#6579)
  • add {db,table}_id map to {(tenant,db_name), (db_id, table_name)} in metasrv (#6607)
  • support ALL and SOME subquery, mark join with non-equi condition, and make tpch q20 happy (#6534)

presign statement

  • add presign statement in parser (#6513)
  • implement presign support (#6529)

storage

  • allow COPY FROM/INTO different storage services (#6573)
  • allow create stage for different services (#6602)

new expression

  • add new crate common-expression (#6576)
  • implement pretty print for Chunk (#6597)

Improvement

  • improve performances for group by queries (#6551)
  • try abandon internal parquet2 patches (#6067)
  • refactor interpreter factory for reuse interpreters code (#6566)
  • replace infallible (#6568)
  • remove old processor useless code (#6584)
  • pretty format for explain (#6585)

Build/Testing/CI

Bug fixes

  • big query hang with clickhouse (#6583)
  • catchup planner update in http handler (#6572)
  • fix load json value by csv format (#6548)
  • fix input format CSV (#6524)
  • show query with limit fails when planner v2 is enabled (#6381)
  • add watch txn unit test (#6526)
  • fix thread unsafe when processor schedule (#6533)
  • fix database and user related functions in planner v2 (#6473)

Tips

Let's learn a weekly tip from Databend.

Presign Statement

Generates a pre-signed URL for a staged file from the stage name and file path you provide. The pre-signed URL enables you to access the file through a web browser or an API request.

Syntax

PRESIGN [ { DOWNLOAD | UPLOAD }] @<stage_name>/.../<file_name> [ EXPIRE = <expire_in_seconds> ]

Example

This example generates the pre-signed URL for downloading the file books.csv on the stage my_stage:

PRESIGN @my_stage/books.csv
+--------+---------+---------------------------------------------------------------------------------+
| method | headers | url                                                                             |
+--------+---------+---------------------------------------------------------------------------------+
| GET    | {}      | https://example.s3.amazonaws.com/books.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&... |
+--------+---------+---------------------------------------------------------------------------------+
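
The result row above carries everything an HTTP client needs. As a minimal sketch, assuming the `headers` column can be read as a mapping of header names to values, the Python standard library can turn such a row into a request. The function name `request_from_presign` and the placeholder URL are hypothetical; a real call would use the full URL returned by PRESIGN.

```python
import urllib.request
from typing import Dict


def request_from_presign(method: str, headers: Dict[str, str], url: str) -> urllib.request.Request:
    """Wrap one PRESIGN result row (method, headers, url) in a Request object."""
    return urllib.request.Request(url, method=method, headers=headers)


if __name__ == "__main__":
    # Placeholder values; substitute the row returned by your own PRESIGN statement.
    req = request_from_presign("GET", {}, "https://example.invalid/books.csv?signature=placeholder")
    print(req.get_method(), req.full_url)
    # Sending the request downloads the staged file:
    #   with urllib.request.urlopen(req) as resp:
    #       data = resp.read()
```

Because the signature is embedded in the URL, no additional credentials are passed by the client in this sketch.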

Learn More

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, Defined2014,
everpcpc, fkuner, gaoxinge, GrapeBaBa, jiaoew1991, lichuang,
mergify[bot], PsiACE, soyeric128, sundy-li, TCeason, Xuanwo,
xudong963, youngsofun, ZeaLoVe, zhang2014

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #49

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • add call procedure for sync stage (#6344)
  • show settings support like (#6394)
  • support all JsonEachRowOutputFormat variants (#6434)
  • support any, all and some subquery in parser (#6438)
  • support geo_to_h3 function (#6389)

storage

  • add xz compression support (#6421)
  • introduce system.tables_with_history (#6435)

new planner

  • migrate call statement to new planner (#6361)
  • support IS [NOT] DISTINCT FROM in planner_v2 (#6170)
  • support qualified column name with database specified (#6444)
  • support mark join, (not)in/any subquery, make tpch16 and tpch18 happy (#6412)

RFC

  • add Presign statement (#6503)

Improvement

  • add span info for TableReference (#6370)
  • improve optimize table compact (#6373)

refactor

  • split formats (#6443)
  • intro common-http to reduce duplicate code (#6484)

Build/Testing/CI

  • logic test with clickhouse handler (#6329)
  • enable semantic PRs and fully migrate to mergify and gh cli (#6386, #6419 and more)

Bug fixes

  • fix hashmap memory leak (#6354)
  • fix array inner type with null (#6407)
  • fix lost event in resize processor (#6501)

cluster

  • show progress correctly in cluster mode (#6253)
  • fix cannot destroy thread in cluster mode (#6436)

format

  • add NestedCheckpointReader for input format parser (#6385)
  • fix tsv deserialization (#6453)

Tips

Let's learn a weekly tip from Databend.

Monitoring Databend with Sentry

Sentry is a cross-platform application monitoring tool with a focus on error reporting.

Databend supports error tracking and performance monitoring with Sentry.

Preparing

Error Tracking

The following command uses only the sentry-log feature, which helps with error tracking.

DATABEND_SENTRY_DSN="<your-sentry-dsn>" ./databend-query


Performance Monitoring

Setting SENTRY_TRACES_SAMPLE_RATE to a value greater than 0.0 allows Sentry to perform trace sampling, which helps set up performance monitoring.

DATABEND_SENTRY_DSN="<your-sentry-dsn>" SENTRY_TRACES_SAMPLE_RATE=1.0 LOG_LEVEL=DEBUG ./databend-query

Note: Set SENTRY_TRACES_SAMPLE_RATE to a lower value in production.
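
For launching databend-query from a script, the sketch below builds the environment described in this tip. The helper name `sentry_env` and the sample rate `0.1` are assumptions for illustration; the variable names come from the commands above, and the 0.0 to 1.0 bound reflects how Sentry sample rates are normally expressed.

```python
import os
from typing import Dict, Optional


def sentry_env(
    dsn: str,
    traces_sample_rate: Optional[float] = None,
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with the Sentry variables set."""
    env = dict(os.environ if base is None else base)
    env["DATABEND_SENTRY_DSN"] = dsn  # error tracking only needs the DSN
    if traces_sample_rate is not None:
        if not 0.0 <= traces_sample_rate <= 1.0:
            raise ValueError("traces_sample_rate must be between 0.0 and 1.0")
        env["SENTRY_TRACES_SAMPLE_RATE"] = str(traces_sample_rate)
    return env


if __name__ == "__main__":
    # Hypothetical production setting: sample 10% of traces instead of all of them.
    env = sentry_env("<your-sentry-dsn>", traces_sample_rate=0.1)
    print({k: v for k, v in env.items() if "SENTRY" in k})
    # To start the server with this environment:
    #   import subprocess; subprocess.Popen(["./databend-query"], env=env)
```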


Learn more

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, ClSlaid, dantengsky, databend-bot,
drmingdrmer, everpcpc, flaneur2020, junnplus, leiysky, lichuang,
mergify[bot], PragmaTwice, PsiACE, soyeric128, sundy-li, TCeason,
Veeupup, Xuanwo, xudong963, youngsofun, ZeaLoVe, zhang2014,
ZhiHanZ, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #48

Databend is an open-source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • support abort pipeline (#6174)
  • integration with sentry (#6226)
  • rewrite predicate and accelerate tpch19 (#6301)

databend meta

  • support leave a cluster with databend-meta --leave.. (#6181)
  • add import init cluster support (#6280)

statements

  • support exists statement (#6166)
  • statement delete from... (#5691)
  • order by sub stmt support db.table.col (#6191)

new planner

  • introduce serializable physical plan (#6191)
  • support non-equi conditions in hash join (#6145)
  • decorrelate EXISTS subquery with non-equi condition (#6232)
  • migrate Create(#5905)/Alter(#6319)/Drop(#6327) UDF

Improvement

  • improve compatibility with clickhouse http handler (#6148)
  • limit push down for table fuse_snapshot & proc system$fuse_snapshot (#6167)
  • split ast statements into multiple mods (#6176)
  • store grpc addr to node info and auto refresh backends addrs for grpc client (#5495)

Join Performance Improvements

  • improve left/semi/anti join performance [~80x] (#6241)
  • improve join results gather [~7x] (#6228)
  • improve semi/anti join with other conjuncts [~17x] (#6366)

Build/Testing/CI

  • add tpch stateless-test (#6225)
  • add async insert test (#5964)

Bug fixes

  • fix datatype difference causing MySQL session destroy (#6150)
  • fix node id truncation when cluster id is escaped (#6193)
  • fix aggregate count incorrect state place (#6218)
  • fix grouping check (#6219)
  • fix output of to_datetime() (#6252)
  • fix MySQL connection close_wait or fin_wait_2 (#6341)

Tips

Let's learn a weekly tip from Databend.

DELETE in Databend

The DELETE statement can delete one or more rows from a table.

Syntax

Databend now supports the following syntax:

DELETE FROM table_name
[WHERE search_condition]

Example

Suppose that the bookstore table currently contains the following data:

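+--------+-----------------------------+
| bookId | bookName                    |
+--------+-----------------------------+
| 101    | After the death of Don Juan |
| 102    | Grown ups                   |
| 103    | The long answer             |
| 104    | Wartime friends             |
| 105    | Deconstructed               |
+--------+-----------------------------+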

Now let's delete the book with id = 103:

DELETE from bookstore where bookId = 103;

After deletion, the data in the bookstore table is shown as follows:

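+--------+-----------------------------+
| bookId | bookName                    |
+--------+-----------------------------+
| 101    | After the death of Don Juan |
| 102    | Grown ups                   |
| 104    | Wartime friends             |
| 105    | Deconstructed               |
+--------+-----------------------------+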
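
The walkthrough above can be reproduced locally with the sketch below. It uses Python's built-in SQLite as a stand-in, not Databend, purely to show the standard `DELETE ... WHERE` behavior on the same bookstore data; the function name and column types are assumptions.

```python
import sqlite3
from typing import List, Tuple

BOOKS = [
    (101, "After the death of Don Juan"),
    (102, "Grown ups"),
    (103, "The long answer"),
    (104, "Wartime friends"),
    (105, "Deconstructed"),
]


def remaining_books_after_delete(book_id: int) -> List[Tuple[int, str]]:
    """Load the bookstore table, delete one book, and return what is left."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Databend connection
    try:
        conn.execute("CREATE TABLE bookstore (bookId INTEGER, bookName TEXT)")
        conn.executemany("INSERT INTO bookstore VALUES (?, ?)", BOOKS)
        # Same statement shape as in the tip: only rows matching WHERE are removed.
        conn.execute("DELETE FROM bookstore WHERE bookId = ?", (book_id,))
        return conn.execute(
            "SELECT bookId, bookName FROM bookstore ORDER BY bookId"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in remaining_books_after_delete(103):
        print(row)
```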

Learn more:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, b41sh, BohuTANG, ClSlaid, cuichenli, dantengsky,
drmingdrmer, everpcpc, fkuner, leiysky, lichuang, mergify[bot],
PsiACE, soyeric128, sundy-li, TCeason, TennyZhuang, Xuanwo,
xudong963, youngsofun, zhang2014, ZhiHanZ

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.