This week in Databend #57

Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

share

  • add create database from share SQL (#7288)

meta

  • add admin API /v1/ctrl/trigger_snapshot, which lets the leader send a snapshot to every follower (#7298)

storage

  • HuaweiCloud OBS as storage backend for databend (#7365)
  • add top-N pruner (#7302)
  • split hive files into small partitions and support reading more than one row group (#7311)

format

  • add format ndjson (#7328)

planner

  • implement cost-based optimization (#7187)
  • push limit down further (#7273)
  • support explain ast (#7215)

query

  • alter table recluster (#7400)

new expression

  • migrate sign / trigonometric / abs (#7272)
  • implement is_null and is_not_null (#7282)
  • add Timestamp type (#7393)

Improvement

share

  • refactor drop share (#7341)

meta

  • introduce UpsertKV to simplify meta command (#7339 & #7345)

storage

  • refactor compact and recluster (#7274)
  • remove Github Engine (#7289)
  • enable file meta data cache (#7386)

async insert

  • eliminate circular references (#7411)

query

  • make args consistent with other popular DBMSs (#7357)

Build/Testing/CI

  • add part of duckdb logictest suites (#7394)

Bug fixes

meta

  • when expiring a record, leader and followers should use the same now time (#7325)

parser

  • improve parsing speed for large expr (#7279)

planner

  • fix error when a scalar subquery is used in a function (#7293)
  • fix case sensitivity of USING clause (#7304)
  • validate duplicated column name when creating table (#7307)

query

  • fix JSON value incorrect memory size (#7346)

cluster

  • optimize the performance for cluster mode (#7351)
  • fix panic if exchange key is nullable (#7368)

News

Let's take a look at what's new at Datafuse Labs & Databend each week.

Databend Now Supports GCS and Huawei OBS as Storage Backends

Databend uses opendal as the storage access layer to interface with various storage systems.

Recently, Databend completed support for Google Cloud Storage and Huawei Cloud Object Storage, making it easy to access the data stored on these backends.

Remove Support for GitHub Engine

In the past, Databend supported a GitHub engine for easy demonstrations; it pulled data from the GitHub API to form a database.

As the FUSE engine matures, we are officially removing support for the GitHub toy engine to reduce unnecessary maintenance work.

Learn more: https://github.com/datafuselabs/databend/discussions/7286

Databend Has Achieved the OpenSSF Best Practices Badge

The Open Source Security Foundation (OpenSSF) Best Practices badge is a way for Free/Libre and Open Source Software (FLOSS) projects to show that they follow best practices.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, AngleNet, ariesdevil, b41sh, BohuTANG, ClSlaid,
dantengsky, dependabot[bot], drmingdrmer, eastfisher, everpcpc, leiysky,
lichuang, mergify[bot], PsiACE, RinChanNOWWW, sandflee, soyeric128,
sundy-li, TCeason, Xuanwo, xudong963, youngsofun, ZeaLoVe,
zhang2014, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #56

Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

share

  • add show grant on object and show grant of share sql (#7181)

storage

  • Google Cloud Storage as storage backend for databend (#7171 & #7197)

planner

  • support settings set sql_dialect (#7175)
  • support union/union all (#7160)

query

  • support datetime format (#7126)
  • introduce custom allocator for HashTable (#7221)

new expression

  • add function-v2 concat/concat_ws (#7167)
  • migrate bin, oct, hex, and unhex (#7219)

Improvement

meta

  • simplify open_create_boot() (#7212)
  • improve join-cluster (#7198)

storage

  • use pipeline to refactor compact (#7244)

sessions

  • decoupling session manager and other managers (#7093)

planner

  • use Evaluator to refactor insert (#7201)

workspace

  • reorg workspace, distinguish between common and query (#7188)

Build/Testing/CI

  • add tpch stateless test of factor 0.1 (#6739)
  • logictest support regex with new query type R (#7230)

Bug fixes

parser

  • fix failure to parse floats with E (#7186)

functions

  • fix incorrect result of function if (#7239)

service

  • rewrite desc stage query (#7205)
  • fix statements desc share and show shares may have resultset (#7177)

cluster

  • remove invalid cluster node in current query (#7246)

new expression

  • make multi_if accept null conditions (#7226)

News

Let's take a look at what's new at Datafuse Labs & Databend each week.

Databend 0.8.0 Is Out!

Development of Databend v0.8 started on March 28th, with 5000+ commits and 4600+ file changes. In the last 5 months, the community of 120+ contributors added 420K lines of code and removed 160K lines, equivalent to rewriting Databend once. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.

Learn more: https://databend.rs/blog/databend-release-v0.8

Deploy Databend on Kubernetes

Databend now provides official K8s deployment documentation showing how to install and configure a Databend query cluster on Kubernetes with MinIO as the storage backend.

In addition to an easy-to-follow four-step deployment guide, it also covers how to deploy a Databend cluster using the official Helm Charts.

Learn more: https://databend.rs/doc/deploy/deploying-databend-on-kubernetes

Using Databend as a Destination for Airbyte

  • Airbyte is an open-source data integration platform that syncs data from applications, APIs & databases to data warehouses, lakes & DBs.
  • You can load data from any Airbyte source into Databend.

We have implemented an experimental Airbyte destination that allows you to send data from your Airbyte source to Databend.

Learn more: https://databend.rs/doc/integrations/data-tool/airbyte

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, ClSlaid, dantengsky,
drmingdrmer, edPanda, flaneur2020, gaoxinge, leiysky, lichuang,
mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason,
Xuanwo, xudong963, youngsofun, ZeaLoVe, zhang2014, ZhiHanZ,
zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #55

Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

RFC

CTE

  • support common table expression in planner (#6056)

share

  • add alter share tenants sql (#7044)
  • add desc share sql (#7105)
  • add show shares sql (#7145)
  • add get_grant_tenants_of_share and get_grant_privileges_of_object api (#7157)

planner

  • support explain raw query (#7079)
  • support tuple map access pushdown to storage (#7080)
  • support explain syntax (#7124)

query

  • parquet schema case-insensitive match (#7045)

new expression

  • support 3 args function (#7075)
  • add a helper trait ColumnFrom (#7067)
  • add arithmetics functions (#7096 & #7140)

Improvement

meta

  • add defensive check to raft-store (#7125)
  • get_share_grant_objects API should return name instead of id (#7088)
  • refactor show share api (#7142)

query

  • predicate push down support multi expressions (#7078)
  • use common hashtable to store the numeric distinct state (#7135)
  • improve performance of aggregate function distinct (#7110)

workspace

  • reorg workspace, a basic structure (#7074)

Build/Testing/CI

  • add logictest in cluster mode (#7099)
  • add part of crdb logictest suites (#7154)

Bug fixes

meta

  • when handling append-entries, if prev_log_id is purged, it should not delete any logs (#7113)

planner

  • fix left join using() returning an incorrect result (#7086)
  • fix ColumnPruner finds wrong smallest column index (#7097)

processor

  • use flight do_exchange to replace flight do_put (#7025)
  • try to fix invalid physical exchange plan for http handle (#7095)

handle

  • streaming mysql resultset (#7022)

News

Let's take a look at what's new at Datafuse Labs & Databend each week.

common table expressions

Databend supports common table expressions (CTEs) and allows you to use a WITH clause to define one or more named temporary result sets that are used by the query that follows. "Temporary" means that the result sets are not permanently stored anywhere in the database schema. They act as temporary views that are only available to the query that follows.

Learn more: https://databend.rs/doc/reference/sql/query-syntax/dml-with

crates

  • openraft has released v0.7.0, which includes a number of refactorings and reliability improvements, and adds examples of rocksdb and sled.
  • opendal has just added support for Google Cloud Storage and released 0.13.0. Recent improvements include a new Builder with sync support and some useful layers.
  • The v0.2.0 release of opensrv supports streaming writes to mysql result sets and simplifies metadata for clickhouse.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, ClSlaid, dantengsky, drmingdrmer,
e1ijah1, everpcpc, gaoxinge, IDJack, Kikkon, leiysky,
lichuang, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li,
Xuanwo, xudong963, ZeaLoVe, zhang2014

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #54

Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

cte

  • support cte in parser (#7030)

share

  • add Create/Drop Share sql support (#6975 & #6987)
  • add Grant and Revoke Share Object sql support (#7009)

parser

  • add mysql dialect (#7071)

RBAC

  • avoid making cycle between roles (#7051)

handlers

  • deprecate clickhouse's tcp protocol support (#7012)
  • add tenant tables status api (#7037)

optimizer

  • implement column pruning for heuristic optimizer (#6478)

planner

  • support name resolution for alias (#6979)
  • enable projection pushdown (#6938)
  • enable new planner by default (#6869)
  • enhanced case-sensitivity of identifiers (#7026)

storage

  • generate cluster statistics in deletion (#7041)

new expression

  • add concat/filter/scatter/take kernel for chunk (#7038)
  • implement adaptive constant folding (#7054)

Improvement

split query

  • extract storages into sub crates (#6981)

share

  • refactor add/remove share accounts API (#7029)

proto conv

  • pb::S3StorageConfig should be decoded into StorageS3Config, instead of into enum StorageParams (#7047)
  • trait FromToPB use associated type instead of type parameter (#7048)

meta api

  • introduce Id for KVApi (#7055)
  • merge TableIdGen, DatabaseIdGen and ShareIdGen into one id-generator key (#7062)

Build/Testing/CI

  • enable subquery tests in ydb suite (#6948)
  • add mini hits dataset 100k to stateful test (#6964)
  • test fuse-table compatibility (#6990)

Bug fixes

  • fix hive table read when pushdowns is None (#7008)
  • fix column prune for COUNT(*) (#7000)
  • fix prune projection (#7013)
  • fix clickhouse handler, try catch error before response (#7019)
  • fix case sensitivity of cluster by expression (#7060)

News

Let's take a look at what's new at Databend each week.

Release proposal: Nightly v0.9

Databend plans to release v0.8 in the coming week, with new parser and planner support.

The call for proposals for the release of v0.9 is now open. See Release proposal: Nightly v0.9 #7052

Benchmarking Databend using TPC-H

Databend has recently enabled the new Planner by default, which means that we have fully enabled support for JOIN queries and correlated subqueries.

Now you can easily run the TPC-H test suite with Databend and perform benchmark tests. See Benchmarking Databend using TPC-H

Deprecate clickhouse's tcp protocol support

As the Clickhouse TCP protocol is almost a black box and requires a lot of effort to ensure compatibility, Databend has removed compatibility with the Clickhouse TCP protocol. See Deprecate clickhouse's tcp protocol support #7012

Databend will focus on Clickhouse HTTP protocol compatibility to ensure compatibility with the existing ecosystem.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, b41sh, BohuTANG, dantengsky, drmingdrmer, flaneur2020,
leiysky, lichuang, mergify[bot], pymongo, sandflee, soyeric128,
sundy-li, Xuanwo, xudong963, youngsofun, ZeaLoVe, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #53

Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

RFC

meta

  • add raft store fail metrics (#6927)
  • metasrv unittest logs tracing event with customized formatter (#6874)

storage

  • enable bloom filter index (#6639)
  • support query hive partition table (#6906)

RBAC

  • add auth role to jwt (#6829)

format

  • pass FileSplit instead of Vec (#6873)

new expression

  • make chunk support scalar values (#6918)
  • migrate quote, reverse and ascii (#6907)
  • migrate trim functions to new expression framework (#6921)

Improvement

  • Dedicate See you again to the old planner (#6895)
  • Remove unused reload config (#6933)

new expression

  • add NullableColumn and NullableColumnBuilder (#6867)
  • use Scalar to store constant in Expr (#6923)

Build/Testing/CI

Bug fixes

  • don't expand null scalar to column (#6834)
  • fix mistake using try_cast for cast (#6879)
  • fix session drop early in clickhouse handler (#6888)
  • fix binder create table (#6899)
  • fix mysql return 'Empty Set' when result set is empty (#6841)
  • fix case expr with case operator equal (#6950)
  • fix cannot kill query in cluster mode (#6954)

Tips

Let's learn a weekly tip from Databend.

Call for Migrating Functions to the New Expression Framework

If you are interested in type systems, or maybe you'd like to try your hand at a database project, take a look at how Databend does it.

We are now migrating some old functions to the new expression framework. Would you like to try it out?

Background

Databend has recently been working on a new expression framework that will bring some interesting features:

  • Type checking.
  • Type-safe downcast.
  • Enum-dispatched columns.
  • Generic types.
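
To make "enum-dispatched columns" and "type-safe downcast" a little more concrete, here is a minimal standalone sketch. The `Column` enum and the `map_string_to_u64` helper are hypothetical names invented for this illustration; they are not Databend's actual types, only one plausible shape of the idea.

```rust
// Hypothetical, simplified types for illustration only (not Databend's real definitions).

/// A column is an enum with one variant per physical type ("enum-dispatched").
#[derive(Debug, Clone, PartialEq)]
enum Column {
    UInt64(Vec<u64>),
    String(Vec<String>),
}

impl Column {
    /// Type-safe downcast: yields the values only when the variant matches.
    fn as_uint64(&self) -> Option<&[u64]> {
        match self {
            Column::UInt64(values) => Some(values),
            _ => None,
        }
    }

    /// Type-safe downcast for string columns.
    fn as_string(&self) -> Option<&[String]> {
        match self {
            Column::String(values) => Some(values),
            _ => None,
        }
    }
}

/// Type checking happens once, before the function body runs over the values.
fn map_string_to_u64(col: &Column, f: impl Fn(&str) -> u64) -> Result<Column, String> {
    let values = col
        .as_string()
        .ok_or_else(|| format!("expected a String column, got {:?}", col))?;
    Ok(Column::UInt64(values.iter().map(|s| f(s.as_str())).collect()))
}

fn main() {
    let names = Column::String(vec!["latin".to_string(), "кириллица".to_string()]);
    let lengths = map_string_to_u64(&names, |s| s.len() as u64).unwrap();
    println!("{:?}", lengths.as_uint64());

    // A column of the wrong type is reported as an error instead of panicking.
    let numbers = Column::UInt64(vec![1, 2, 3]);
    println!("{:?}", map_string_to_u64(&numbers, |s| s.len() as u64));
}
```

The point of the design is that a mismatch between the function signature and the column variant surfaces as a checked error before any value is touched.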

How To

Legacy functions live in common/functions/src/scalars. The task is to migrate all of them to common/functions-v2/src/scalars/.

Usually you can reuse the logic of the previous implementation; it just needs some rewriting to fit the new framework.

Similarly, the legacy tests in common/functions/tests/it/scalars/ should also be migrated to common/functions-v2/tests/it/scalars/.

The new tests will be written using goldenfile, so you can easily generate test cases without a lot of painful handwriting.

Example

A unary function OCTET_LENGTH can be defined using 6 lines in common/functions-v2/src/scalars/strings.rs.

OCTET_LENGTH will return the length of a string in bytes.

registry.register_1_arg::<StringType, NumberType<u64>, _, _>(
    "octet_length",
    FunctionProperty::default(),
    |_| None,
    |val| val.len() as u64,
);
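
To see what the closure `|val| val.len() as u64` computes, here is a standalone sketch that does not depend on the Databend crates. It assumes the string value is handed to the function as raw UTF-8 bytes; the free function `octet_length` below is a stand-in written for this illustration, not the registered function itself.

```rust
/// Stand-in for the registered closure, assuming `val` is the raw UTF-8 bytes of the string.
fn octet_length(val: &[u8]) -> u64 {
    val.len() as u64
}

fn main() {
    // The same inputs used in the test further down.
    for s in &["latin", "кириллица", "кириллица and latin"] {
        println!(
            "{}: {} bytes, {} characters",
            s,
            octet_length(s.as_bytes()),
            s.chars().count()
        );
    }
    // latin: 5 bytes, 5 characters
    // кириллица: 18 bytes, 9 characters
    // кириллица and latin: 28 bytes, 19 characters
}
```

Cyrillic letters take two bytes each in UTF-8, which is why the byte length and the character count differ for the non-Latin inputs.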

LENGTH is a synonym for OCTET_LENGTH.

We can easily define function aliases with one line.

registry.register_aliases("octet_length", &["length"]);

Next, let's write some tests to make sure it works correctly.

Edit common/functions-v2/tests/it/scalars/string.rs.

fn test_octet_length(file: &mut impl Write) {
    run_ast(file, "octet_length('latin')", &[]);
    run_ast(file, "octet_length(NULL)", &[]);
    run_ast(file, "length(a)", &[(
        "a",
        DataType::String,
        build_string_column(&["latin", "кириллица", "кириллица and latin"]),
    )]);
}

Register it in the test_string function:

#[test]
fn test_string() {
    let mut mint = Mint::new("tests/it/scalars/testdata");
    let file = &mut mint.new_goldenfile("string.txt").unwrap();

    ...
    test_octet_length(file);
    ...
}

Next, let's try to generate these test cases from the command line.

REGENERATE_GOLDENFILES=1 cargo test -p common-functions-v2 --test it
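
If golden-file testing is new to you, the idea can be sketched with the standard library alone. The `check_golden` helper and the file name below are hypothetical and only illustrate the record-then-compare workflow; the real tests rely on the goldenfile crate, whose internals may differ.

```rust
use std::fs;
use std::path::Path;

/// Compares `actual` with the stored golden file,
/// or rewrites the file when `regenerate` is true.
fn check_golden(path: &Path, actual: &str, regenerate: bool) -> Result<(), String> {
    if regenerate {
        return fs::write(path, actual).map_err(|e| e.to_string());
    }
    let expected = fs::read_to_string(path).map_err(|e| e.to_string())?;
    if expected == actual {
        Ok(())
    } else {
        Err(format!("golden file {} differs from the new output", path.display()))
    }
}

fn main() {
    // Hypothetical file name, placed in the system temp directory for the demo.
    let path = std::env::temp_dir().join("octet_length_golden_sketch.txt");
    let output = format!("octet_length('latin') = {}\n", "latin".len());

    // Recording step: store the current output as the expected result.
    check_golden(&path, &output, true).unwrap();

    // Later runs: identical output passes, changed output is reported.
    println!("{:?}", check_golden(&path, &output, false));
    println!("{}", check_golden(&path, "something else\n", false).is_err());
}
```

Because the expected output is recorded rather than handwritten, adding a test case mostly means adding an input and reviewing the regenerated file.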

Well done, we did it.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, drmingdrmer,
flaneur2020, gaoxinge, leiysky, lichuang, mergify[bot], PsiACE,
RinChanNOWWW, sandflee, soyeric128, sundy-li, TCeason, Xuanwo,
xudong963, ygf11, youngsofun, ZeaLoVe, zhang2014, zhyass

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.