This week in Databend #38

Databend is an open source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, and it is built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(parser): add select statement by @andylokandy, (#4941)
  • databend-query: support mysqldump dump schema by @BohuTANG, (#4972)
  • databend-query: refine new planner framework: use trait object to represent plans to make it more extensible. by @leiysky, (#4895)
  • databend-query: create user if not exists on JWT authenticate by @Junnplus, (#4924)
  • databend-meta: support watch api by @lichuang, (#4779)
  • databend-query(parser): support keyword DATABASE synonym SCHEMA by @TCeason, (#4855)

Improvement

  • databend-query: reconstruct type: date/datetime to simplify date type by @Veeupup, (#4921)
  • common-functions: refine the functions name from xY to x_y by @BohuTANG, (#4915, #4906 and #4884)
  • common-meta: metasrv has to be compatible with 20220413-34e89c9 by @drmingdrmer, (#4901)
  • databend-query: compatible with mysql insert and select by @TCeason, (#4883)
  • common-functions: replace FactoryCreator with FactoryCreatorWithTypes for functions by @zhyass, (#4688)

Build / Testing / CI

Performance Improvement

  • databend-query(processor): replace global mutex with atomic by @zhang2014, (#4905)

Bug fixes

  • common-functions(cast): fix the behavior of null to boolean by @sundy-li, (#4911)
  • databend-query(group_by): fix group by with negative value by @zhang2014, (#4902)
  • databend-query(transform_limit): fixes limit and offset with one block by @zhang2014, (#4907)
  • databend-query(interpreters): fix empty query by @cadl, (#4894)
  • *: fix show grants from inherited role by @Junnplus, (#4873)

Tips

Let's learn a weekly tip from Databend.

Visualize Databend data in Jupyter Notebook

The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.

Recently, we have worked on improving Databend's compatibility with the MySQL/ClickHouse ecosystem to provide a better experience. Thanks to improved support for SQLAlchemy, we can now interact with data in Databend from Jupyter Notebook.

To experience it, there are only three steps:

Databend with Jupyter Notebook

You can check out https://databend.rs/doc/integrations/gui-tool/jupyter to learn more.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #37

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(expressions): add try_cast function by @sundy-li, (#4794)
  • common-functions: support cast variant to other data types by @b41sh, (#4787)
  • common-functions: support REGEXP_SUBSTR function by @nange, (#4771)
  • databend-query: support show table status by @TCeason, (#4757)
  • common-functions: support semi-structured function GET/GET_IGNORE_CASE/GET_PATH by @b41sh, (#4684)

Improvement

  • databend-query: pass parameter from query to functions by @Veeupup, (#4805)
  • databend-query(mysql_handler): add more federated command for some old drivers by @BohuTANG, (#4809)
  • databend-query(compact): add transform compact by @sundy-li, (#4784)
  • databend-query(storage): show fuse engine table status by @dantengsky, (#4786)
  • databend-query(mysql_handler): sqlalchemy execute work by @BohuTANG, (#4774)

Performance Improvement

  • databend-query: try to avoid string copy in insert-values again by @ygf11, (#4730)

Bug fixes

  • databend-query(fuse): limit push down respect orders by @sundy-li, (#4818)
  • common-meta(state_machine): rename table should keep table_id unchanged by @ariesdevil, (#4838)
  • common-building: try persist credits at build time by @PsiACE, (#4791)
  • databend-query: select * shouldn't return results by @xudong963, (#4796)

Tips

Let's learn a weekly tip from Databend.

Databend Performance Data Collection and Visualization

Late last week, we proudly announced https://perf.databend.rs/, a website for monitoring the performance of Databend's nightly releases.

All benchmarks currently run on a c5n.9xlarge Amazon EC2 instance with 36 vCPUs, 96 GiB of memory, and Intel Xeon Platinum 8000 series processors.

Databend Performance Data

The current benchmarks consist of:

  • A set of numerical computation SQL queries for evaluating the performance of in-memory vectorization engines, based on Databend's numbers table function, which provides ten billion rows of data.
  • A common set of SQL queries for air traffic analysis, based on the publicly available OnTime dataset from the U.S. Department of Transportation: 60.8 GB of CSV, 202,687,654 records.

To view the source code, please visit GitHub - datafuselabs/databend-perf:

  • collector: stores daily performance data for each nightly release.
  • benchmarks: contains the benchmark suite, defined in YAML format.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #36

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • common-functions: support REGEXP_INSTR function by @nange, (#4629)
  • databend-query(processor): support complete executor by @zhang2014, (#4639)
  • databend-query: support logical view by @Veeupup, (#4628)
  • databend-query: support information_schema database by @Veeupup, (#4672)
  • *: refactor type deserialization by @sundy-li, (#4634)

Improvement

Build/Test/CI

Bug fixes

  • databend-query: prohibit using reserved table option in create table statement. by @dantengsky, (#4632)
  • bump to OpenDAL v0.4 to fix COPY INTO not supporting special filenames by @Xuanwo, (#4678)

Tips

Let's learn a weekly tip from Databend.

Analyzing Nginx Logs with Databend and Vector

Systems produce all kinds of metrics and logs all the time. Do you want to gather them and analyze the logs in real time?

Databend integrates with Vector, which makes this easy to do now!

You can use Databend to analyze Nginx access logs in just four steps:

  1. Deploy Databend, create a database and a table for Nginx logs, and create a user for Vector authentication
  2. Install and configure Nginx
  3. Install, configure, and run Vector
  4. Generate Nginx logs and analyze them in Databend
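
To make the last step a little more concrete, here is a minimal sketch of an analysis query sent through the MySQL client. Everything specific in it is an assumption made for illustration: the `nginx.access_logs` table, its `status` column, the `root` user, and port 3307 are hypothetical placeholders, not the schema from the full guide. By default the function only prints the command it would run, so it can be tried without a server.

```shell
#!/bin/sh
# Hypothetical names: database "nginx", table "access_logs", column "status".
status_report_sql() {
        printf '%s\n' 'SELECT status, count(*) AS hits FROM nginx.access_logs GROUP BY status'
}

# Print the mysql invocation by default; execute it only when EXECUTE=1 is set.
# Port 3307 is an assumed MySQL handler port; pass your own as the first argument.
run_report() {
        port=${1:-3307}
        if [ "${EXECUTE:-0}" = "1" ]; then
                status_report_sql | mysql -h127.0.0.1 -P"$port" -uroot
        else
                printf 'echo "%s" | mysql -h127.0.0.1 -P%s -uroot\n' "$(status_report_sql)" "$port"
        fi
}

run_report 3307
```

Once logs are flowing in, setting `EXECUTE=1` would send the query to the server instead of printing it, provided the table and column names are adjusted to match the real schema.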

To learn more about how to implement it, check out Analyzing Nginx Logs with Databend and Vector.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #35

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query: add ndjson source for streaming load by @sundy-li, (#4561)
  • common-functions: embed markdown docs into system.functions by @sundy-li, (#4552)
  • databend-query: add user option for user info: introduce user options to separate system management privileges and security object privileges. by @Junnplus, (#4553)
  • common-datavalues: support Semi-structured array, object data type by @b41sh, (#4571)
  • http handler: support basic clickhouse REST handler by @youngsofun, (#4613)
  • databend-query: add check_json function by @kevinw66, (#4606)
  • databend-query(processor): support pushing executor by @zhang2014, (#4625)

Improvement

  • databend-query: version of storage layout by @dantengsky, (#4244)
  • databend-query: remove manage access check and use tenant statement by @Junnplus, (#4616)

Performance Improvement

  • datavalues: Simd selected for StringColumn by @LiuYuHui, (#4528)
  • databend-query: improve performance of insert-into literal values by @ygf11, (#4497)

Build/Test/CI

Bug fixes

Tips

Let's learn a weekly tip from Databend.

Announcing Databend v0.7.0-nightly

This release brings Databend's architecture to a stable stage!

  • Simple primitive data type framework
  • New Pull&Push-Based Processor framework
  • Git-Like table format with snapshot transaction isolation
  • Announce OpenDAL for object storage data access
  • Announce OpenRaft to improve Raft as the next-generation consensus protocol

v0.7.0-nightly also includes several new user-facing features, performance optimizations, and many other improvements. Activate your object storage for big data analytics!

To learn more, please check out Announcing Databend v0.7.0 - Deploy easier, query faster.

Want to know what will happen in v0.8.0? Please check Checklist proposal: Nightly v0.8.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #34

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query: support Semi-structured variant data type by @b41sh, (#4348)
  • databend-query: support stage list & stage streaming upload by @sundy-li, (#4472 & #4477)
  • databend-query: integrate fuse table with new processor by @zhang2014, (#4444)
  • databend-query: add support for alter table rename statement by @kevinw66, (#4532)
  • *: add bootstrap_tenant procedure for tenant management by @Junnplus, (#4530)
  • http handler: support server-side-session by @youngsofun, (#4538)
  • metactl: dump data from a running metasrv by @light4, (#4473)
  • common-functions: support PARSE_JSON / TRY_PARSE_JSON function by @b41sh, (#4534)

Improvement

Performance Improvement

  • datavalues: Simd selected for BooleanColumn by @LiuYuHui, (#4484)
  • databend-query: enable new processor by default(standalone mode) by @zhang2014, (#4486)

Build/Test/CI

  • add musl support & release musl compiled binaries by @Xuanwo & @ZhiHanZ, (#4520 & #4535)
  • drop databend-benchmark and old perf tool, add benchmark solution with hyperfine by @PsiACE, (#4545)

Bug fixes

  • databend-query: fix groupby single string in new processor by @sundy-li, (#4475)
  • clickhouse handler: to_clickhouse_block always convert to full column if constant by @sundy-li, (#4514)

Tips

Let's learn a weekly tip from Databend.

How to Benchmark with Hyperfine

Databend recommends using hyperfine to perform benchmarking via the ClickHouse/MySQL client. With a simple script, we can run benchmarks easily:

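#!/bin/bash
# Arguments: the MySQL handler port, a file with one SQL statement per line,
# and the path of the Markdown report to write.

WARMUP=3   # warm-up runs per query (hyperfine -w)
RUN=10     # timed runs per query (hyperfine -r)

function run() {
        local port=$1 sql=$2 result=$3
        local args=(-w "$WARMUP" -r "$RUN")

        # Add one named benchmark per non-empty line of the SQL file.
        while IFS= read -r SQL || [ -n "$SQL" ]; do
                [ -z "$SQL" ] && continue
                # %q quotes the statement so it survives the shell that hyperfine spawns.
                args+=(-n "$SQL" "echo $(printf '%q' "$SQL") | mysql -h127.0.0.1 -P$port -uroot -s")
        done < "$sql"

        hyperfine "${args[@]}" --export-markdown "$result"
}

run "$1" "$2" "$3"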

For details, please read databend.rs - How to Benchmark with Hyperfine
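
As a usage sketch, the snippet below writes a small query file and previews the per-query command that hyperfine would time. The helper `preview_commands`, the filename `benchmark.sh`, the sample queries, and port 3307 are all assumptions made for illustration; substitute whatever your deployment uses. Nothing here contacts a server, so it can be run as-is.

```shell
#!/bin/sh
# Hypothetical helper: print, for each query in the file, a command of the same
# shape as the one the benchmark script hands to hyperfine.
preview_commands() {
        port=$1
        sql_file=$2
        while IFS= read -r line || [ -n "$line" ]; do
                [ -z "$line" ] && continue   # skip blank lines
                printf 'echo "%s" | mysql -h127.0.0.1 -P%s -uroot -s\n' "$line" "$port"
        done < "$sql_file"
}

# Write two sample queries, one per line (the queries are illustrative only).
queries=$(mktemp)
printf '%s\n' 'SELECT 1' 'SELECT count(*) FROM numbers(1000)' > "$queries"

# Port 3307 is an assumed MySQL handler port.
preview_commands 3307 "$queries"

# With hyperfine, a MySQL client, and a running Databend instance available,
# the real benchmark could then be started along these lines:
#   bash benchmark.sh 3307 "$queries" result.md
```

Each line of the query file becomes one named benchmark, and the Markdown report lands in the path given as the third argument.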

We've also updated some of Databend's performance results. If you're interested, check out the following articles:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.
