This week in Databend #37

Databend is an open source, elastic, and reliable modern cloud data warehouse. It offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query(expressions): add try_cast function by @sundy-li, (#4794)
  • common-functions: support cast variant to other data types by @b41sh, (#4787)
  • common-functions: support REGEXP_SUBSTR function by @nange, (#4771)
  • databend-query: support show table status by @TCeason, (#4757)
  • common-functions: support semi-structured function GET/GET_IGNORE_CASE/GET_PATH by @b41sh, (#4684)
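
A few of the functions listed above could be tried with statements like the ones below. This is only a sketch: the literal values are made up for illustration, the helper name `feature_examples` is hypothetical, and the exact argument forms may differ between nightly builds, so no expected results are shown.

```shell
#!/bin/bash
# Hypothetical helper: print one sample statement per new function.
feature_examples() {
        cat <<'EOF'
SELECT try_cast('abc' AS INT);
SELECT regexp_substr('abc def ghi', '[a-z]+', 1, 2);
SELECT get_path(parse_json('{"k1":{"k2":1}}'), 'k1.k2');
EOF
}

# Print the statements. To run them, pipe the output to a MySQL client that is
# connected to Databend (host, port, and user below are assumptions):
#   feature_examples | mysql -h127.0.0.1 -P3307 -uroot
feature_examples
```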

Improvement

  • databend-query: pass parameter from query to functions by @Veeupup, (#4805)
  • databend-query(mysql_handler): add more federated command for some old drivers by @BohuTANG, (#4809)
  • databend-query(compact): add transform compact by @sundy-li, (#4784)
  • databend-query(storage): show fuse engine table status by @dantengsky, (#4786)
  • databend-query(mysql_handler): sqlalchemy execute work by @BohuTANG, (#4774)

Performance Improvement

  • databend-query: try to avoid string copy in insert-values again by @ygf11, (#4730)

Bug fixes

  • databend-query(fuse): limit push down respect orders by @sundy-li, (#4818)
  • common-meta(state_machine): rename table should keep table_id unchanged by @ariesdevil, (#4838)
  • common-building: try persist credits at build time by @PsiACE, (#4791)
  • databend-query: select * shouldn't return results by @xudong963, (#4796)

Tips

Let's learn a weekly tip from Databend.

Databend Performance Data Collection and Visualization

Late last week, we proudly announced https://perf.databend.rs/, a website for monitoring the performance of Databend's nightly releases.

All benchmarks currently run on an Amazon EC2 c5n.9xlarge instance with 36 vCPUs, 96 GiB of memory, and Intel Xeon Platinum 8000 series processors.

Databend Performance Data

The current benchmarks consist of:

  • A set of numerical computation SQL queries for evaluating the performance of the in-memory vectorized engine, based on Databend's numbers table function, which provides ten billion rows of data.
  • A common set of SQL queries for air traffic analysis, based on the publicly available OnTime dataset from the U.S. Department of Transportation (60.8 GB of CSV, 202687654 records).
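
As a rough illustration of the first category, a numerical query over the numbers table function could be generated as shown below. This is a sketch under assumptions: the helper name `numbers_sql`, the choice of `avg(number)`, and the connection settings in the comment are illustrative and are not taken from the benchmark suite itself.

```shell
#!/bin/bash
# Hypothetical helper: print an aggregation over the first N rows of numbers().
numbers_sql() {
        local rows=$1
        echo "SELECT avg(number) FROM numbers($rows)"
}

# Print the statement for ten billion rows. To run it, pipe the output to a
# MySQL client connected to Databend (host, port, and user are assumptions):
#   numbers_sql 10000000000 | mysql -h127.0.0.1 -P3307 -uroot -s
numbers_sql 10000000000
```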

To view the source code, please visit GitHub - datafuselabs/databend-perf:

  • collector: stores daily performance data for each nightly release.
  • benchmarks: contains the benchmark suite, defined in YAML format.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Meet Us

Please join the DatafuseLabs Community if you are interested in Databend.

We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.

You can submit issues for any problems you find. We also highly appreciate any of your pull requests.

This week in Databend #36

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • common-functions: support REGEXP_INSTR function by @nange, (#4629)
  • databend-query(processor): support complete executor by @zhang2014, (#4639)
  • databend-query: support logical view by @Veeupup, (#4628)
  • databend-query: support information_schema database by @Veeupup, (#4672)
  • *: refactor type deserialization by @sundy-li, (#4634)

Improvement

Build/Test/CI

Bug fixes

  • databend-query: prohibit using reserved table option in create table statement. by @dantengsky, (#4632)
  • bump to OpenDAL v0.4 to fix copy into not supporting special filenames by @Xuanwo, (#4678)

Tips

Let's learn a weekly tip from Databend.

Analyzing Nginx Logs with Databend and Vector

Systems produce all kinds of metrics and logs all the time. Do you want to gather them and analyze the logs in real time?

Databend provides integration with Vector, which makes this easy.

You can use Databend to analyze Nginx access logs in just four steps:

  1. Deploy Databend, create a database and a table for Nginx logs, and create a user for Vector authentication
  2. Install and configure Nginx
  3. Install, configure, and run Vector
  4. Generate Nginx logs and analyze them in Databend

To learn more about how to implement it, check out Analyzing Nginx Logs with Databend and Vector.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #35

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query: add ndjson source for streaming load by @sundy-li, (#4561)
  • common-functions: embed markdown docs into system.functions by @sundy-li, (#4552)
  • databend-query: add user option for user info: introduce user options to separate system management privileges and security object privileges. by @Junnplus, (#4553)
  • common-datavalues: support Semi-structured array, object data type by @b41sh, (#4571)
  • http handler: support basic clickhouse REST handler by @youngsofun, (#4613)
  • databend-query: add check_json function by @kevinw66, (#4606)
  • databend-query(processor): support pushing executor by @zhang2014, (#4625)

Improvement

  • databend-query: version of storage layout by @dantengsky, (#4244)
  • databend-query: remove manage access check and use tenant statement by @Junnplus, (#4616)

Performance Improvement

  • datavalues: Simd selected for StringColumn by @LiuYuHui, (#4528)
  • databend-query: improve performance of insert-into literal values by @ygf11, (#4497)

Build/Test/CI

Bug fixes

Tips

Let's learn a weekly tip from Databend.

Announcing Databend v0.7.0-nightly

This release brings Databend architecture to a stable stage!

  • Simple primitive data type framework
  • New Pull&Push-Based Processor framework
  • Git-Like table format with snapshot transaction isolation
  • Announce OpenDAL for object storage data access
  • Announce OpenRaft to improve raft as the next generation consensus protocol

v0.7.0-nightly also includes several new user-facing features, performance optimizations, and many other improvements. Activate your object storage for big data analytics!

To learn more, please check out Announcing Databend v0.7.0 - Deploy easier, query faster.

Want to know what will happen in v0.8.0? Please check Checklist proposal: Nightly v0.8.

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

This week in Databend #34

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

  • databend-query: support Semi-structured variant data type by @b41sh, (#4348)
  • databend-query: support stage list & stage streaming upload by @sundy-li, (#4472 & #4477)
  • databend-query: integrate fuse table with new processor by @zhang2014, (#4444)
  • databend-query: add support for alter table rename statement by @kevinw66, (#4532)
  • *: add bootstrap_tenant procedure for tenant management by @Junnplus, (#4530)
  • http handler: support server-side-session by @youngsofun, (#4538)
  • metactl: dump data from a running metasrv by @light4, (#4473)
  • common-functions: support PARSE_JSON / TRY_PARSE_JSON function by @b41sh, (#4534)

Improvement

Performance Improvement

  • datavalues: Simd selected for BooleanColumn by @LiuYuHui, (#4484)
  • databend-query: enable new processor by default(standalone mode) by @zhang2014, (#4486)

Build/Test/CI

  • add musl support & release musl compiled binaries by @Xuanwo & @ZhiHanZ, (#4520 & #4535)
  • drop databend-benchmark and old perf tool, add benchmark solution with hyperfine by @PsiACE, (#4545)

Bug fixes

  • databend-query: fix groupby single string in new processor by @sundy-li, (#4475)
  • clickhouse handler: to_clickhouse_block always convert to full column if constant by @sundy-li, (#4514)

Tips

Let's learn a weekly tip from Databend.

How to Benchmark with Hyperfine

Databend recommends using hyperfine to perform benchmarking via the ClickHouse/MySQL client. With a simple script, we can run benchmarks easily:

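#!/bin/bash
# Arguments: MySQL handler port, file with one SQL statement per line,
# and the Markdown file to write the results to.

WARMUP=3
RUN=10

function run() {
        port=$1
        sql=$2
        result=$3
        # Build a single hyperfine invocation that benchmarks every statement.
        script="hyperfine -w $WARMUP -r $RUN"
        # Read the file directly so that each line stays a separate statement.
        while read -r SQL; do
                n="-n \"$SQL\" "
                s="echo \"$SQL\" | mysql -h127.0.0.1 -P$port -uroot -s"
                script="$script '$n' '$s'"
        done < "$sql"

        script="$script  --export-markdown $result"
        # Quote the command so that characters such as * are not expanded here.
        echo "$script" | bash -x
}


run "$1" "$2" "$3"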

For details, please read databend.rs - How to Benchmark with Hyperfine
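
To make the expected input concrete, the sketch below prepares a query file in the format the script reads, with one SQL statement per line. The file names `queries.sql`, `benchmark.sh`, and `result.md`, the sample queries, and port 3307 are assumptions for illustration only.

```shell
#!/bin/bash
# Write one SQL statement per line; the benchmark script reads the file line by line.
cat > queries.sql <<'EOF'
SELECT avg(number) FROM numbers(100000000)
SELECT sum(number) FROM numbers(100000000)
EOF

# Hypothetical invocation, assuming the script above is saved as benchmark.sh.
# It requires hyperfine, a mysql client, and a running Databend MySQL handler:
#   ./benchmark.sh 3307 queries.sql result.md

# Show how many statements will be benchmarked.
wc -l < queries.sql
```

Each line becomes one named hyperfine benchmark, so the statements in the file should not span multiple lines.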

We've also updated some of Databend's performance results. If you're interested, check out the following articles:

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.

Ecosystem/Upstream

From open source, for open source. Our team is also committed to contributing to the Rust ecosystem and upstream dependencies.

This week in Databend #33

Big changes

Below is a list of some major changes that we don't want you to miss.

Features

Improvement

  • databend-query: embed more column meta data inside SegmentInfo by @dantengsky, (#4372)
  • databend-query: change async rwlock to sync to avoid async drop by @ariesdevil, (#4394)
  • databend-query: implement new block reader for fuse engine by @zhang2014, (#4383)
  • databend-query: support copy from stage by @sundy-li, (#4437)
  • databend-query: change bloom filter to use datavalue2 by @junli1026, (#4418)
  • functions: refactor ScalarExpression and add simd operator by @zhyass, (#4375)

Performance Improvement

  • databend-query&datablocks: introduce HashMethodSingleString by @sundy-li, (#4417)
  • databend-query: dedicated runtime for io tasks by @dantengsky, (#4404)

Bug fixes

  • databend-query: fix hang after async processor throw error by @zhang2014, (#4380)
  • databend-query: global MetaGrpcClient cause dispatch drop error by @ariesdevil, (#4361)

Tips

Let's learn a weekly tip from Databend.

Deploy Databend and try to develop

Databend has now completely updated its online documentation. It covers multiple deployment scenarios, including local disk, S3, MinIO, and more, and provides first steps for application developers in Go and Python.

Deploy

Develop

  • With Go: https://databend.rs/doc/develop/go
  • With Python: https://databend.rs/doc/develop/python

Changelogs

You can check the changelogs of Databend nightly to learn about our latest developments.
