Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. An open-source alternative to Snowflake. Also available in the cloud: https://app.databend.com

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

AST

  • select from stage support uri with connection options (#9926)

Catalog

  • Iceberg/create-catalog (#9017)

Expression

  • type decimal support agg func min/max (#10085)
  • add sum/avg for decimal types (#10059)

Pipeline

  • enrich core pipelines processors (#10098)

Query

  • create stage, select stage, copy, infer_schema support named file format (#10084)
  • query result cache (#10042)

Storage

  • table data cache (#9772)
  • use drop_table_by_id api in drop all (#10054)
  • native storage format support nested data types (#9798)

Code Refactoring 🎉

Meta

  • add compatible layer for upgrade (#10082)
  • more elegant error handling (#10112, #10114, etc.)

Cluster

  • support exchange sorting (#10149)

Executor

  • add check processor graph completed (#10166)

Planner

  • apply constant folder at physical plan builder (#9889)

Query

  • use accumulating to impl single state aggregator (#10125)

Storage

  • adopt OpenDAL's batch delete support (#10150)
  • adopt OpenDAL query based metadata cache (#10162)

Build/Testing/CI Infra Changes 🔌

  • release deb repository (#10080)
  • release with systemd units (#10145)

Bug Fixes 🔧

Expression

  • no longer return Variant as common super type (#9961)
  • allow auto cast from string and variant (#10111)

Cluster

  • fix limit query hang in cluster mode (#10006)

Storage

  • wrong column statistics when contain tuple type (#10068)
  • compact not work as expected with add column (#10070)
  • fix add column min/max stat bug (#10137)

What's On In Databend

Stay connected with the latest news about Databend.

Query Result Cache

Databend now supports caching of query results! The pipeline that writes the result cache is sketched below:

             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” 1  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” 1
             β”‚         β”œβ”€β”€β”€β–Ίβ”‚         β”œβ”€β”€β”€β–ΊDummy───►Downstream
Upstream────►│Duplicateβ”‚ 2  β”‚         β”‚ 3
             β”‚         β”œβ”€β”€β”€β–Ίβ”‚         β”œβ”€β”€β”€β–ΊDummy───►Downstream
             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚         β”‚
                            β”‚ Shuffle β”‚
             β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” 3  β”‚         β”‚ 2  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
             β”‚         β”œβ”€β”€β”€β–Ίβ”‚         β”œβ”€β”€β”€β–Ίβ”‚  Write  β”‚
Upstream────►│Duplicateβ”‚ 4  β”‚         β”‚ 4  β”‚ Result  β”‚
             β”‚         β”œβ”€β”€β”€β–Ίβ”‚         β”œβ”€β”€β”€β–Ίβ”‚  Cache  β”‚
             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
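The idea in the diagram, where every block coming from upstream is duplicated so that one copy goes downstream and the other goes to the cache writer, can be illustrated with a small sketch. This is only a conceptual model in Python, not Databend's implementation: the `ResultCache` class, the `run_with_cache` function, and the choice of hashing the SQL text as the cache key are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical stand-in for a data block: a list of rows.
Block = List[tuple]


class ResultCache:
    """Toy in-memory result cache keyed by a hash of the SQL text (an assumption)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Block]] = {}

    @staticmethod
    def key(sql: str) -> str:
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def get(self, sql: str) -> Optional[List[Block]]:
        return self._store.get(self.key(sql))

    def put(self, sql: str, blocks: List[Block]) -> None:
        self._store[self.key(sql)] = blocks


def run_with_cache(
    sql: str,
    execute: Callable[[str], Iterable[Block]],
    cache: ResultCache,
) -> Iterator[Block]:
    """Serve from the cache on a hit; on a miss, duplicate each block."""
    cached = cache.get(sql)
    if cached is not None:
        yield from cached
        return

    written: List[Block] = []
    for block in execute(sql):
        written.append(block)  # copy headed for the cache writer
        yield block            # copy headed for downstream
    # Store the result only once the upstream has been fully consumed.
    cache.put(sql, written)
```

In this sketch a second call with the same SQL text never invokes `execute`; a real cache would also have to deal with invalidation when the underlying tables change, which is left out here.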

Table Data Cache

Databend now supports caching table data in two forms:

  • disk cache: raw (compressed) column data of a data block.
  • in-memory cache (experimental): deserialized column objects of a data block.

For cache-friendly workloads, the performance gains are significant.
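To make the difference between the two caches concrete, here is a minimal Python sketch of a read path that checks a cache of deserialized columns first, then a cache of compressed bytes, and only then goes to remote storage. The lookup order, the `TieredColumnCache` class, and the use of zlib and JSON as the storage format are assumptions for illustration; they do not describe Databend's actual code or file format.

```python
import json
import zlib
from typing import Callable, Dict, List


class TieredColumnCache:
    """Toy two-tier cache: deserialized columns in memory, compressed bytes on 'disk'."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote       # hypothetical object-storage read
        self._disk: Dict[str, bytes] = {}       # dict standing in for the disk cache
        self._memory: Dict[str, List[int]] = {}  # deserialized column objects
        self.remote_reads = 0

    def read_column(self, key: str) -> List[int]:
        # 1. In-memory hit: no decompression or deserialization needed.
        if key in self._memory:
            return self._memory[key]

        # 2. Disk hit: the raw compressed bytes are local, but must be decoded.
        raw = self._disk.get(key)
        if raw is None:
            # 3. Miss in both tiers: read from remote storage and keep the raw bytes.
            raw = self._fetch_remote(key)
            self.remote_reads += 1
            self._disk[key] = raw

        column = json.loads(zlib.decompress(raw))
        self._memory[key] = column
        return column
```

The disk tier saves the remote read, while the in-memory tier additionally saves decompression and deserialization, which is why repeated reads of the same blocks benefit most.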

Deb Source & Systemd Support

Databend now offers an official Deb package repository and ships systemd units for managing the services.

For DEB822 Source Format:

sudo curl -L -o /etc/apt/sources.list.d/datafuselabs.sources https://repo.databend.rs/deb/datafuselabs.sources
sudo apt update
sudo apt install databend
sudo systemctl start databend-meta
sudo systemctl start databend-query

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Service Activation Progress Report

When a Query/Meta node starts, it should run the necessary checks and print their results explicitly, so that users can diagnose faults and confirm the node's status.

Example:

storage check succeed
meta check failed: timeout, no response. endpoints: xxxxxxxx .
status check failed: address already in use.

Issue 10193: Feature: output the necessary progress when starting a query/meta node
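One possible shape for such a report is sketched below in Python. It is purely illustrative: the `run_startup_checks` function and the three check functions are hypothetical, the failures are simulated, and nothing here reflects how the feature is or will be implemented in Databend. The sketch runs every check instead of stopping at the first failure and formats each result like the example above.

```python
from typing import Callable, List, Tuple

# A check is a name plus a callable that raises on failure (hypothetical convention).
Check = Tuple[str, Callable[[], None]]


def run_startup_checks(checks: List[Check]) -> List[str]:
    """Run every check and return one report line per check."""
    lines: List[str] = []
    for name, check in checks:
        try:
            check()
        except Exception as err:
            lines.append(f"{name} check failed: {err}")
        else:
            lines.append(f"{name} check succeed")
    return lines


def check_storage() -> None:
    # Simulated success: a real check would probe the storage backend.
    return None


def check_meta() -> None:
    # Simulated failure, reusing the placeholder endpoints from the example.
    raise TimeoutError("timeout, no response. endpoints: xxxxxxxx .")


def check_status() -> None:
    # Simulated failure: a real check would try to bind the listening port.
    raise OSError("address already in use.")


EXAMPLE_CHECKS: List[Check] = [
    ("storage", check_storage),
    ("meta", check_meta),
    ("status", check_status),
]

if __name__ == "__main__":
    for line in run_startup_checks(EXAMPLE_CHECKS):
        print(line)
```

Collecting all results before exiting lets the user see every failing dependency in a single start attempt.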

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, Big-Wuu, BohuTANG, cameronbraid,
Chasen-Zhang, ClSlaid, dantengsky, drmingdrmer, everpcpc, johnhaxx7,
lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li,
suyanhanx, TCeason, Xuanwo, xudong963, youngsofun, zhang2014,
zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.