This Week in Databend #77

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • use expression::TableSchema to replace obsolete datavalues::DataSchema (#9506)
  • iter() iterate every tree and every records in theses trees (#9621)

Expression

  • add other base geo functions (#9588)

Optimizer

  • improve cardinality estimation for join based on histogram (#9594)

Planner

  • improve join reorder algorithm (#9571)

Query

  • support insert with placeholder (#9575)
  • set setting support expr (#9574)
  • add information_schema for sharding-jdbc (#9583)
  • support named params for table functions (#9630)

Storage

  • read_parquet page index (#9563)
  • update interpreter and storage support (#9261)

Code Refactoring πŸŽ‰

  • refine on_error mode (#9473)

Meta

  • remove unused meta types and conversion util (#9584)

Parser

  • more strict parser for format_options (#9635)

Expression

  • rearrange common_expression and common_function (#9585)

Build/Testing/CI Infra Changes πŸ”Œ

  • run sqllogictests with binary (#9603)

Bug Fixes πŸ”§

Expression

  • constant folder should run repeatedly until stable (#9572)
  • check_date() and to_string(boolean) may panic (#9561)

Planner

  • fix stack overflow when applying RuleFilterPushDownJoin (#9645)

Storage

  • fix range filter read stat with index (#9619)

Sqllogictest

  • sqllogic test hangs (cluster mod + clickhouse handler) (#9615)

What's On In Databend

Stay connected with the latest news about Databend.

Upgrade Databend Query from 0.8 to 0.9

Databend-query-0.9 introduces incompatible changes in metadata, these metadata has to be manually migrated. Databend provides a program for this job: databend-meta-upgrade-09, which you can find in a release package or can be built from source.

Upgrade

databend-meta-upgrade-09 --cmd upgrade --raft-dir "<./your/raft-dir/>"

Learn More

Release Proposal: Nightly v1.0

The call for proposals for the release of v1.0 is now open.

The preliminary plan is to release in March, mainly focusing on alter table, update, and group by spill.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Type Checker for Sqllogictest

We can check if each row's each element's type is correct.

databend/tests/sqllogictests/src/client/mysql_client.rs

 // Todo: add types to compare 
 Ok(DBOutput::Rows { 
     types, 
     rows: parsed_rows, 

Issue 9647: Feature: Add type checker for sqllogictest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilb41shBohuTANGdantengskydrmingdrmer
andylokandyariesdevilb41shBohuTANGdantengskydrmingdrmer
everpcpcleiyskymergify[bot]PsiACERinChanNOWWWsoyeric128
everpcpcleiyskymergify[bot]PsiACERinChanNOWWWsoyeric128
sundy-liTCeasonXuanwoxudong963youngsofunyufan022
sundy-liTCeasonXuanwoxudong963youngsofunyufan022
zhang2014zhyass
zhang2014zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

This Week in Databend #76

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • add reader-min-msg-ver and msg-min-reader-ver in proto-conv (#9535)

Planner

  • support tuple.1 and get(1)(tuple) (#9493)
  • support display estimated rows in EXPLAIN (#9528)

Query

  • efficiently memory two level group by in standalone mode (#9504)

Storage

  • support nested type in read_parquet (#9486)
  • add build options table (#9502)

Code Refactoring πŸŽ‰

  • merge new expression (#9411)
  • remove and rename crates (#9481)
  • bump rust version (#9540)

Expression

  • move negative functions to binder (#9484)
  • use error_to_null() to eval try_cast (#9545)

Functions

  • replace h3ron to h3o (#9553)

Format

  • extract AligningStateTextBased (#9472)
  • richer error context (#9534)

Query

  • use ctx to store the function evaluation error (#9501)
  • refactor map access to support view read tuple inner (#9516)

Storage

  • bump opendal for streaming read support (#9503)
  • refactor bloom index to use vectorized siphash function (#9542)

Bug Fixes πŸ”§

HashTable

  • fix memory leak for unsized hash table (#9551)

Storage

  • fix row group stats collection (#9537)

What's On In Databend

Stay connected with the latest news about Databend.

New Year, New Expression!

We're so thrilled to tell you that Databend now fully works with New Expression after more than a half year of dedicated work. New Expression introduces a formal type system to Databend and supports type-safe downward casting , making the definition of functions easier.

New Expression is still being tuned, and a new version (v0.9) of Databend will be released once the tuning work is complete.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

UNNEST Function

The UNNEST function takes an array as a parameter, and returns a table containing each element of the array in a row.

Syntax

UNNEST(ARRAY) [WITH OFFSET]

If you're interested in becoming a contributor, helping us develop the UNNEST function would be a good start.

Issue 9549: Feature: Support unnest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilb41shBohuTANGClSlaiddantengsky
andylokandyariesdevilb41shBohuTANGClSlaiddantengsky
dependabot[bot]drmingdrmereverpcpcflaneur2020leiyskymergify[bot]
dependabot[bot]drmingdrmereverpcpcflaneur2020leiyskymergify[bot]
PsiACERinChanNOWWWsoyeric128sundy-liTCeasonwubx
PsiACERinChanNOWWWsoyeric128sundy-liTCeasonwubx
Xuanwoxudong963youngsofunzhang2014
Xuanwoxudong963youngsofunzhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

This Week in Databend #75

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

  • basic output format JSON (#9447)

Query

  • check connection params (#9437)
  • add max_query_row_nums (#9406)

Storage

  • support prewhere in hive (#9427)
  • add generic cache trait for different object reader (#9436)
  • add metrics for new cache (#9445)

New Expression

  • migrate hash func to func-v2 (#9402)

Sqllogictest

  • run all tests in parallel (#9400)

Code Refactoring πŸŽ‰

Storage

  • add to_bytes and from_bytes for CachedObject (#9439)
  • better table-meta and parquet reader function (#9434)
  • convert fuse_snapshot unit tests to sqlloigc test (#9428)

Bug Fixes πŸ”§

Format

  • catch unwind when read split (#9420)

User

Planner

  • create Stage URL's path should ends with / (#9450)

What's On In Databend

Stay connected with the latest news about Databend.

Databend 2022 Recap

Let's look back and see how Databend did in 2022.

  • Open source: got 2,000+ stars, merged 2,400+ PRs, and solved 1,900 issues.
  • From data warehouse to lakehouse: Brand-new design with enhanced capabilities.
  • Rigorous testing: SQL Logic Tests, SQLancer, and https://perf.databend.rs.
  • Building the ecosystem: More customers chose, trusted, and grew with Databend, including Kuaishou and SAP.
  • Databend Cloud: Built on top of Databend, the next big data analytics platform.

We wish everyone a Happy New Year and look forward to engaging with you.

Learn More

Databend 2023 Roadmap

As the new year approaches, Databend is also actively planning its roadmap for 2023.

We will continue to polish the Planner and work on data and query caching. Enhancing storage and query issues for PB-level data volumes is also on our list.

Try Databend and join the roadmap discussion.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Profile-Guided Optimization (PGO)

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

rustc supports doing profile-guided optimization (PGO). We expect to be able to use it to enhance the build.

Issue 9387: Feature: Add PGO Support

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevilBohuTANGdantengskydependabot[bot]everpcpcflaneur2020
ariesdevilBohuTANGdantengskydependabot[bot]everpcpcflaneur2020
hantmacleiyskymergify[bot]PsiACEsandfleesoyeric128
hantmacleiyskymergify[bot]PsiACEsandfleesoyeric128
sundy-liTCeasonXuanwoxudong963youngsofunzhang2014
sundy-liTCeasonXuanwoxudong963youngsofunzhang2014

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

This week in Databend #74

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • remove stream when a watch client is dropped (#9334)

Planner

  • support selectivity estimation for range predicates (#9398)

Query

  • support copy on error (#9312)
  • support databend-local (#9282)
  • external storage support location part prefix (#9381)

Storage

  • rangefilter support in (#9330)
  • try to improve object storage io read (#9335)
  • support table compression (#9370)

Metrics

  • add more metrics for fuse compact and block write (#9399)

Sqllogictest

  • add no-fail-fast support (#9391)

Code Refactoring πŸŽ‰

*

  • adopt rustls entirely, removing all deps to native-tls (#9358)

Format

  • remove format_xxx settings (#9360)
  • adjust interface of FileFormatOptionsExt (#9395)

Planner

  • remove SyncTypeChecker (#9352)

Query

  • split fuse source to read data and deserialize (#9353)
  • avoid io copy in read parquet data (#9365)
  • add uncompressed buffer for parquet reader (#9379)

Storage

  • add read/write settings (#9359)

Bug Fixes πŸ”§

Format

  • fix align_flush with header only (#9327)

Settings

  • use logical CPU number as default value of num_cpus (#9396)

Processors

  • the data type on both sides of the union does not match (#9361)

HTTP Handler

  • false alarm (warning log) about query not exists (#9380)

Sqllogictest

  • refactor sqllogictest http client and fix expression string like (#9363)

What's On In Databend

Stay connected with the latest news about Databend.

Introducing databend-local​

Inspired by clickhouse-local, databend-local allows you to perform fast processing on local files, without the need of launching a Databend cluster.

> export CONFIG_FILE=tests/local/config/databend-local.toml
> cargo run --bin=databend-local -- --sql="SELECT * FROM tbl1" --table=tbl1=/path/to/databend/docs/public/data/books.parquet

exec local query: SELECT * FROM tbl1
+------------------------------+---------------------+------+
| title                        | author              | date |
+------------------------------+---------------------+------+
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+---------------------+------+
4 rows in set. Query took 0.015 seconds.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Compressing Short Strings​

When processing the same queries with short strings involved, Databend usually reads more data than other databases, such as Snowflake.

SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;

Such queries might be more efficient if short strings (URLs, etc) are compressed.

Issue 9001: performance: compressing for short strings

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilBohuTANGdantengskydrmingdrmereastfisher
andylokandyariesdevilBohuTANGdantengskydrmingdrmereastfisher
everpcpcleiyskymergify[bot]PsiACERinChanNOWWWsoyeric128
everpcpcleiyskymergify[bot]PsiACERinChanNOWWWsoyeric128
sundy-liXuanwoxudong963youngsofunzhang2014zhyass
sundy-liXuanwoxudong963youngsofunzhang2014zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

This week in Databend #73

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Multiple Catalogs

  • implement show tables (from|in catalog.database) (#9153)

Planner

  • introduce histogram in column statistics (#9310)

Query

  • support attaching stage for insert values (#9249)
  • add native format in fuse table (#9279)
  • add internal_enable_sandbox_tenant config and sandbox_tenant (#9277)

Sqllogictest

  • introduce rust native sqllogictest framework (#9150)

Code Refactoring πŸŽ‰

*

  • unify apply_file_format_options for copy & insert (#9323)

IO

  • remove unused code (#9266)

meta

  • test watcher count (#9324)

Planner

  • replace TableContext in planner with PlannerContext (#9290)

Bug Fixes πŸ”§

Base

  • try fix SIGABRT when catch unwind (#9269)
  • replace #[thread_local] to thread_local macro (#9280)

Query

  • fix unknown database in query without relation to this database (#9250)
  • fix wrong current_role when drop the role (#9276)

What's On In Databend

Stay connected with the latest news about Databend.

Introduced a Rust Native Sqllogictest Framework

Sqllogictest verifies the results returned from a SQL database engine by comparing them with the results of other engines for the same queries.

In the past, Databend ran such tests using a program written in Python and migrated a large number of test cases from other popular databases. We implemented the program again with sqllogictest-rs in recent days.

Learn More

Experimental: Native Format

PA is a native storage format based on Apache Arrow. Similar to Arrow IPC, PA aims at optimizing the storage layer.

Databend is introducing PA as a native storage format in the hope of getting a performance boost, though it's still at an early stage of development.

create table tmp (a int) ENGINE=FUSE STORAGE_FORMAT='native';

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Checking File Existence Before Returning Presigned URL​

When presigning a file, Databend currently returns a potentially valid URL based on the filename without checking if the file really exists. Thus, the 404 error might occur if the file doesn't exist at all.

Issue 8702: Before return presign url add file exist judgement

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevilb41shBohuTANGClSlaiddrmingdrmereverpcpc
ariesdevilb41shBohuTANGClSlaiddrmingdrmereverpcpc
leiyskymergify[bot]PsiACEsandfleesoyeric128sundy-li
leiyskymergify[bot]PsiACEsandfleesoyeric128sundy-li
Xuanwoxudong963youngsofunzhang2014ZhiHanZzhyass
Xuanwoxudong963youngsofunzhang2014ZhiHanZzhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.