This Week in Databend #78

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

Special Note: This Week in Databend will be gradually migrated to the Databend Blog. We will keep the content in sync until the final migration is complete.

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

SQL

  • eliminate extra group by scalars (#9706)

Query

  • add privilege check for insert/delete/optimize (#9664)
  • enable empty projection (#9675)
  • add aggregate limit in final aggregate stage (#9716)
  • add optional column names to create/alter view statement (#9715)

Storage

  • add prewhere support in native storage format (#9600)

Code Refactoring 🎉

IO

  • move io constants to common/io (#9700)
  • refine fuse/io/read (#9711)

Planner

  • rename Scalar to ScalarExpr (#9665)

Storage

  • refactor cache layer (#9672)
  • pruner.rs -> fuse_bloom_pruner.rs (#9710)
  • make pruner hierarchy to chain (#9714)

Build/Testing/CI Infra Changes 🔌

  • support setup minio storage & external s3 storage in docker image (#9676)

Bug Fixes 🔧

Expression

  • fix missing simple_cast (#9671)

Query

  • fix efficiently_memory_final_aggregator result is not stable (#9685)
  • fix max_result_rows only limit output results nums (#9661)
  • fix query hang in two level aggregator (#9694)

Storage

  • may get wrong datablocks if not sorted by output schema (#9470)
  • bloom filter is using wrong cache key (#9706)

What's On In Databend

Stay connected with the latest news about Databend.

Databend All-in-One Docker Image

Databend Docker Image now supports setting up MinIO storage and external AWS S3 storage.

Now you can easily use a Docker image for your first experiment with Databend.

Run with MinIO as backend

docker run \
    -p 8000:8000 \
    -p 9000:9000 \
    -e MINIO_ENABLED=true \
    datafuselabs/databend

Run with self-managed query config

docker run \
    -p 8000:8000 \
    -e DATABEND_QUERY_CONFIG_FILE=/etc/databend/mine.toml \
    -v query_config_file:/etc/databend/mine.toml \
    datafuselabs/databend

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Vector search captures the meaning and context of unstructured data and is commonly used for text and image processing. Because it matches on semantics, it can return more relevant results than traditional keyword retrieval.

Databend plans to support vector search to give users richer and more efficient ways to query. Introducing a Faiss index may be an initial solution.
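
To make the idea concrete, here is a minimal sketch of similarity search over embedding vectors. It is an exhaustive scan using cosine similarity, not Faiss (which builds indexes to avoid scanning every row) and not Databend's planned design. The function names `cosine_similarity` and `top_k` are hypothetical and chosen for illustration only.

```rust
/// Cosine similarity between two vectors of the same dimension.
/// Returns 0.0 when either vector has zero length (an assumption of this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Exhaustive top-k search: scores every row against the query and
/// returns (row index, similarity) pairs, most similar first.
fn top_k(query: &[f32], rows: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = rows
        .iter()
        .enumerate()
        .map(|(i, row)| (i, cosine_similarity(query, row)))
        .collect();
    // Sort by similarity in descending order.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

An index such as Faiss trades some accuracy or build time for avoiding the full scan that `top_k` performs here.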

Issue 9699: feat: vector search (Faiss index)

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, dependabot[bot], everpcpc, flaneur2020, johnhaxx7, leiysky, mergify[bot], PsiACE, RinChanNOWWW, sandflee, sundy-li, xudong963, zhang2014, zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and share your thoughts.

This Week in Databend #77

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • use expression::TableSchema to replace obsolete datavalues::DataSchema (#9506)
  • iter() iterates over every tree and every record in these trees (#9621)

Expression

  • add other base geo functions (#9588)

Optimizer

  • improve cardinality estimation for join based on histogram (#9594)

Planner

  • improve join reorder algorithm (#9571)

Query

  • support insert with placeholder (#9575)
  • set setting support expr (#9574)
  • add information_schema for sharding-jdbc (#9583)
  • support named params for table functions (#9630)

Storage

  • read_parquet page index (#9563)
  • update interpreter and storage support (#9261)

Code Refactoring 🎉

  • refine on_error mode (#9473)

Meta

  • remove unused meta types and conversion util (#9584)

Parser

  • more strict parser for format_options (#9635)

Expression

  • rearrange common_expression and common_function (#9585)

Build/Testing/CI Infra Changes 🔌

  • run sqllogictests with binary (#9603)

Bug Fixes 🔧

Expression

  • constant folder should run repeatedly until stable (#9572)
  • check_date() and to_string(boolean) may panic (#9561)

Planner

  • fix stack overflow when applying RuleFilterPushDownJoin (#9645)

Storage

  • fix range filter read stat with index (#9619)

Sqllogictest

  • sqllogic test hangs (cluster mode + clickhouse handler) (#9615)

What's On In Databend

Stay connected with the latest news about Databend.

Upgrade Databend Query from 0.8 to 0.9

Databend-query-0.9 introduces incompatible changes to metadata, so existing metadata has to be migrated manually. Databend provides a program for this job, databend-meta-upgrade-09, which you can find in a release package or build from source.

Upgrade

databend-meta-upgrade-09 --cmd upgrade --raft-dir "<./your/raft-dir/>"

Learn More

Release Proposal: Nightly v1.0

The call for proposals for the release of v1.0 is now open.

The preliminary plan is to release in March, mainly focusing on alter table, update, and group by spill.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Type Checker for Sqllogictest

A type checker would let sqllogictest verify that every element in every row has the expected type.

databend/tests/sqllogictests/src/client/mysql_client.rs

 // Todo: add types to compare 
 Ok(DBOutput::Rows { 
     types, 
     rows: parsed_rows, 
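
As a rough illustration of what such a check could look like, the sketch below validates each cell of each row against an expected column type. Everything in it is an assumption made for illustration: the `ColumnType` enum (loosely modeled on the T/I/R column letters of the sqllogictest format), the `cell_matches` and `check_rows` helpers, and the rule that the literal `NULL` is accepted for any type. It is not the sqllogictest crate's API or the implementation proposed in the issue.

```rust
/// Hypothetical column types, loosely modeled on sqllogictest's T/I/R letters.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColumnType {
    Text,
    Integer,
    Real,
}

/// Returns true if the textual cell is acceptable for the expected type.
/// Assumption of this sketch: the literal "NULL" is valid for every type.
fn cell_matches(expected: ColumnType, cell: &str) -> bool {
    if cell == "NULL" {
        return true;
    }
    match expected {
        ColumnType::Text => true,
        ColumnType::Integer => cell.parse::<i64>().is_ok(),
        ColumnType::Real => cell.parse::<f64>().is_ok(),
    }
}

/// Checks every element of every row and reports the first mismatch
/// as (row index, column index). A row with the wrong number of columns
/// is reported at the first column position where the lengths diverge.
fn check_rows(types: &[ColumnType], rows: &[Vec<String>]) -> Result<(), (usize, usize)> {
    for (r, row) in rows.iter().enumerate() {
        if row.len() != types.len() {
            return Err((r, row.len().min(types.len())));
        }
        for (c, (ty, cell)) in types.iter().zip(row).enumerate() {
            if !cell_matches(*ty, cell) {
                return Err((r, c));
            }
        }
    }
    Ok(())
}
```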

Issue 9647: Feature: Add type checker for sqllogictest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, dantengsky, drmingdrmer, everpcpc, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, yufan022, zhang2014, zhyass

This Week in Databend #76

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • add reader-min-msg-ver and msg-min-reader-ver in proto-conv (#9535)

Planner

  • support tuple.1 and get(1)(tuple) (#9493)
  • support display estimated rows in EXPLAIN (#9528)

Query

  • memory-efficient two-level group by in standalone mode (#9504)

Storage

  • support nested type in read_parquet (#9486)
  • add build options table (#9502)

Code Refactoring 🎉

  • merge new expression (#9411)
  • remove and rename crates (#9481)
  • bump rust version (#9540)

Expression

  • move negative functions to binder (#9484)
  • use error_to_null() to eval try_cast (#9545)

Functions

  • replace h3ron with h3o (#9553)

Format

  • extract AligningStateTextBased (#9472)
  • richer error context (#9534)

Query

  • use ctx to store the function evaluation error (#9501)
  • refactor map access to support view read tuple inner (#9516)

Storage

  • bump opendal for streaming read support (#9503)
  • refactor bloom index to use vectorized siphash function (#9542)

Bug Fixes 🔧

HashTable

  • fix memory leak for unsized hash table (#9551)

Storage

  • fix row group stats collection (#9537)

What's On In Databend

Stay connected with the latest news about Databend.

New Year, New Expression!

We're thrilled to tell you that Databend now fully works with New Expression after more than half a year of dedicated work. New Expression introduces a formal type system to Databend and supports type-safe downward casting, making functions easier to define.

New Expression is still being tuned, and a new version (v0.9) of Databend will be released once the tuning work is complete.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

UNNEST Function

The UNNEST function takes an array as a parameter, and returns a table containing each element of the array in a row.

Syntax

UNNEST(ARRAY) [WITH OFFSET]
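
The row-expansion behavior described above can be sketched as follows. This models only the semantics, not Databend's implementation. The function name `unnest` is hypothetical, and the zero-based offset is an assumption, since the syntax above does not say where offsets start.

```rust
/// Expands an array into one row per element.
/// When `with_offset` is true, each row also carries the element's
/// position in the array (zero-based in this sketch).
fn unnest<T: Clone>(array: &[T], with_offset: bool) -> Vec<(T, Option<usize>)> {
    array
        .iter()
        .cloned()
        .enumerate()
        .map(|(i, value)| (value, if with_offset { Some(i) } else { None }))
        .collect()
}
```

Under these assumptions, unnesting the array [10, 20] with offsets yields two rows: (10, 0) and (20, 1).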

If you're interested in becoming a contributor, helping us develop the UNNEST function would be a good start.

Issue 9549: Feature: Support unnest

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

We're gearing up for the v0.9 release of Databend. Stay tuned.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, BohuTANG, ClSlaid, dantengsky, dependabot[bot], drmingdrmer, everpcpc, flaneur2020, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, TCeason, wubx, Xuanwo, xudong963, youngsofun, zhang2014

This Week in Databend #75

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Format

  • basic output format JSON (#9447)

Query

  • check connection params (#9437)
  • add max_query_row_nums (#9406)

Storage

  • support prewhere in hive (#9427)
  • add generic cache trait for different object reader (#9436)
  • add metrics for new cache (#9445)

New Expression

  • migrate hash func to func-v2 (#9402)

Sqllogictest

  • run all tests in parallel (#9400)

Code Refactoring 🎉

Storage

  • add to_bytes and from_bytes for CachedObject (#9439)
  • better table-meta and parquet reader function (#9434)
  • convert fuse_snapshot unit tests to sqllogic test (#9428)

Bug Fixes 🔧

Format

  • catch unwind when read split (#9420)

User

Planner

  • create Stage URL's path should end with / (#9450)

What's On In Databend

Stay connected with the latest news about Databend.

Databend 2022 Recap

Let's look back and see how Databend did in 2022.

  • Open source: got 2,000+ stars, merged 2,400+ PRs, and solved 1,900 issues.
  • From data warehouse to lakehouse: Brand-new design with enhanced capabilities.
  • Rigorous testing: SQL Logic Tests, SQLancer, and https://perf.databend.rs.
  • Building the ecosystem: More customers chose, trusted, and grew with Databend, including Kuaishou and SAP.
  • Databend Cloud: Built on top of Databend, the next big data analytics platform.

We wish everyone a Happy New Year and look forward to engaging with you.

Learn More

Databend 2023 Roadmap

As the new year approaches, Databend is also actively planning its roadmap for 2023.

We will continue to polish the Planner and work on data and query caching. Resolving storage and query issues at PB-level data volumes is also on our list.

Try Databend and join the roadmap discussion.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Profile-Guided Optimization (PGO)

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

rustc supports profile-guided optimization (PGO), and we expect to use it to improve Databend builds.

Issue 9387: Feature: Add PGO Support

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

ariesdevil, BohuTANG, dantengsky, dependabot[bot], everpcpc, flaneur2020, hantmac, leiysky, mergify[bot], PsiACE, sandflee, soyeric128, sundy-li, TCeason, Xuanwo, xudong963, youngsofun, zhang2014

This Week in Databend #74

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

Meta

  • remove stream when a watch client is dropped (#9334)

Planner

  • support selectivity estimation for range predicates (#9398)

Query

  • support copy on error (#9312)
  • support databend-local (#9282)
  • external storage support location part prefix (#9381)

Storage

  • rangefilter support in (#9330)
  • try to improve object storage io read (#9335)
  • support table compression (#9370)

Metrics

  • add more metrics for fuse compact and block write (#9399)

Sqllogictest

  • add no-fail-fast support (#9391)

Code Refactoring 🎉

*

  • adopt rustls entirely, removing all deps to native-tls (#9358)

Format

  • remove format_xxx settings (#9360)
  • adjust interface of FileFormatOptionsExt (#9395)

Planner

  • remove SyncTypeChecker (#9352)

Query

  • split fuse source to read data and deserialize (#9353)
  • avoid io copy in read parquet data (#9365)
  • add uncompressed buffer for parquet reader (#9379)

Storage

  • add read/write settings (#9359)

Bug Fixes 🔧

Format

  • fix align_flush with header only (#9327)

Settings

  • use logical CPU number as default value of num_cpus (#9396)

Processors

  • the data type on both sides of the union does not match (#9361)

HTTP Handler

  • false alarm (warning log) about query not exists (#9380)

Sqllogictest

  • refactor sqllogictest http client and fix expression string like (#9363)

What's On In Databend

Stay connected with the latest news about Databend.

Introducing databend-local

Inspired by clickhouse-local, databend-local lets you process local files quickly without launching a Databend cluster.

> export CONFIG_FILE=tests/local/config/databend-local.toml
> cargo run --bin=databend-local -- --sql="SELECT * FROM tbl1" --table=tbl1=/path/to/databend/docs/public/data/books.parquet

exec local query: SELECT * FROM tbl1
+------------------------------+---------------------+------+
| title                        | author              | date |
+------------------------------+---------------------+------+
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+---------------------+------+
4 rows in set. Query took 0.015 seconds.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Compressing Short Strings

When processing the same queries with short strings involved, Databend usually reads more data than other databases, such as Snowflake.

SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;

Such queries might be more efficient if short strings (URLs, etc.) are compressed.
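
One common way to shrink columns of repeated short strings is dictionary encoding, sketched below. This is only an illustration of the general idea. The issue does not specify a technique, and the function names `dictionary_encode` and `dictionary_decode` are hypothetical.

```rust
use std::collections::HashMap;

/// Dictionary-encodes a column of strings: each distinct value is stored
/// once in the dictionary, and the column becomes a list of integer codes.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut codes = Vec::with_capacity(values.len());
    for &value in values {
        // Assign the next free code the first time a value is seen.
        let code = *index.entry(value).or_insert_with(|| {
            dict.push(value.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}

/// Reverses the encoding by looking each code up in the dictionary.
fn dictionary_decode(dict: &[String], codes: &[u32]) -> Vec<String> {
    codes.iter().map(|&code| dict[code as usize].clone()).collect()
}
```

The benefit depends on how often values repeat: a column with mostly distinct URLs gains little from a dictionary, which is one reason other schemes may suit short strings better.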

Issue 9001: performance: compressing for short strings

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, BohuTANG, dantengsky, drmingdrmer, eastfisher, everpcpc, leiysky, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li, Xuanwo, xudong963, youngsofun, zhang2014, zhyass
