This week in Databend #67
Databend is a powerful cloud data warehouse, built for elasticity and efficiency. It is free and open, and also available in the cloud: https://app.databend.com.
What's Changed
Below is a list of some major changes that we don't want you to miss.
Exciting New Features ✨
toolchain
- upgrade to 1.67 nightly (#8631)
multiple catalog
- multiple catalog create (planner and catalog manager) (#8620)
compact
- optimize compact for data load (#8644)
planner
- optimize left/single join (#8583)
query
- support copy from xml (#8404)
- add collation (#8610)
- copy files order by last modified time asc (#8628)
- improve sort, 10%~50% faster than the old one (#8452)
new expression
Code Refactor 🎉
format
- refactor output format with FieldEncoders (#8700)
planner
- move plan from query/planner to sql/planner (#8660)
query
storage
- move and group sub-crates in storages (#8613, #8621, #8627, etc.)
- compact segments, which strictly preserves the order of ingestion (#8590)
new expression
Build/Testing/CI Infra Changes 🔌
- rust-toolchain nightly 1.67.0 (nightly-2022-11-07) (#8641)
Thoughtful Bug Fix 🔧
compatibility
- problem when using Trino Mysql connector (#8668)
meta
- emit kv change events after committing a transaction (#8674)
query
News
Let's take a look at what's new at Datafuse Labs & Databend each week.
Support Copy from XML
After #8404 was merged, Databend now supports loading data from XML files.
As with other formats, you only need to set the format option to XML in the SQL statement. Here is an example that uses the streaming load API:
curl -sH "insert_sql:insert into test_xml format XML" \
-F "upload=@/tmp/simple_v1.xml" \
-u root: -XPUT "http://localhost:${QUERY_HTTP_HANDLER_PORT}/v1/streaming_load"
The content of your XML file must use one or more of the following layouts:
- Column names as attributes and column values as attribute values:
<row column1="value1" column2="value2" .../>
- Column names as tags and column values as the content of these tags:
<row>
<column1>value1</column1>
<column2>value2</column2>
</row>
- Column names as the name attributes of <field> tags and column values as the content of these tags:
<row>
<field name='column1'>value1</field>
<field name='column2'>value2</field>
</row>
Learn More
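To make the three layouts concrete, here is a minimal Python sketch that maps a <row> element written in any of them to the same column-to-value mapping. It is not Databend's parser; the root element name `data` and the functions `parse_row` and `parse_rows` are hypothetical, and all values are kept as strings (type conversion is out of scope for this sketch).

```python
import xml.etree.ElementTree as ET


def parse_row(row: ET.Element) -> dict:
    """Map one <row> element, in any of the three layouts, to {column: value}."""
    # Layout 1: <row column1="value1" column2="value2" .../>
    if row.attrib:
        return dict(row.attrib)
    values = {}
    for child in row:
        if child.tag == "field" and "name" in child.attrib:
            # Layout 3: <field name='column1'>value1</field>
            values[child.attrib["name"]] = child.text or ""
        else:
            # Layout 2: <column1>value1</column1>
            values[child.tag] = child.text or ""
    return values


def parse_rows(xml_text: str) -> list:
    """Parse every <row> element in the document, in document order."""
    root = ET.fromstring(xml_text)
    return [parse_row(row) for row in root.iter("row")]


# The same row written in each of the three layouts (root name is hypothetical)
samples = [
    '<data><row id="1" name="a"/></data>',
    "<data><row><id>1</id><name>a</name></row></data>",
    "<data><row><field name='id'>1</field><field name='name'>a</field></row></data>",
]
for text in samples:
    print(parse_rows(text))  # [{'id': '1', 'name': 'a'}] for every layout
```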
Support for Char Collation
After #8610 was merged, Databend now supports setting the collation, which determines how string data is interpreted.
By default, collation is set to 'binary', because Databend stores string columns in binary format. You can change it to 'utf8' with the following statement:
set collation = 'utf8';
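This can help you get the expected results when working with non-English strings. In the test case below, with the default binary collation, substr and length count bytes (each of these Chinese characters takes 3 bytes in UTF-8); after switching to utf8, they count characters: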
statement query TI
select substr('城区主城区其他', 1, 6), length('我爱中国');
----
城区 12
statement ok
set collation = 'utf8';
statement query TI
select substr('城区主城区其他', 1, 6), length('我爱中国');
----
城区主城区其 4
Learn More
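The difference between the two results above comes down to counting bytes versus counting characters. The Python sketch below models that behavior, assuming the 1-based SUBSTR semantics used in the SQL above. The function names are hypothetical, and how Databend handles a byte range that cuts a character in half is not covered here (the sketch simply substitutes U+FFFD).

```python
def substr_binary(s: str, start: int, length: int) -> str:
    """1-based SUBSTR that counts UTF-8 bytes, like the default 'binary' collation."""
    raw = s.encode("utf-8")[start - 1 : start - 1 + length]
    # A cut in the middle of a multi-byte character becomes U+FFFD in this sketch
    return raw.decode("utf-8", errors="replace")


def substr_utf8(s: str, start: int, length: int) -> str:
    """1-based SUBSTR that counts characters, like the 'utf8' collation."""
    return s[start - 1 : start - 1 + length]


def length_binary(s: str) -> int:
    return len(s.encode("utf-8"))  # number of bytes


def length_utf8(s: str) -> int:
    return len(s)  # number of characters


print(substr_binary("城区主城区其他", 1, 6), length_binary("我爱中国"))  # 城区 12
print(substr_utf8("城区主城区其他", 1, 6), length_utf8("我爱中国"))  # 城区主城区其 4
```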
Issues
Meet issues you may be interested in and try to solve them.
Enable Xor Filter Index for IN
Databend introduced the Xor Filter to replace the Bloom Filter (#7870), which in some scenarios roughly doubles performance and greatly reduces the amount of data that needs to be scanned.
Initially, we only added this index for string columns. Then, in #7958, it was enabled for integer columns.
Now, we want to enable the Xor Filter index for IN queries, such as:
SELECT * FROM t1 where xx IN ('', '')
Issue 8625: performance: enable xor filter index for IN
If you find it interesting, try to solve it or participate in discussions and PR reviews. Or you can click on https://link.databend.rs/i-m-feeling-lucky to pick up a good first issue, good luck!
Changelogs
You can check the changelogs of Databend nightly to learn about our latest developments.
- v0.8.105-nightly
- v0.8.104-nightly
- v0.8.103-nightly
- v0.8.102-nightly
- v0.8.101-nightly
- v0.8.100-nightly
Contributors
Thanks a lot to the contributors for their excellent work this week.
andylokandy | b41sh | BohuTANG | Chasen-Zhang | ClSlaid | dantengsky |
dependabot[bot] | drmingdrmer | eliasyaoyc | lichuang | mergify[bot] | RinChanNOWWW |
soyeric128 | sundy-li | TCeason | Xuanwo | xudong963 | youngsofun |
zhang2014 | ZhiHanZ |
Meet Us
Please join the DatafuseLabs Community if you are interested in Databend.
We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.
You can submit issues for any problems you find. We also highly appreciate any of your pull requests.
- Databend Website
- Weekly (A weekly newsletter about Databend)
- GitHub Discussions (Feature/Bug reports, Contributions)
- Twitter (Get the news fast)
- Slack Channel (For live discussion with the Community)