This week in Databend #59
Databend is an open source, elastic, and reliable Modern Cloud Data Warehouse. It offers blazing fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.
Below is a list of some major changes that we don't want you to miss.
Exciting New Features ✨
- Idempotent Copy (#7541)
- new RPC to echo client ip (#7538)
- save table stage file info into meta, remove these data when truncate table (#7558)
- add grpc API
- add clustering_history system table (#7535)
- abstract active instance counting (#7545)
Code Refactor 🎉
- remove redundant ActionHandler; move logic into
- replace recursion for fast-path insert with loop (#7530)
- always list from OpenDAL instead of meta (#7547)
- fix set operation err format (#7575)
Build/Testing/CI Infra Changes 🔌
Thoughtful Bug Fix 🔧
- change generated alias name for scalar expression to lowercase (#7525)
- add missing EOI (#7534)
- stop tasks in cluster when select limit (#7542)
- scan_progress should be incr before prewhere filter (#7566)
- fix ceil return type (#7520)
Let's take a look at what's new at Datafuse Labs & Databend each week.
RFC: Idempotent Copy
When stage files are copied into a table in a streaming fashion, some of them may already have been copied. A mechanism is therefore needed to avoid copying the same files twice and to make the copy idempotent.
- Save meta information about the stage files copied into a table in the meta service
- Avoid duplicates when copying stage files into a table
Learn more: https://databend.rs/doc/contributing/rfcs/idempotent-copy
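The idea behind the RFC can be illustrated with a minimal Python sketch. This is not Databend's implementation: `StageFile`, `MetaService`, `copy_into`, and the use of an etag as the file fingerprint are all hypothetical names and assumptions chosen for illustration. The sketch only shows the general pattern of recording copied files per table, skipping files that are already recorded, and dropping the records when the table is truncated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StageFile:
    """A file in a stage. The etag is an assumed content fingerprint."""
    path: str
    etag: str


@dataclass
class MetaService:
    """Hypothetical in-memory stand-in for the meta service."""
    copied: dict = field(default_factory=dict)  # table -> {path: etag}

    def already_copied(self, table, f):
        # A file counts as copied only if the same path has the same fingerprint.
        return self.copied.get(table, {}).get(f.path) == f.etag

    def record(self, table, files):
        self.copied.setdefault(table, {}).update({f.path: f.etag for f in files})

    def truncate(self, table):
        # Truncating a table also forgets which files were copied into it.
        self.copied.pop(table, None)


def copy_into(meta, table, loaded, stage):
    """Copy only files not yet recorded; return the paths actually copied."""
    pending = [f for f in stage if not meta.already_copied(table, f)]
    for f in pending:
        # Placeholder for actually loading the file's rows into the table.
        loaded.setdefault(table, []).append(f.path)
    meta.record(table, pending)
    return [f.path for f in pending]
```

Running `copy_into` twice over the same stage copies each file once; the second call finds every file already recorded and does nothing. A file whose fingerprint changes, or any file after `truncate`, is treated as new.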
Databend Perf with Ontime JOIN
With several recent patches, Databend can fully support Ontime JOIN queries, so you can now also see them in the Databend Perf dashboard.
SELECT Carrier, c, c2, c*100/c2 AS c3
FROM (
  SELECT IATA_CODE_Reporting_Airline AS Carrier, count(*) AS c
  FROM ontime
  WHERE DepDelay>10 AND Year=2007
  GROUP BY Carrier
) q
JOIN (
  SELECT IATA_CODE_Reporting_Airline AS Carrier, count(*) AS c2
  FROM ontime
  WHERE Year=2007
  GROUP BY Carrier
) qq USING (Carrier)
ORDER BY c3 DESC;

SELECT Carrier, c, c2, c*100/c2 AS c3
FROM (
  SELECT IATA_CODE_Reporting_Airline AS Carrier, count(*) AS c
  FROM ontime
  WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
  GROUP BY Carrier
) q
JOIN (
  SELECT IATA_CODE_Reporting_Airline AS Carrier, count(*) AS c2
  FROM ontime
  WHERE Year>=2000 AND Year<=2008
  GROUP BY Carrier
) qq USING (Carrier)
ORDER BY c3 DESC;

SELECT Year, c1/c2
FROM (
  SELECT Year, count(*)*100 AS c1
  FROM ontime
  WHERE DepDelay>10
  GROUP BY Year
) q
JOIN (
  SELECT Year, count(*) AS c2
  FROM ontime
  GROUP BY Year
) qq USING (Year)
ORDER BY Year;
View dashboard: https://perf.databend.rs/
You can check the changelogs of Databend nightly to learn about our latest developments.
Thanks a lot to the contributors for their excellent work this week.
Please join the DatafuseLabs Community if you are interested in Databend.
We are looking forward to seeing you try our code. We have a strong team behind you to ensure a smooth experience in trying our code for your projects. If you are a hacker passionate about database internals, feel free to play with our code.
You can submit issues for any problems you find. We also highly appreciate any of your pull requests.
- Databend Website
- Weekly (A weekly newsletter about Databend)
- GitHub Discussions (Feature/Bug reports, Contributions)
- Twitter (Get the news fast)
- Slack Channel (For live discussion with the Community)
- I'm feeling lucky (Pick up a good first issue now!)