R2 SQL now supports set operations (UNION, INTERSECT, EXCEPT) and SELECT DISTINCT, expanding the range of analytical queries you can run directly on Apache Iceberg tables in R2 Data Catalog.
Set operations
Combine the results of multiple SELECT statements:
UNION— returns all rows from both queries, removing duplicatesUNION ALL— returns all rows from both queries, including duplicatesINTERSECT— returns only rows that appear in both queriesEXCEPT— returns rows from the first query that do not appear in the second
-- Find zones that had either firewall blocks OR high-risk requestsSELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'UNIONSELECT zone_id FROM my_namespace.http_requests WHERE risk_score > 0.8
-- Find zones with both firewall blocks AND high trafficSELECT zone_id FROM my_namespace.firewall_events WHERE action = 'block'INTERSECTSELECT zone_id FROM my_namespace.http_requestsGROUP BY zone_idHAVING COUNT(*) > 10000
-- Find enterprise zones that have not been compactedSELECT zone_id FROM my_namespace.zones WHERE plan = 'enterprise'EXCEPTSELECT zone_id FROM my_namespace.compaction_history
Select distinct
Eliminate duplicate rows from query results:
SELECT DISTINCT region, departmentFROM my_namespace.sales_dataWHERE total_amount > 1000ORDER BY region, departmentLIMIT 100
For large datasets where approximate results are acceptable, approx_distinct() remains a faster alternative for counting unique values.
For the full syntax reference, refer to the SQL reference. For performance guidance, refer to Limitations and best practices.
Leave a Reply