Filtering Aggregated Rows
We’ll start again with a simple GROUP BY
@output =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region;
| Region | TotalDuration |
|---|---|
| en_ca | 24 |
| en_ch | 10 |
| en_fr | 241 |
| en_gb | 688 |
| en_gr | 305 |
| en_mx | 422 |
| en_us | 8291 |
Filtering with WHERE
You might try WHERE here.
@output =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
WHERE TotalDuration > 200
GROUP BY Region;
Which will cause an error. Because WHERE can only work on the input columns to the statement, not the output columns
We could use multiple U-SQL Statements to accomplish this
@output =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region;
@output =
SELECT *
FROM @output
WHERE TotalDuration > 200;
Filtering with HAVING
Alternatively , we can use the HAVING clause which is designed to filter columns when a GROUP BY is used..
@output =
SELECT
Region,
SUM(Duration) AS TotalDuration
FROM @searchlog
GROUP BY Region
HAVING SUM(Duration) > 200;
You may have noticed that SUM(Duration) was repeated in the HAVING clause. That’s because HAVING (like WHERE) cannot use columns created in the SELECT clause.