Getting the Rows that have the maximum (or minimum) value for a Column within a partition
Another scenario easily done through the ranking functions, is finding the row that contains the max value in a partition
Returning to our original input data set, imagine we want to partition by Vertical and within each vertical find the row that has the maximum value for latency.
Query | Latency | Vertical |
---|---|---|
Banana | 300 | Image |
Cherry | 300 | Image |
Durian | 500 | Image |
Apple | 100 | Web |
Fig | 200 | Web |
Papaya | 200 | Web |
Fig | 300 | Web |
Cherry | 400 | Web |
Durian | 500 | Web |
The desired output for is as follows. As clearly 500 is the maximum latency in both Image and Web
Query | Latency | Vertical |
---|---|---|
Durian | 500 | Image |
Durian | 500 | Web |
The U-SQL that accomplishes this uses ROW_NUMBER
@results =
SELECT
Query,
Latency,
Vertical,
ROW\_NUMBER() OVER (PARTITION BY Vertical ORDER BY Latency DESC) AS rn
FROM @querylog;
@results =
SELECT
Query,
Latency,
Vertical
FROM @results
WHERE rn==1;
To retrieve the row with the minimum value for each partition, in the OVER clause change the DESC to ASC.