Grouping and Aggregation

Grouping, in essence, collapses multiple rows into single rows based on some criteria. Hand-in-hand with performing a grouping operation, some fields in the output rowset must be aggregated into some meaningful value (or discarded if no possible or meaningful aggregation can be done).

We can witness this behavior by building up to it in stages.

// list all session durations.  
@output =  
  SELECT Duration  
  FROM @searchlog;

This creates a simple list of integers.

Duration
73
614
74
24
1213
241
502
60
1270
610
422
283
305
10
612
1220
691
63
30
119
732
183
630

Now, let's add all the numbers together. This yields a RowSet with
exactly one row and one column.

// Find the total duration for all sessions combined
@output =  
  SELECT  
    SUM(Duration) AS TotalDuration  
  FROM @searchlog;
Duration
9981

Now let's use the GROUP BY operator to break apart the totals by
Region.

// find the total Duration by Region
@output =
  SELECT
    Region,
    SUM(Duration) AS TotalDuration
  FROM searchlog
  GROUP BY Region;

This returns:

en_ca 24
en_ch 10
en_fr 241
en_gb 688
en_gr 305
en_mx 422
en_us 8291

This is a good opportunity to explore a common use of the HAVING operator. We can use HAVING to restrict the output RowSet to those rows that have aggregate values we are interested in. For example, perhaps we want to find all the Regions where total dwell time is above some value.

// find all the Regions where the total dwell time is > 200
@output =
  SELECT
    Region,
    SUM(Duration) AS TotalDuration
  FROM @searchlog
  GROUP BY Region
  HAVING TotalDuration > 200;
en-fr 241
en-gb 688
en-gr 305
en-mx 422
en-us 8291
// Count the number of total sessions.
@output =
  SELECT
    COUNT() AS NumSessions
  FROM @searchlog;
23

Count the number of total sessions by Region.

@output =
  SELECT
    COUNT() AS NumSessions,
    Region
  FROM @searchlog
  GROUP BY Region;
1 en_ca
1 en_ch
1 en_fr
2 en_gb
1 en_gr
1 en_mx
16 en_us

Count the number of total sessions by Region and include total duration for that language.

@output =
  SELECT
    COUNT() AS NumSessions,
    Region,
    SUM(Duration) AS TotalDuration,
    AVG(Duration) AS AvgDwellTtime,
    MAX(Duration) AS MaxDuration,
    MIN(Duration) AS MinDuration
    FROM @searchlog
  GROUP BY Region;
NumSessions Region TotalDuration AvgDuration MaxDuration MinDuration
1 en_ca 24 24 24 24
1 en_ch 10 10 10 10
1 en_fr 241 241 241 241
2 en_gb 688 344 614 74
1 en_gr 305 305 305 305
1 en_mx 422 422 422 422
16 en_us 8291 518.1875 1270 30

Data types coming from aggregate functions

You should be aware of how some aggregation operators deal with data types. Some aggregations will promote a numeric type to a "larger" type.

For example:

  • SUM(floatexpr) -> double
  • SUM(doubleexpr) -> double
  • SUM(intexpr) -> long
  • SUM(byteexpr) -> long

DISTINCT with Aggregates

Every aggregate function can take a DISTINCT qualifier.

For example

COUNT(DISTINCT x)

Notes

Aggregates can ONLY appear in a SELECT clause.

results matching ""

    No results matching ""