
How Do I Change Aggregation in SAS?

Published in SAS Data Aggregation 6 mins read

Changing aggregation in SAS can refer to several different operations, ranging from modifying the definition of an existing aggregated data source within a SAS application's interface to performing new aggregations programmatically using SAS code. The method you choose depends on your specific task and the SAS environment you are using.

Modifying an Existing Aggregated Data Source in a UI

If you are working with an already defined aggregated data source within a visual SAS application (such as SAS Visual Analytics), you can often modify its structure and filtering directly through the user interface. This is particularly useful when you need to adjust the scope or content of a pre-built aggregated dataset without writing code.

Here's a general process for editing an existing aggregated data source:

  1. Select the Data Source: Navigate to the Data pane (or similar data management section) and select the specific aggregated data source you intend to modify.
  2. Access Edit Options: Look for an options or settings icon (often represented by three dots, a gear, or a wrench) associated with your selected aggregated data source. Click this icon, and then choose the Edit aggregated data option from the menu that appears.
  3. Adjust Name (Optional): In the edit window, you may have the option to change the Name of the aggregated data source for better identification.
  4. Refine Data Items and Filters: This is where you make core changes to the aggregation's definition:
    • Add or remove Selected items: Adjust which variables or columns are included in the aggregated output.
    • Add a New filter: Apply or modify filters to control which subset of the original data is used for the aggregation, thereby changing the scope of the aggregated results.
  5. Confirm Changes: Once you have made your desired modifications, click OK to apply them and update the aggregated data source.

This process allows for dynamic adjustment of aggregated data definitions managed within the application interface.
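
The effect of the two adjustments in step 4 can be pictured in code. The sketch below is only an analogy and makes no claim about how the application stores or evaluates an aggregated data source. The dataset work.sales_data, its values, and the output name work.agg_edited are hypothetical.

```sas
/* Hypothetical sample data; names and values are made up for illustration */
DATA work.sales_data;
    INPUT Region $ Product $ SalesAmount;
    DATALINES;
East Widget 100
East Gadget 250
East Widget 150
West Widget 300
West Gadget .
;
RUN;

/* Analogy for "Selected items": Region is kept as the only grouping item.
   Analogy for "New filter": the WHERE clause limits which source rows
   are aggregated, which changes the scope of the result. */
PROC SQL;
    CREATE TABLE work.agg_edited AS
    SELECT Region,
           SUM(SalesAmount) AS TotalSales
    FROM work.sales_data
    WHERE SalesAmount >= 150
    GROUP BY Region;
QUIT;
```

Removing the WHERE clause or adding Product to both the SELECT list and the GROUP BY clause corresponds to widening the filter or selecting an extra item.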

Changing Aggregation Programmatically with SAS Code

For more granular control over aggregation methods, grouping variables, and output formats, SAS provides powerful procedural steps. You can change aggregation by specifying different functions and grouping criteria.

Using PROC MEANS or PROC SUMMARY

PROC MEANS and PROC SUMMARY are fundamental procedures for calculating descriptive statistics (aggregations) for entire datasets or for groups within a dataset. The two are nearly identical. The main difference is that PROC MEANS prints a report by default while PROC SUMMARY does not, so PROC SUMMARY is often preferred when you only need the output dataset.

Key Features:

  • Aggregation Functions: You can specify various statistics to compute, such as SUM, MEAN, N (count of non-missing values), NMISS (count of missing values), MIN, MAX, STD (standard deviation), VAR (variance), Q1 (first quartile), MEDIAN, Q3 (third quartile), and more.
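  • Grouping Variables: Use the CLASS statement to define the variables by which you want to group your data. Aggregations are computed for each unique combination of these variables. By default the output dataset also contains rows for the overall total and for each subset of the CLASS variables (identified by _TYPE_); the NWAY option keeps only the rows for the full combination.
  • Output Control: The OUTPUT statement creates a new dataset containing the aggregated results. Name the statistics you want in that dataset on the OUTPUT statement itself (for example, SUM=TotalSales); without them, the dataset holds the default statistics (N, MIN, MAX, MEAN, STD) rather than only the one you asked for.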

Example: Changing Aggregation from Sum to Average for Sales by Region

Let's say you have a dataset sales_data, stored in a library named mydata, with the variables Region, Product, and SalesAmount.

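/* Original aggregation: Sum of SalesAmount by Region */
/* SUM on the PROC statement controls the printed report; SUM= on OUTPUT controls the dataset */
PROC MEANS DATA=mydata.sales_data SUM NWAY; /* NWAY keeps only the by-Region rows in the output dataset */
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=regional_sum_sales (DROP=_TYPE_ _FREQ_) SUM=TotalSales;
RUN;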

/* Changing aggregation to Average (Mean) of SalesAmount by Region */
PROC MEANS DATA=mydata.sales_data MEAN NWAY; /* NWAY keeps only the by-Region rows in the output dataset */
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=regional_avg_sales (DROP=_TYPE_ _FREQ_) MEAN=AverageSales;
RUN;

/* Changing aggregation to a count of sales records by Region and Product */
PROC MEANS DATA=mydata.sales_data N NWAY; /* NWAY keeps only the rows for the full Region-by-Product combination */
    CLASS Region Product;
    VAR SalesAmount; /* N counts the nonmissing SalesAmount values in each group */
    OUTPUT OUT=regional_product_count (DROP=_TYPE_ _FREQ_) N=ProductCount;
RUN;

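The statistic keyword (SUM, MEAN, N, and so on) determines the aggregation type. On the PROC MEANS statement it controls the printed report; on the OUTPUT statement (for example, SUM=TotalSales or MEAN=AverageSales) it controls what is written to the output dataset. Adding or removing variables from the CLASS statement changes the grouping.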
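
Because the OUTPUT statement accepts several statistic keywords at once, you do not need a separate step per aggregation. The following is a minimal sketch using PROC SUMMARY, assuming a small made-up dataset; work.sales_data, its values, and the output variable names are hypothetical.

```sas
/* Hypothetical sample data; the values are made up for illustration */
DATA work.sales_data;
    INPUT Region $ Product $ SalesAmount;
    DATALINES;
East Widget 100
East Gadget 250
East Widget 150
West Widget 300
West Gadget .
;
RUN;

/* Several aggregations of SalesAmount by Region in a single pass.
   PROC SUMMARY prints nothing by default; results go to the dataset. */
PROC SUMMARY DATA=work.sales_data NWAY;
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=work.regional_stats (DROP=_TYPE_ _FREQ_)
        SUM=TotalSales
        MEAN=AverageSales
        N=NonMissingCount
        NMISS=MissingCount;
RUN;

/* Display the aggregated dataset */
PROC PRINT DATA=work.regional_stats NOOBS;
RUN;
```

In this sample the West region has one missing SalesAmount, so its NonMissingCount is 1 and its MissingCount is 1, while its sum and mean are computed from the single nonmissing value.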

Using PROC SQL

PROC SQL offers a flexible and powerful way to perform aggregations, similar to standard SQL in relational databases. It's particularly useful for complex queries involving joins, subqueries, and conditional aggregations.

Key Features:

  • Aggregation Functions: Use SUM(), AVG(), COUNT(), MIN(), MAX(), etc., directly within the SELECT statement.
  • Grouping Variables: The GROUP BY clause specifies the variables by which to aggregate.
  • Flexibility: Easily combine multiple aggregation functions and conditions.

Example: Changing Aggregation from Sum to Average for Sales by Region using SQL

/* Original aggregation: Sum of SalesAmount by Region */
PROC SQL;
    CREATE TABLE regional_sum_sales_sql AS
    SELECT
        Region,
        SUM(SalesAmount) AS TotalSales
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;

/* Changing aggregation to Average (Mean) of SalesAmount by Region */
PROC SQL;
    CREATE TABLE regional_avg_sales_sql AS
    SELECT
        Region,
        AVG(SalesAmount) AS AverageSales
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;

/* Changing aggregation to count of distinct products by Region */
PROC SQL;
    CREATE TABLE regional_distinct_product_count AS
    SELECT
        Region,
        COUNT(DISTINCT Product) AS DistinctProductCount
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;

Here, changing SUM(SalesAmount) to AVG(SalesAmount) or COUNT(DISTINCT Product) directly controls the aggregation method. Modifying the GROUP BY clause changes the grouping logic.
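
Several aggregation functions and a condition can also be combined in one query. The sketch below is illustrative only: work.sales_data, its values, the 'Widget' product value, and the output column names are all hypothetical.

```sas
/* Hypothetical sample data; the values are made up for illustration */
DATA work.sales_data;
    INPUT Region $ Product $ SalesAmount;
    DATALINES;
East Widget 100
East Gadget 250
East Widget 150
West Widget 300
West Gadget .
;
RUN;

PROC SQL;
    CREATE TABLE work.regional_profile AS
    SELECT
        Region,
        SUM(SalesAmount)   AS TotalSales,
        AVG(SalesAmount)   AS AverageSales,
        COUNT(SalesAmount) AS NonMissingCount, /* counts nonmissing values only */
        COUNT(*)           AS RowCount,        /* counts every row in the group */
        /* Conditional aggregation: total for one product only */
        SUM(CASE WHEN Product = 'Widget' THEN SalesAmount ELSE 0 END) AS WidgetSales
    FROM
        work.sales_data
    GROUP BY
        Region
    HAVING
        COUNT(*) >= 2; /* keep only regions with at least two rows */
QUIT;
```

The difference between COUNT(SalesAmount) and COUNT(*) shows up for the West region, where one row has a missing amount.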

Comparison of Aggregation Procedures

Feature               | PROC MEANS / PROC SUMMARY                     | PROC SQL
----------------------|-----------------------------------------------|----------------------------------------------
Ease of Use           | Excellent for standard descriptive statistics | Powerful for complex, SQL-like aggregations
Aggregation Functions | SUM, MEAN, N, STD, MIN, MAX, etc.             | SUM(), AVG(), COUNT(), MIN(), MAX(), etc.
Grouping              | CLASS statement                               | GROUP BY clause
Output                | Creates new dataset with OUTPUT statement     | Creates new table with CREATE TABLE AS SELECT
Flexibility           | Good for standard aggregations                | High, supports complex queries and joins
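
When switching from one procedure to the other, it can be worth checking that both produce the same aggregated values. One way to do that, sketched here under the assumption of a small made-up dataset, is PROC COMPARE; work.sales_data, its values, and the output dataset names are hypothetical.

```sas
/* Hypothetical sample data; the values are made up for illustration */
DATA work.sales_data;
    INPUT Region $ Product $ SalesAmount;
    DATALINES;
East Widget 100
East Gadget 250
East Widget 150
West Widget 300
West Gadget .
;
RUN;

/* Sum by Region with PROC SUMMARY */
PROC SUMMARY DATA=work.sales_data NWAY;
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=work.sum_summary (DROP=_TYPE_ _FREQ_) SUM=TotalSales;
RUN;

/* Same aggregation with PROC SQL; ORDER BY keeps rows sorted for the ID match */
PROC SQL;
    CREATE TABLE work.sum_sql AS
    SELECT Region,
           SUM(SalesAmount) AS TotalSales
    FROM work.sales_data
    GROUP BY Region
    ORDER BY Region;
QUIT;

/* Match rows on Region and compare the TotalSales values */
PROC COMPARE BASE=work.sum_summary COMPARE=work.sum_sql;
    ID Region;
RUN;
```

If the two approaches agree, the comparison report should list no unequal values for TotalSales.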

Changing Default Aggregation in Visualizations

In interactive reporting tools like SAS Visual Analytics, when you drag a measure onto a report canvas, it often defaults to a specific aggregation (e.g., Sum for quantitative variables). You can typically change this default aggregation for a specific visualization directly on the canvas.

To do this:

  1. Select the Measure: In your report, right-click on the measure in the visualization (e.g., a bar chart showing "Sales Amount").
  2. Choose Aggregation: From the context menu, you will see an Aggregation or Measure Aggregation option.
  3. Select New Type: Choose the desired aggregation type from the list (e.g., Average, Count, Minimum, Maximum). This will immediately update how that measure is displayed in the current visualization.

By understanding these different approaches, you can effectively change aggregation in SAS to meet your data analysis and reporting needs.