Changing aggregation in SAS can refer to several different operations, ranging from modifying the definition of an existing aggregated data source within a SAS application's interface to performing new aggregations programmatically using SAS code. The method you choose depends on your specific task and the SAS environment you are using.
Modifying an Existing Aggregated Data Source in a UI
If you are working with an already defined aggregated data source within a visual SAS application (such as SAS Visual Analytics), you can often modify its structure and filtering directly through the user interface. This is particularly useful when you need to adjust the scope or content of a pre-built aggregated dataset without writing code.
Here's a general process for editing an existing aggregated data source:
- Select the Data Source: Navigate to the Data pane (or similar data management section) and select the specific aggregated data source you intend to modify.
- Access Edit Options: Look for an options or settings icon (often represented by three dots, a gear, or a wrench) associated with your selected aggregated data source. Click this icon, and then choose the Edit aggregated data option from the menu that appears.
- Adjust Name (Optional): In the edit window, you may have the option to change the Name of the aggregated data source for better identification.
- Refine Data Items and Filters: This is where you make core changes to the aggregation's definition:
  - Add or remove Selected items: Adjust which variables or columns are included in the aggregated output.
  - Add a New filter: Apply or modify filters to control which subset of the original data is used for the aggregation, thereby changing the scope of the aggregated results.
- Confirm Changes: Once you have made your desired modifications, click OK to apply them and update the aggregated data source.
This process allows for dynamic adjustment of aggregated data definitions managed within the application interface.
Changing Aggregation Programmatically with SAS Code
For more granular control over aggregation methods, grouping variables, and output formats, SAS provides procedures such as PROC MEANS, PROC SUMMARY, and PROC SQL. You change the aggregation by specifying different statistics and grouping variables.
Using PROC MEANS or PROC SUMMARY
PROC MEANS and PROC SUMMARY are fundamental procedures for calculating descriptive statistics (aggregations) for entire datasets or for groups within a dataset. PROC SUMMARY is often preferred when you need only the output dataset, because it produces no printed report by default.
Key Features:
- Aggregation Functions: You can specify various statistics to compute, such as SUM, MEAN, N (count of non-missing values), NMISS (count of missing values), MIN, MAX, STD (standard deviation), VAR (variance), Q1 (first quartile), MEDIAN, Q3 (third quartile), and more.
- Grouping Variables: Use the CLASS statement to define the variables by which you want to group your data. Aggregations will be performed separately for each unique combination of these variables.
- Output Control: The OUTPUT statement allows you to create a new dataset containing the aggregated results.
Example: Changing Aggregation from Sum to Average for Sales by Region
Let's say you have a dataset sales_data (stored in a library named mydata) with the variables Region, Product, and SalesAmount.
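To try the examples on your own machine, you need some data behind mydata.sales_data. The following is a minimal sketch for creating it; the five rows are made up purely for illustration, and pointing the mydata libref at the WORK directory is an assumption made for this demo, not part of the original setup.

```sas
/* Assumption: reuse the WORK directory as the mydata library for this demo only */
LIBNAME mydata "%SYSFUNC(PATHNAME(work))";

/* Hypothetical sample rows; one SalesAmount is missing on purpose */
DATA mydata.sales_data;
    LENGTH Region $ 8 Product $ 8;
    INPUT Region $ Product $ SalesAmount;
    DATALINES;
East Widget 100
East Gadget 250
West Widget 75
West Widget .
West Gadget 300
;
RUN;
```

The missing value is included so that you can see how statistics such as N and NMISS treat it in the aggregations that follow.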
/* Original aggregation: sum of SalesAmount by Region */
/* SUM on the PROC statement sets the printed statistic; SUM= on the
   OUTPUT statement sets what is written to the output dataset */
PROC MEANS DATA=mydata.sales_data SUM NWAY;
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=regional_sum_sales (DROP=_TYPE_ _FREQ_) SUM=TotalSales;
RUN;

/* Changing aggregation to average (mean) of SalesAmount by Region */
PROC MEANS DATA=mydata.sales_data MEAN NWAY;
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=regional_avg_sales (DROP=_TYPE_ _FREQ_) MEAN=AverageSales;
RUN;

/* Changing aggregation to a count of sales records by Region and Product */
/* NWAY keeps only the rows for the full Region-by-Product combination */
PROC MEANS DATA=mydata.sales_data N NWAY;
    CLASS Region Product;
    VAR SalesAmount; /* N= counts nonmissing SalesAmount values in each group */
    OUTPUT OUT=regional_product_count (DROP=_TYPE_ _FREQ_) N=ProductCount;
RUN;
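Changing SUM to MEAN or N in the PROC MEANS statement changes the statistic shown in the printed report. To change what is written to the output dataset, name the statistic in the OUTPUT statement (for example, SUM=TotalSales or MEAN=AverageSales); an OUTPUT statement that names no statistic writes a default set of statistics instead. Adding or removing variables from the CLASS statement changes the grouping.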
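Several statistics can also be requested in a single pass. The sketch below assumes the same mydata.sales_data dataset; the output dataset name and the output variable names (RowCount, NonMissingSales, and so on) are hypothetical and chosen only for illustration.

```sas
/* PROC SUMMARY prints nothing by default, so only the dataset is produced */
PROC SUMMARY DATA=mydata.sales_data NWAY;
    CLASS Region Product;
    VAR SalesAmount;
    /* Keep _FREQ_ (rows per group) under a clearer name and drop _TYPE_ */
    OUTPUT OUT=region_product_stats (DROP=_TYPE_ RENAME=(_FREQ_=RowCount))
        N=NonMissingSales
        NMISS=MissingSales
        SUM=TotalSales
        MEAN=AverageSales
        MEDIAN=MedianSales;
RUN;

/* Inspect the aggregated result */
PROC PRINT DATA=region_product_stats;
RUN;
```

Switching the aggregation later is then a matter of picking a different column from this dataset, or editing the list of statistics in the OUTPUT statement.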
Using PROC SQL
PROC SQL offers a flexible and powerful way to perform aggregations, similar to standard SQL in relational databases. It's particularly useful for complex queries involving joins, subqueries, and conditional aggregations.
Key Features:
- Aggregation Functions: Use SUM(), AVG(), COUNT(), MIN(), MAX(), etc., directly within the SELECT statement.
- Grouping Variables: The GROUP BY clause specifies the variables by which to aggregate.
- Flexibility: Easily combine multiple aggregation functions and conditions.
Example: Changing Aggregation from Sum to Average for Sales by Region using SQL
/* Original aggregation: Sum of SalesAmount by Region */
PROC SQL;
    CREATE TABLE regional_sum_sales_sql AS
    SELECT
        Region,
        SUM(SalesAmount) AS TotalSales
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;

/* Changing aggregation to Average (Mean) of SalesAmount by Region */
PROC SQL;
    CREATE TABLE regional_avg_sales_sql AS
    SELECT
        Region,
        AVG(SalesAmount) AS AverageSales
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;

/* Changing aggregation to count of distinct products by Region */
PROC SQL;
    CREATE TABLE regional_distinct_product_count AS
    SELECT
        Region,
        COUNT(DISTINCT Product) AS DistinctProductCount
    FROM
        mydata.sales_data
    GROUP BY
        Region;
QUIT;
Here, changing SUM(SalesAmount) to AVG(SalesAmount) or COUNT(DISTINCT Product) directly controls the aggregation method. Modifying the GROUP BY clause changes the grouping logic.
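Conditional aggregation and multiple functions in one query can be sketched as follows. The 200 threshold, the table name regional_sales_profile, and the column aliases are hypothetical values chosen for illustration; only mydata.sales_data and its columns come from the examples above.

```sas
PROC SQL;
    CREATE TABLE regional_sales_profile AS
    SELECT
        Region,
        COUNT(*) AS RowCount,
        SUM(SalesAmount) AS TotalSales,
        AVG(SalesAmount) AS AverageSales,
        /* Conditional aggregation: only rows at or above the assumed threshold */
        SUM(CASE WHEN SalesAmount >= 200 THEN SalesAmount ELSE 0 END) AS LargeSales,
        SUM(CASE WHEN SalesAmount >= 200 THEN 1 ELSE 0 END) AS LargeSaleCount
    FROM
        mydata.sales_data
    GROUP BY
        Region
    /* HAVING filters on the aggregated value, after grouping */
    HAVING
        SUM(SalesAmount) > 0
    ORDER BY
        Region;
QUIT;
```

Rows with a missing SalesAmount fall into the ELSE branch, because SAS treats a missing numeric value as smaller than any number, so they add nothing to LargeSales or LargeSaleCount.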
Comparison of Aggregation Procedures
Feature | PROC MEANS / PROC SUMMARY | PROC SQL |
---|---|---|
Ease of Use | Excellent for standard descriptive statistics | Powerful for complex, SQL-like aggregations |
Aggregation Functions | SUM, MEAN, N, STD, MIN, MAX, etc. | SUM(), AVG(), COUNT(), MIN(), MAX(), etc. |
Grouping | CLASS statement | GROUP BY clause |
Output | Creates new dataset with OUTPUT statement | Creates new table with CREATE TABLE AS SELECT |
Flexibility | Good for standard aggregations | High, supports complex queries and joins |
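
One way to convince yourself that the two approaches agree is to compute the same total both ways and compare the results. This is a sketch under stated assumptions: the dataset names check_means and check_sql are hypothetical, and the WHERE clause is there because the CLASS statement skips rows with a missing Region by default, whereas GROUP BY would keep them as their own group.

```sas
/* Sum by Region with PROC SUMMARY */
PROC SUMMARY DATA=mydata.sales_data NWAY;
    CLASS Region;
    VAR SalesAmount;
    OUTPUT OUT=check_means (DROP=_TYPE_ _FREQ_) SUM=TotalSales;
RUN;

/* The same sum with PROC SQL, sorted so the rows line up by Region */
PROC SQL;
    CREATE TABLE check_sql AS
    SELECT
        Region,
        SUM(SalesAmount) AS TotalSales
    FROM
        mydata.sales_data
    WHERE
        Region IS NOT MISSING
    GROUP BY
        Region
    ORDER BY
        Region;
QUIT;

/* Match rows on Region and report any differences in TotalSales */
PROC COMPARE BASE=check_means COMPARE=check_sql;
    ID Region;
RUN;
```

If the report lists unequal values, the usual suspects are the handling of missing grouping values and a filter applied in one step but not the other.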
Changing Default Aggregation in Visualizations
In interactive reporting tools like SAS Visual Analytics, when you drag a measure onto a report canvas, it often defaults to a specific aggregation (e.g., Sum for quantitative variables). You can typically change this default aggregation for a specific visualization directly on the canvas.
To do this:
- Select the Measure: In your report, right-click on the measure in the visualization (e.g., a bar chart showing "Sales Amount").
- Choose Aggregation: From the context menu, you will see an Aggregation or Measure Aggregation option.
- Select New Type: Choose the desired aggregation type from the list (e.g., Average, Count, Minimum, Maximum). This will immediately update how that measure is displayed in the current visualization.
By understanding these different approaches, you can effectively change aggregation in SAS to meet your data analysis and reporting needs.