Ora

How do I sum all columns in SAS?

Published in SAS Data Manipulation 5 mins read

To sum all columns in SAS, the approach you take depends on whether you want a single sum for each column across all rows, or if you want to sum across specific columns for each individual row to create a new variable. Both are common requirements, and SAS offers straightforward methods for each.

How to Sum All Columns in SAS

Here's how to sum columns in various scenarios, from creating a new variable per row to getting aggregate sums for each column.

1. Summing Across Columns for Each Row (Creating a New Variable)

This is a very common task where you want a new variable in your dataset that represents the sum of several existing columns for each observation (row).

  • Using the SUM function in a DATA step: The most direct and efficient method is to use the SUM function along with the OF keyword within a DATA step. This function conveniently treats missing values as zero in the summation, which is often desired.

    DATA MyNewDataset;
        SET OriginalDataset;
        /* Sum specific columns */
        TotalSales = SUM(OF JanSales, FebSales, MarSales);
    
        /* Sum a range of columns (e.g., Var1, Var2, ..., Var5) */
        TotalScores = SUM(OF Score1-Score5);
    
        /* Sum all numeric columns whose names start with 'Amount' */
        TotalAmounts = SUM(OF Amount:);
    
        /* Sum ALL numeric columns in the current dataset (use with caution) */
        OverallRowSum = SUM(OF _NUMERIC_);
    RUN;

    Explanation:

    • SUM(): This powerful function can take multiple arguments and returns their sum.
    • OF: This keyword allows you to specify a list or range of variables:
      • JanSales, FebSales, MarSales: Explicitly list the column names you wish to sum.
      • Score1-Score5: A shorthand to specify columns Score1, Score2, Score3, Score4, and Score5 (assuming they exist and are ordered as such in the dataset's metadata).
      • Amount:: A prefix list that includes all numeric columns whose names begin with "Amount" (e.g., AmountDue, AmountPaid).
      • _NUMERIC_: A special SAS keyword that refers to all numeric variables currently available in the DATA step.

2. Summing Each Column Individually Across All Rows

If your goal is to compute a single aggregate sum for Column1 (across all rows), another single sum for Column2, and so on, you are looking for an aggregate statistic for each column.

  • Using PROC MEANS or PROC SUMMARY: These procedures are designed for calculating descriptive statistics, including sums, for entire columns. They are highly efficient for this type of aggregation.

    PROC MEANS DATA=OriginalDataset SUM;
        VAR SalesQ1 SalesQ2 SalesQ3; /* Specify the columns to sum */
        OUTPUT OUT=ColumnAggregates SUM=Sum_Q1 Sum_Q2 Sum_Q3;
    RUN;

    Explanation:

    • SUM: This option tells PROC MEANS to calculate the sum statistic.
    • VAR: This statement lists the columns for which you want to calculate the sum.
    • OUTPUT OUT=...: Creates a new dataset containing the calculated sums.
    • SUM=...: Allows you to rename the output sum variables (e.g., SalesQ1's sum becomes Sum_Q1).
  • Using PROC SQL: SQL provides a flexible and powerful way to perform aggregations, including summing columns.

    PROC SQL;
        CREATE TABLE ColumnAggregates_SQL AS
        SELECT
            SUM(SalesQ1) AS Sum_Q1,
            SUM(SalesQ2) AS Sum_Q2,
            SUM(SalesQ3) AS Sum_Q3
        FROM OriginalDataset;
    QUIT;

    Explanation:

    • SUM(ColumnName): This SQL aggregate function directly calculates the sum for each specified column across all rows in the dataset.
    • AS: Used to assign an alias (new name) to the resulting sum column.
  • Using PROC IML with the Subscript Reduction Operator (for Matrix Computations):
    For advanced matrix operations, SAS's Interactive Matrix Language (PROC IML) offers a direct way to sum columns of a matrix. If you load your data into an IML matrix, you can utilize the subscript reduction operator to efficiently compute column sums.

    PROC IML;
        USE OriginalDataset; /* Open the SAS dataset */
        /* Read specific numeric columns into an IML matrix */
        READ ALL VAR {SalesQ1 SalesQ2 SalesQ3} INTO mySalesMatrix;
    
        /* Compute the sum of each column of the matrix */
        ColumnTotals = mySalesMatrix[+,]; /* The [+,] operator sums each column */
    
        PRINT "Sum of Each Column:" ColumnTotals;
    QUIT;

    Explanation:

    • USE OriginalDataset;: This statement opens the specified SAS dataset in PROC IML.
    • READ ALL VAR {SalesQ1 SalesQ2 SalesQ3} INTO mySalesMatrix;: Reads the data from the listed columns and stores it into an IML matrix named mySalesMatrix. Each column from the dataset becomes a column in this matrix.
    • mySalesMatrix[+,];: This is the subscript reduction operator specific to PROC IML. The [+,] syntax computes a row vector where each element represents the sum of the corresponding column in mySalesMatrix. This provides the sum for each column of your matrix efficiently.

Summary of Methods for Summing Columns

Method Purpose SAS Code Example Output Type
DATA Step SUM(OF ...) Sums across specified columns for each row RowTotal = SUM(OF ColA, ColB); New variable in the dataset
PROC MEANS / SUMMARY Sums each column individually across all rows PROC MEANS DATA=... SUM VAR ColA ColB; Output dataset with sums
PROC SQL Sums each column individually across all rows SELECT SUM(ColA) AS SumA, SUM(ColB) AS SumB FROM ...; SQL result set / new table
PROC IML matrix[+,] Sums each column of a matrix (advanced) ColumnSums = myMatrix[+,]; IML matrix (row vector)

By understanding these different methods, you can select the most suitable approach for your specific data analysis needs in SAS.