Ora

How Do I Filter Data in SAS Studio?

Published in SAS Data Management 3 mins read

Filtering data in SAS Studio is a fundamental operation that allows you to work with specific subsets of your datasets, enhancing analysis efficiency and precision. You can achieve this through two primary methods: interactive visual filtering within the SAS Studio interface or by writing SAS code.

Visual Filtering in SAS Studio

For quick filtering without writing code, SAS Studio provides an intuitive visual interface, often found within data viewers, query builders, or specific tasks. This method is ideal for users who prefer a point-and-click approach.

Here's how to apply visual filters:

  1. Access the Filter Tab: When viewing a table or working within a task that supports filtering (like the "Filter and Sort" task or a data grid view), locate and click the Filter tab.
  2. Select Columns: From the Columns area, drag one or more columns that you wish to use for filtering into the filter section.
  3. Define Your Filter Condition: In the filter window that appears for the selected column:
    • Choose an Operator: Select a comparison operator (e.g., =, >, CONTAINS) from the operator drop-down list.
    • Specify a Value: If the chosen operator requires a value (such as = or <), enter or select the desired value in the Value box. You might be able to select from a list of unique values, especially for categorical data.
  4. Add More Conditions: You can add multiple filter conditions for different columns or apply more complex logic (e.g., combining conditions with AND/OR) depending on the specific task or interface.

Once applied, the data display will update to show only the rows that satisfy your specified criteria.

Code-Based Filtering in SAS Studio

For more advanced, reproducible, and robust filtering, using SAS code is the most powerful method. SAS Studio provides a robust code editor where you can write and execute DATA steps and PROC steps with WHERE statements or WHERE clauses.

Filtering in a DATA Step (WHERE Statement)

The WHERE statement within a DATA step is used to select observations (rows) from an existing dataset to create a new dataset. This is perfect for creating refined subsets of your data for further processing or analysis.

Example: Filtering sashelp.cars to include only electric cars manufactured in the USA.

DATA Work.US_Electric_Cars;
    SET sashelp.cars;
    WHERE Origin = 'USA' AND Type = 'Electric';
RUN;

PROC PRINT DATA=Work.US_Electric_Cars;
    TITLE "US-Made Electric Cars";
RUN;

In this example:

  • SET sashelp.cars; reads data from the built-in sashelp.cars dataset.
  • WHERE Origin = 'USA' AND Type = 'Electric'; selects only those observations where the Origin is 'USA' and the Type is 'Electric'.

Filtering with PROC SQL (WHERE Clause)

PROC SQL allows you to query SAS datasets using SQL syntax, offering a flexible way to filter and manipulate data. The WHERE clause in PROC SQL functions similarly to its counterpart in a DATA step, selecting rows based on specified conditions.

Example: Using PROC SQL to find all cars with an MPG_City greater than 30.

PROC SQL;
    CREATE TABLE Work.High_MPG_City AS
    SELECT Make, Model, Type, MPG_City
    FROM sashelp.cars
    WHERE MPG_City > 30;
QUIT;

PROC PRINT DATA=Work.High_MPG_City;
    TITLE "Cars with MPG City > 30";
RUN;

Here:

  • SELECT ... FROM ... specifies the columns and the source dataset.
  • WHERE MPG_City > 30 filters the results to include only cars with MPG_City values exceeding 30.

Filtering in PROC Steps (WHERE Statement)

Many SAS procedures (PROCs) support a WHERE statement to filter the data before the procedure performs its analysis or display. This is incredibly useful for analyzing specific subsets without creating new datasets.

Example: Printing only sedans from Europe, or calculating the average horsepower for specific car types.

PROC PRINT DATA=sashelp.cars;
    WHERE Origin = 'Europe' AND Type = 'Sedan';
    VAR Make Model Type Origin;
    TITLE "European Sedans";
RUN;

PROC MEANS DATA=sashelp.cars MAXDEC=1;
    WHERE Type IN ('Sedan', 'SUV');
    VAR Horsepower Weight;
    CLASS Type;
    TITLE "Average Horsepower and Weight for Sedans and SUVs";
RUN;

In the PROC PRINT example, only European sedans are displayed. In the PROC MEANS example, the statistics are calculated only for observations where the Type is either 'Sedan' or 'SUV'.

Common Comparison Operators for Filtering

Regardless of whether you're using a DATA step, PROC SQL, or a PROC step's WHERE statement, these are some frequently used comparison operators:

Operator Description Example
= Equal to Age = 30
< Less than Sales < 1000
> Greater than Rating > 4.5
<= Less than or equal to Score <= 90
>= Greater than or equal to Price >= 50
NE or ^= Not equal to Status NE 'Pending'
IN Value is in a list Region IN ('East', 'West')
CONTAINS Contains a substring (case-insensitive) Name CONTAINS 'John'
LIKE Matches a pattern (case-sensitive for exact match; use % for wildcards) Product LIKE 'A%'
IS NULL Value is missing EndDate IS NULL

You can combine multiple conditions using logical operators like AND and OR to create complex filters (e.g., WHERE City = 'New York' AND Sales > 5000).

Best Practices for Filtering

  • Be Specific: Clearly define your filtering criteria to ensure you're selecting the correct subset of data.
  • Test Your Filters: Especially with complex WHERE clauses, test your filters on smaller datasets or with PROC PRINT to verify they return the expected results.
  • Optimize for Performance: For very large datasets, place WHERE statements strategically. In DATA steps, placing WHERE after SET is generally efficient.
  • Use Parentheses: When combining AND and OR operators, use parentheses to control the order of evaluation and avoid ambiguity (e.g., WHERE (Age > 18 AND Gender = 'F') OR Income > 50000;).

Mastering both visual and code-based filtering techniques in SAS Studio will significantly enhance your data manipulation and analysis capabilities.