Filtering data in SAS Studio is a fundamental operation that allows you to work with specific subsets of your datasets, enhancing analysis efficiency and precision. You can achieve this through two primary methods: interactive visual filtering within the SAS Studio interface or by writing SAS code.
Visual Filtering in SAS Studio
For quick filtering without writing code, SAS Studio provides an intuitive visual interface, often found within data viewers, query builders, or specific tasks. This method is ideal for users who prefer a point-and-click approach.
Here's how to apply visual filters:
- Access the Filter Tab: When viewing a table or working within a task that supports filtering (like the "Filter and Sort" task or a data grid view), locate and click the Filter tab.
- Select Columns: From the Columns area, drag one or more columns that you wish to use for filtering into the filter section.
- Define Your Filter Condition: In the filter window that appears for the selected column:
- Choose an Operator: Select a comparison operator (e.g.,
=
,>
,CONTAINS
) from the operator drop-down list. - Specify a Value: If the chosen operator requires a value (such as
=
or<
), enter or select the desired value in the Value box. You might be able to select from a list of unique values, especially for categorical data.
- Choose an Operator: Select a comparison operator (e.g.,
- Add More Conditions: You can add multiple filter conditions for different columns or apply more complex logic (e.g., combining conditions with AND/OR) depending on the specific task or interface.
Once applied, the data display will update to show only the rows that satisfy your specified criteria.
Code-Based Filtering in SAS Studio
For more advanced, reproducible, and robust filtering, using SAS code is the most powerful method. SAS Studio provides a robust code editor where you can write and execute DATA
steps and PROC
steps with WHERE
statements or WHERE
clauses.
Filtering in a DATA Step (WHERE
Statement)
The WHERE
statement within a DATA
step is used to select observations (rows) from an existing dataset to create a new dataset. This is perfect for creating refined subsets of your data for further processing or analysis.
Example: Filtering sashelp.cars
to include only electric cars manufactured in the USA.
DATA Work.US_Electric_Cars;
SET sashelp.cars;
WHERE Origin = 'USA' AND Type = 'Electric';
RUN;
PROC PRINT DATA=Work.US_Electric_Cars;
TITLE "US-Made Electric Cars";
RUN;
In this example:
SET sashelp.cars;
reads data from the built-insashelp.cars
dataset.WHERE Origin = 'USA' AND Type = 'Electric';
selects only those observations where theOrigin
is 'USA' and theType
is 'Electric'.
Filtering with PROC SQL (WHERE
Clause)
PROC SQL
allows you to query SAS datasets using SQL syntax, offering a flexible way to filter and manipulate data. The WHERE
clause in PROC SQL
functions similarly to its counterpart in a DATA
step, selecting rows based on specified conditions.
Example: Using PROC SQL
to find all cars with an MPG_City greater than 30.
PROC SQL;
CREATE TABLE Work.High_MPG_City AS
SELECT Make, Model, Type, MPG_City
FROM sashelp.cars
WHERE MPG_City > 30;
QUIT;
PROC PRINT DATA=Work.High_MPG_City;
TITLE "Cars with MPG City > 30";
RUN;
Here:
SELECT ... FROM ...
specifies the columns and the source dataset.WHERE MPG_City > 30
filters the results to include only cars withMPG_City
values exceeding 30.
Filtering in PROC Steps (WHERE
Statement)
Many SAS procedures (PROCs) support a WHERE
statement to filter the data before the procedure performs its analysis or display. This is incredibly useful for analyzing specific subsets without creating new datasets.
Example: Printing only sedans from Europe, or calculating the average horsepower for specific car types.
PROC PRINT DATA=sashelp.cars;
WHERE Origin = 'Europe' AND Type = 'Sedan';
VAR Make Model Type Origin;
TITLE "European Sedans";
RUN;
PROC MEANS DATA=sashelp.cars MAXDEC=1;
WHERE Type IN ('Sedan', 'SUV');
VAR Horsepower Weight;
CLASS Type;
TITLE "Average Horsepower and Weight for Sedans and SUVs";
RUN;
In the PROC PRINT
example, only European sedans are displayed. In the PROC MEANS
example, the statistics are calculated only for observations where the Type
is either 'Sedan' or 'SUV'.
Common Comparison Operators for Filtering
Regardless of whether you're using a DATA
step, PROC SQL
, or a PROC
step's WHERE
statement, these are some frequently used comparison operators:
Operator | Description | Example |
---|---|---|
= |
Equal to | Age = 30 |
< |
Less than | Sales < 1000 |
> |
Greater than | Rating > 4.5 |
<= |
Less than or equal to | Score <= 90 |
>= |
Greater than or equal to | Price >= 50 |
NE or ^= |
Not equal to | Status NE 'Pending' |
IN |
Value is in a list | Region IN ('East', 'West') |
CONTAINS |
Contains a substring (case-insensitive) | Name CONTAINS 'John' |
LIKE |
Matches a pattern (case-sensitive for exact match; use % for wildcards) |
Product LIKE 'A%' |
IS NULL |
Value is missing | EndDate IS NULL |
You can combine multiple conditions using logical operators like AND
and OR
to create complex filters (e.g., WHERE City = 'New York' AND Sales > 5000
).
Best Practices for Filtering
- Be Specific: Clearly define your filtering criteria to ensure you're selecting the correct subset of data.
- Test Your Filters: Especially with complex
WHERE
clauses, test your filters on smaller datasets or withPROC PRINT
to verify they return the expected results. - Optimize for Performance: For very large datasets, place
WHERE
statements strategically. InDATA
steps, placingWHERE
afterSET
is generally efficient. - Use Parentheses: When combining
AND
andOR
operators, use parentheses to control the order of evaluation and avoid ambiguity (e.g.,WHERE (Age > 18 AND Gender = 'F') OR Income > 50000;
).
Mastering both visual and code-based filtering techniques in SAS Studio will significantly enhance your data manipulation and analysis capabilities.