You can find a substring within a SAS character variable using the FIND, INDEX, and PRXMATCH functions and the CONTAINS operator. These tools locate specific text patterns within character variables in your datasets.
Understanding Substring Search in SAS
SAS provides robust capabilities for manipulating and searching character strings, which is crucial for data cleaning, transformation, and analysis. When you need to determine if a particular sequence of characters (a substring) exists within a larger text string (your variable), or to find its exact position, SAS offers a range of dedicated functions.
Key SAS Functions for Substring Search
Here are the primary methods for finding substrings in SAS, each with its own advantages:
1. The FIND Function
The FIND function is a versatile tool for locating substrings. It searches a string for the first occurrence of the specified substring and returns the starting position of that substring. If the substring is not found, FIND returns 0. It is case-sensitive by default but can be made case-insensitive.
- Syntax:
    FIND(string, substring <, modifiers> <, startpos>);
  string: The character variable or literal to search within.
  substring: The character string or literal to search for.
  modifiers (optional):
    'i' or 'I': Performs a case-insensitive search.
    't' or 'T': Trims trailing blanks from string and substring.
  startpos (optional): The position at which the search starts.
- Example:
    DATA search_example;
        text_var = 'SAS Programming is fun!';
        position_find = FIND(text_var, 'Programming');
        position_find_case_insensitive = FIND(text_var, 'programming', 'i');
        position_not_found = FIND(text_var, 'SQL');
        PUT 'FIND Position (case-sensitive): ' position_find;
        PUT 'FIND Position (case-insensitive): ' position_find_case_insensitive;
        PUT 'FIND Position (not found): ' position_not_found;
    RUN;
- Output:
    FIND Position (case-sensitive): 5
    FIND Position (case-insensitive): 5
    FIND Position (not found): 0
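The 't' modifier matters most when the substring comes from a variable rather than a literal. The following is a minimal sketch, assuming two hypothetical variables (text_var and keyword) whose declared lengths are longer than the values they hold:

```sas
DATA trailing_blank_demo;
    LENGTH text_var $ 30 keyword $ 15;   /* hypothetical variables for illustration */
    text_var = 'SAS Programming is fun!';
    keyword  = 'Programming';            /* stored padded with 4 trailing blanks */

    /* Without 't', the padded value (including its blanks) is the search text */
    pos_untrimmed = FIND(text_var, keyword);

    /* With 't', trailing blanks in both arguments are ignored */
    pos_trimmed = FIND(text_var, keyword, 't');

    PUT pos_untrimmed= pos_trimmed=;
RUN;
```

Under these assumptions the log shows pos_untrimmed=0 and pos_trimmed=5, because 'Programming' followed by four blanks does not occur in text_var.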
2. The INDEX Function
The INDEX function is very similar to FIND and is often used interchangeably. It also returns the starting position of the first occurrence of a substring within a string. Like FIND, it returns 0 if the substring is not found.
- Syntax:
    INDEX(string, substring);
  string: The character variable or literal to search within.
  substring: The character string or literal to search for.
- Key Difference from FIND: INDEX is always case-sensitive and does not have built-in modifiers for case insensitivity or trimming. For case-insensitive searches with INDEX, you would typically convert both the string and substring to the same case using the UPCASE or LOWCASE functions before applying INDEX.
- Example:
    DATA search_example_index;
        description = 'Advanced SAS Analytics';
        pos_analytics = INDEX(description, 'Analytics');
        pos_sas_case_sensitive = INDEX(description, 'sas'); /* Will not find 'sas' */
        pos_sas_case_insensitive = INDEX(UPCASE(description), 'SAS'); /* Use UPCASE for case insensitivity */
        PUT 'INDEX Position (Analytics): ' pos_analytics;
        PUT 'INDEX Position (sas case-sensitive): ' pos_sas_case_sensitive;
        PUT 'INDEX Position (SAS case-insensitive): ' pos_sas_case_insensitive;
    RUN;
- Output:
    INDEX Position (Analytics): 14
    INDEX Position (sas case-sensitive): 0
    INDEX Position (SAS case-insensitive): 10
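3. The CONTAINS Operator
For a simple check of whether a substring exists within a variable, the CONTAINS operator (which can also be written as ?) is convenient. The condition is true if the substring is found and false if it is not. CONTAINS is valid only in WHERE expressions: the WHERE statement, the WHERE= data set option, and WHERE clauses in PROC SQL. It is not available in DATA step IF statements; there, test whether FIND or INDEX returns a value greater than 0 instead.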
- Syntax:
    WHERE string CONTAINS substring;
- Behavior: It is case-sensitive. To make it case-insensitive, use UPCASE or LOWCASE.
- Example:
    DATA products;
        LENGTH product_name $ 20;
        product_name = 'SAS Viya Platform'; OUTPUT;
        product_name = 'SAS Base Software'; OUTPUT;
    RUN;

    DATA cloud_products;
        SET products;
        WHERE product_name CONTAINS 'Viya';
        category = 'Cloud Product';
        PUT product_name= category=;
    RUN;

    DATA core_products;
        SET products;
        WHERE UPCASE(product_name) CONTAINS 'SOFTWARE'; /* Case-insensitive check */
        category = 'Core Product';
        PUT product_name= category=;
    RUN;
- Output:
    product_name=SAS Viya Platform category=Cloud Product
    product_name=SAS Base Software category=Core Product
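Because CONTAINS is available only in WHERE expressions, conditional logic inside a DATA step needs a function-based test instead. One way to sketch this, assuming a hypothetical product_name value, is to treat a nonzero FIND result as "found":

```sas
DATA product_check;
    LENGTH product_name $ 20 category $ 13;
    product_name = 'SAS Viya Platform';   /* hypothetical sample value */

    /* FIND returns 0 when there is no match, so > 0 acts as an existence test */
    IF FIND(product_name, 'viya', 'i') > 0 THEN category = 'Cloud Product';
    ELSE category = 'Other';

    PUT product_name= category=;
RUN;
```

Declaring category with LENGTH before the assignments avoids truncation that could occur if a shorter literal were assigned first.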
4. The PRXMATCH Function (Regular Expressions)
For more complex pattern matching, including wildcards, character classes, and quantifiers, the PRXMATCH function is invaluable. It uses Perl regular expression syntax. It returns the position of the first match or 0 if no match is found.
- Syntax:
    PRXMATCH(regular_expression, string);
  regular_expression: A regular expression pattern enclosed in forward slashes (/pattern/), with optional modifiers (e.g., /pattern/i for case-insensitive).
  string: The character variable or literal to search within.
- Example:
    DATA regex_search;
        email_address = 'user@example.com'; /* placeholder address */
        phone_number = 'Call 555-123-4567 for support.';
        /* Check for an email-like shape: '@' followed by characters, then '.' */
        is_email = PRXMATCH('/@.+\./', email_address);
        /* Check for a 10-digit phone number after 'call' (case-insensitive); \d{n} matches n digits */
        is_phone = PRXMATCH('/call \d{3}-\d{3}-\d{4}/i', phone_number);
        /* PRXMATCH returns a position, so any nonzero value means a match */
        PUT 'Is Email: ' is_email;
        PUT 'Is Phone: ' is_phone;
    RUN;
- Output:
    Is Email: 5
    Is Phone: 1
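When the same pattern is applied to many values, one common approach is to compile it with PRXPARSE and pass the resulting pattern identifier to PRXMATCH. The sketch below assumes a hypothetical note variable and a simplified phone pattern; it is an illustration, not the only way to structure this:

```sas
DATA phone_flags;
    LENGTH note $ 40;
    RETAIN pattern_id;

    /* Compile the pattern on the first iteration and keep the identifier */
    IF _N_ = 1 THEN pattern_id = PRXPARSE('/\d{3}-\d{3}-\d{4}/');

    /* Hypothetical sample values, looped over for illustration */
    DO note = 'Call 555-123-4567 for support.', 'No number listed.';
        match_pos = PRXMATCH(pattern_id, note);   /* 0 when there is no match */
        has_phone = (match_pos > 0);              /* 1 = found, 0 = not found */
        PUT note= match_pos= has_phone=;
    END;
RUN;
```

For the first value the match starts at position 6; for the second, match_pos is 0. Converting the position to a 0/1 flag, as has_phone does, keeps the existence check separate from the location.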
Comparison of Substring Search Functions
Function or Operator | Purpose | Case Sensitivity | Returns | Regular Expressions |
---|---|---|---|---|
FIND | Position of first occurrence, with modifiers | Default: Yes (Can be No with 'i' modifier) | Position (0 if not found) | No |
INDEX | Position of first occurrence | Yes | Position (0 if not found) | No |
CONTAINS | Existence check in WHERE expressions | Yes | True or false | No |
PRXMATCH | Position of first regex match | Default: Yes (Can be No with /i modifier) | Position (0 if not found) | Yes |
Practical Considerations
- Case Sensitivity: Always be mindful of case. Use UPCASE() or LOWCASE() for INDEX and CONTAINS if you need a case-insensitive search. FIND and PRXMATCH offer direct modifiers.
- Trailing Blanks: Character variables in SAS have fixed lengths, so shorter values are padded with trailing blanks. If a search fails unexpectedly when the substring comes from a variable, the padding is probably being treated as part of the search text. Apply the TRIM() function to the substring argument, or use the 't' modifier with FIND.
- Performance: For simple existence checks in WHERE expressions, CONTAINS is generally efficient. For positional information or complex patterns, FIND, INDEX, and PRXMATCH are your go-to options. PRXMATCH is powerful but might have a higher overhead for very simple tasks.
- Multiple Occurrences: These functions return only the first occurrence. To find all occurrences, call the function in a loop, either advancing the optional start position argument of FIND or searching the remainder of the string with SUBSTR().
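As an illustration of the multiple-occurrences point, the following sketch uses the optional start position argument of FIND to walk through a string. The sample text and the variable names (start, pos) are assumptions chosen for the example:

```sas
DATA all_positions;
    text_var = 'SAS users write SAS code in SAS Studio';   /* hypothetical sample text */
    start = 1;

    /* Repeat the search, each time starting just past the previous match */
    DO UNTIL (pos = 0);
        pos = FIND(text_var, 'SAS', start);
        IF pos > 0 THEN DO;
            PUT 'Found at position: ' pos;
            start = pos + 1;
        END;
    END;
RUN;
```

With this sample text the log reports positions 1, 17, and 29. Advancing by pos + 1 allows overlapping matches; to count only non-overlapping matches, advance by the length of the substring instead.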
By understanding these functions and operators, you can effectively search for and identify substrings within your SAS variables, enabling robust data manipulation and analysis.