Ora

How Do I Drop Two Variables in SAS?

Published in SAS Data Management 6 mins read

To efficiently remove two variables from a SAS dataset, you have several straightforward options, including using PROC SQL for in-place modifications or leveraging the DROP or KEEP statements within a DATA step to create a new dataset.

Understanding Variable Management in SAS

Effectively managing variables is a core task in SAS programming. Whether you're streamlining a dataset, preparing for analysis, or removing sensitive information, knowing how to drop variables is essential. SAS provides flexible methods to achieve this, suitable for various scenarios.

Method 1: Dropping Variables with PROC SQL (Recommended for In-Place Modification)

One of the most efficient and safest ways to drop variables from a SAS dataset is by utilizing PROC SQL. This method allows you to alter your dataset directly, removing variables "in place" without necessarily creating a new copy of the dataset. This can be particularly advantageous for very large datasets, as it often saves processing time and disk space.

To drop two variables using PROC SQL, you simply specify the variables by name, separated by commas, within an ALTER TABLE statement.

Syntax:

PROC SQL;
    ALTER TABLE your_library.your_dataset
    DROP variable1, variable2;
QUIT;

Example:

Let's say you have a dataset named MyData in your Work library, and you want to drop the variables Age and Address.

* Create a sample dataset for demonstration;
data Work.MyData;
    input Name $ Age Score Address $;
    datalines;
John 30 95 NewYork
Jane 25 88 London
Mike 40 72 Paris
;
run;

* Drop two variables (Age and Address) using PROC SQL;
PROC SQL;
    ALTER TABLE Work.MyData
    DROP Age, Address;
QUIT;

* View the modified dataset;
PROC PRINT data=Work.MyData;
RUN;

After executing this PROC SQL statement, the Work.MyData dataset will no longer contain the Age and Address variables. The original dataset is modified directly.

Method 2: Dropping Variables Using a DATA Step with the DROP Statement

The DATA step is a fundamental component of SAS programming for creating and manipulating datasets. When you use a DROP statement within a DATA step, SAS creates a new dataset that includes all variables from the original dataset except the ones you've specified to drop.

Syntax:

DATA new_library.new_dataset;
    SET old_library.old_dataset;
    DROP variable1 variable2; /* or DROP variable1, variable2; */
RUN;

Example:

If you want to drop Score and Address from Work.MyData and create a new dataset called Work.MyData_NoScoreAddress:

* Assume Work.MyData exists from previous example (or create it again);
data Work.MyData;
    input Name $ Age Score Address $;
    datalines;
John 30 95 NewYork
Jane 25 88 London
Mike 40 72 Paris
;
run;

* Drop two variables (Score and Address) using a DATA step DROP statement;
DATA Work.MyData_NoScoreAddress;
    SET Work.MyData;
    DROP Score Address; /* Variables can be separated by spaces or commas */
RUN;

* View the new dataset;
PROC PRINT data=Work.MyData_NoScoreAddress;
RUN;

Variations of the DROP statement:

  • DROP variable1 variable2;: Lists variables separated by spaces.
  • DROP variable1, variable2;: Lists variables separated by commas.
  • DROP variable-list;: Can use shorthand for sequential variables (e.g., DROP var1--var5;).
  • DROP = variable1 variable2;: Used as a dataset option on the SET or DATA statement (e.g., DATA new; SET old(DROP=var1 var2); RUN;). This processes faster as the variables are dropped before processing.

Method 3: Dropping Variables Implicitly with the DATA Step KEEP Statement

Instead of specifying what to drop, you can explicitly tell SAS what to keep. All variables not listed in the KEEP statement will be excluded from the new dataset. This method is often preferred when you have many variables to drop but only a few to retain.

Syntax:

DATA new_library.new_dataset;
    SET old_library.old_dataset;
    KEEP variable_to_keep1 variable_to_keep2;
RUN;

Example:

To keep only Name and Age from Work.MyData, effectively dropping Score and Address:

* Assume Work.MyData exists;
data Work.MyData;
    input Name $ Age Score Address $;
    datalines;
John 30 95 NewYork
Jane 25 88 London
Mike 40 72 Paris
;
run;

* Keep only Name and Age, implicitly dropping Score and Address;
DATA Work.MyData_NameAgeOnly;
    SET Work.MyData;
    KEEP Name Age;
RUN;

* View the new dataset;
PROC PRINT data=Work.MyData_NameAgeOnly;
RUN;

Variations of the KEEP statement:

  • KEEP variable1 variable2;: Lists variables separated by spaces.
  • KEEP variable1, variable2;: Lists variables separated by commas.
  • KEEP variable-list;: Can use shorthand for sequential variables (e.g., KEEP var1--var5;).
  • KEEP = variable1 variable2;: Used as a dataset option on the SET or DATA statement (e.g., DATA new; SET old(KEEP=var1 var2); RUN;).

Choosing the Right Method

The best method depends on your specific needs:

Feature PROC SQL ALTER TABLE DROP DATA Step DROP Statement DATA Step KEEP Statement
Output Modifies dataset in place Creates a new dataset Creates a new dataset
Syntax Explicitly lists variables to drop Explicitly lists variables to drop Explicitly lists variables to retain
Efficiency Often faster for large datasets Good for moderate datasets Good for moderate datasets
Use Case When you want to modify original When you need a new dataset with fewer variables When you want only a few specific variables
Readability Clear for dropping specific variables Clear for dropping specific variables Clear for retaining specific variables
  • For in-place modification: Use PROC SQL ALTER TABLE DROP. This is often the most direct approach and efficient for modifying an existing dataset without generating a full copy.
  • For creating a new dataset: Use the DATA step with either DROP or KEEP.
    • Choose DROP when you have few variables to remove and many to keep.
    • Choose KEEP when you have many variables to remove but few to keep.
  • For performance: When dropping many variables, using DROP= or KEEP= as dataset options on the SET statement within a DATA step can be more efficient, as it tells SAS not to load those variables into the Program Data Vector (PDV) in the first place.

Best Practices for Variable Management

  • Backup Your Data: Always consider creating a backup copy of your dataset before performing in-place modifications with PROC SQL, especially if the original dataset is critical.
  • Test on Small Data: When experimenting with new code or complex modifications, test your logic on a small subset of your data first.
  • Document Your Code: Add comments to your SAS code explaining why certain variables are dropped or kept, aiding future understanding and maintenance.
  • Use Descriptive Names: Ensure your new dataset names clearly indicate the changes made (e.g., MyData_NoAge).

For more detailed information, consult the official SAS documentation on DATA step processing and PROC SQL statements.