☑ This topic is solved.
pavank
Quartz | Level 8

Hi guys,

Which is better for optimizing code:

the KEEP statement

or the DROP statement?

Could you explain how each is handled in PDV execution?

If we use DROP, does the input stack still read the dropped variables or not? Which is best?

1 ACCEPTED SOLUTION

ballardw
Super User

If the definition of "optimization" includes reducing the amount of typing involved, then it might be time to examine the names of the variables. The Keep and Drop statements and data set options will accept lists of variables, which in SAS can be written simply in 3 ways. Two of them work better with carefully thought-out names.

 

1) varstem: where varstem is the common start of the names of multiple variables. The : immediately afterwards tells SAS "use all variables whose name starts with varstem".

2) var<numeric suffix>. If variables share a common name and differ only by a numeric suffix in sequence, you can use the list form varxx - varyy, where xx and yy are the first and last suffixes of the desired sequence, with no gaps. You can provide multiple sequential lists, for example var3-var15 var18-var24, to get only some of them.

3) The -- list, which uses the position in the data set. Varone -- someothervar selects all of the variables from Varone through someothervar that are adjacent in data set position order.

 

AND you can mix combinations of these lists in a single Keep or Drop.
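The three list forms above can be sketched as follows. The table HAVE and all of its variable names are made up for illustration; the position-based list depends on the order in which the variables were created, which here is the order of the LENGTH statement.

```sas
/* Hypothetical one-row table; variable order is id, score1-score5, */
/* name_first, name_last, city                                      */
data have;
   length id 8 score1-score5 8 name_first name_last $20 city $20;
   call missing(of _all_);
run;

/* 1) Prefix list: every variable whose name starts with name_ */
data by_prefix;
   set have;
   keep name_: ;          /* name_first name_last */
run;

/* 2) Numbered range list: consecutive numeric suffixes */
data by_range;
   set have;
   keep score2-score4;    /* score2 score3 score4 */
run;

/* 3) Position list: everything from id through score3 in data set order */
data by_position;
   set have;
   keep id -- score3;     /* id score1 score2 score3 */
run;

/* Mixing list forms in a single DROP */
data mixed;
   set have;
   drop score1-score2 name_: city;   /* leaves id score3 score4 score5 */
run;
```

Running PROC CONTENTS with the VARNUM option on each output is one way to confirm which variables survived.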

 

Also many of the data step functions that will accept multiple variables as parameters such as MAX, MIN, Call Missing, Coalesce, Coalescec, Whichn, Whichc, Largest, Smallest and others will allow a list as one of the parameters by prefacing the list with the keyword OF.

 

y = max(of var1-var25);

 

Note that the ARRAY statement will accept any of the list forms, or mixtures of them, for use with existing variables. If you are using an ARRAY statement to create new variables, only the second list type is accepted.
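A minimal sketch of both ARRAY cases and the OF keyword, using made-up variable names (score1-score5, top, total, n_scores) that are not from the original post:

```sas
/* Creating new variables: the numbered range list defines score1-score5 */
data scores;
   array s{5} score1-score5;
   do i = 1 to 5;
      s{i} = i * 10;        /* 10, 20, 30, 40, 50 */
   end;
   drop i;
run;

/* Existing variables: a prefix list works because score1-score5 */
/* are already in the PDV when the ARRAY statement is compiled   */
data summary;
   set scores;
   array all_scores{*} score: ;
   n_scores = dim(all_scores);       /* number of matched variables */
   top      = max(of all_scores{*}); /* OF with an array reference  */
   total    = sum(of score1-score5); /* OF with a numbered range    */
run;
```

Note that the prefix list is resolved when the ARRAY statement is compiled, so only variables that already exist at that point are picked up.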


4 REPLIES 4
SASJedi
Ammonite | Level 13

There is no difference in computer efficiency between the DROP and KEEP statements. The difference is in the efficiency of the computer programmer - that is, in how long it takes you to write your code. For example, consider reading a table with 100 columns (COL1-COL100). If you only want to include COL1, COL30, and COL50 in the output, it would be much more efficient from a typing perspective to write

KEEP COL1 COL30 COL50;

than to write:

DROP COL2-COL29 COL31-COL49 COL51-COL100;

Conversely, if you wanted all of the columns except COL1, COL30, and COL50 in the output, it would be much more efficient from a typing perspective to write

DROP COL1 COL30 COL50;

than to write:

KEEP COL2-COL29 COL31-COL49 COL51-COL100;
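To check that the short KEEP and the long DROP really select the same columns, something like the following could be run. The table HAVE is a hypothetical stand-in with 100 numeric columns and a single row of missing values.

```sas
/* Hypothetical input: COL1-COL100, one observation */
data have;
   array col{100} col1-col100;
run;

/* Short form: name the three columns to keep */
data want_keep;
   set have;
   keep col1 col30 col50;
run;

/* Long form: name everything else to drop */
data want_drop;
   set have;
   drop col2-col29 col31-col49 col51-col100;
run;

/* Reports any difference in variables or values between the two outputs */
proc compare base=want_keep compare=want_drop;
run;
```

A gap in one of the ranges (leaving out a single column number, say) would show up in the PROC COMPARE report as a variable present in only one of the two data sets.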

However, using the KEEP= or DROP= dataset option, you may be able to write code that runs more efficiently and produces the same result. See Jedi SAS Tricks: 5 Ways to Make Your SAS Code Run Faster for more details.

andreas_lds
Jade | Level 19

The statements control which variables will be present in the created data set. I don't think they differ in their impact on performance if nearly the same number of variables is listed.

Astounding
PROC Star

The most important concept is, I'm sure, covered in @SASJedi's link. Just to isolate it: there is a difference between the variables read in and the variables saved. Compare:

 

data want;
   set have (keep=a b c);
run;

data want (keep=a b c);
   set have;
run;

The top DATA step reads in just the variables needed so the PDV contains only a, b, and c.  The bottom data step reads in all the variables, which could be hundreds of variables.  Then it saves only a few.  All variables from HAVE would be in the PDV, but only three would be written to WANT.
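One practical consequence, sketched with a hypothetical fourth variable d and a derived variable total (neither is in the example above): a variable excluded on the SET statement cannot be used in the step's logic, while a variable excluded only on the output can.

```sas
/* Hypothetical input with one extra variable, d */
data have;
   a = 1; b = 2; c = 3; d = 4;
run;

/* KEEP= on the input: d never enters the PDV, so only a, b, c are usable */
data want_in;
   set have (keep=a b c);
   total = a + b + c;
run;

/* KEEP= on the output: d is read into the PDV and can be used, */
/* but it is not written to WANT_OUT                            */
data want_out (keep=a b c total);
   set have;
   total = a + b + c + d;
run;
```

Referencing d in the first step would not raise an error; SAS would create a new, uninitialized d in the PDV, so the calculation would quietly use a missing value.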

