"Efficient" form for SAS data step

06-21-2016 10:03 PM - last edited on 06-21-2016 10:32 PM by Reeza

Some of the websites out there talk of using the "most efficient" form for writing a program. Please take a look at the following:

```
data SASData__N (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
set Moments_Combined (where = (Label1='N'));
run;
```

I was very glad simply to get the above program to run. Does it look "efficient" enough? Should I try some other route?

Thanks!

Nicholas Kormanik

Accepted Solutions

Solution

06-22-2016 12:18 AM

Posted in reply to NicholasKormanik

06-21-2016 10:50 PM

Moving KEEP= to the SET statement will make the program run faster:

```
data SASData__N;
set Moments_Combined (where = (Label1='N') keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;
```

Assuming that there are other variables in the incoming data set, the SET statement no longer has to read in those other variables.
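One way to confirm that the two forms produce the same result here is to run both and compare the outputs. This is a minimal sketch, assuming Moments_Combined exists as in the question; the output names N_keep_out and N_keep_in are made up for illustration.

```sas
/* Form 1: KEEP= on the output data set (as in the original post) */
data work.N_keep_out (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    set Moments_Combined (where = (Label1='N'));
run;

/* Form 2: KEEP= on the SET statement, so the other variables are never read */
data work.N_keep_in;
    set Moments_Combined (where = (Label1='N')
                          keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

/* Compare the two results variable by variable and row by row */
proc compare base=work.N_keep_out compare=work.N_keep_in;
run;
```

If PROC COMPARE reports no differences, the only thing that changed is how much data the step had to read.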

In most cases, WHERE runs faster than a subsetting IF. The most important factors are the percentage of observations that meet the WHERE condition, and the number of variables in the data set. If most observations have Label1='N', you could try this and compare the speed:

```
data SASData__N;
set Moments_Combined (keep=Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
if Label1='N';
run;
```

It is likely to be slower, but running it is the sure way to find out.

Finally, if your data sets are small, you might not be able to measure which technique is faster. You would probably need a minimum of tens of thousands of observations to notice a difference (possibly more than that).
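To measure the difference rather than guess, the three variants can be timed against a data set large enough to show it. The sketch below rests on assumptions: Test_Moments and its contents are invented test data (one million rows, half with Label1='N', 20 filler variables), and nValue1 and nValue2 are assumed to be numeric. FULLSTIMER is a standard system option that writes detailed timing for each step to the log.

```sas
options fullstimer;  /* detailed real and CPU time per step in the log */

/* Hypothetical test data: 1,000,000 rows plus 20 variables the subset does not need */
data work.Test_Moments;
    length Label1 Label2 $8;
    array filler{20} filler1-filler20;
    do Combo_Plus = 1 to 1000000;
        i1_X = mod(Combo_Plus, 10);
        i2_X = mod(Combo_Plus, 7);
        if mod(Combo_Plus, 2) = 0 then Label1 = 'N';
        else Label1 = 'MEAN';
        Label2 = 'STD';
        nValue1 = Combo_Plus;
        nValue2 = Combo_Plus / 2;
        do j = 1 to 20;
            filler{j} = j;
        end;
        output;
    end;
    drop j;
run;

/* Variant A: WHERE= on input, KEEP= on output (original post) */
data work.A (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    set work.Test_Moments (where = (Label1='N'));
run;

/* Variant B: WHERE= and KEEP= both on input */
data work.B;
    set work.Test_Moments (where = (Label1='N')
                           keep  = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
run;

/* Variant C: KEEP= on input, subsetting IF instead of WHERE= */
data work.C;
    set work.Test_Moments (keep = Combo_Plus i1_X i2_X Label1 nValue1 Label2 nValue2);
    if Label1='N';
run;
```

Compare the real time and CPU time that the log reports for steps A, B, and C. Running each step more than once helps separate a real difference from disk caching effects.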

All Replies

Posted in reply to NicholasKormanik

06-21-2016 10:32 PM - edited 06-21-2016 10:33 PM

The only efficiency here is adding the WHERE clause to your SET statement.

Putting KEEP= on the output data set versus on the SET statement doesn't make a difference AFAIK.

PS. Please post your code in a code block to make it easier to read. See the little running man icon on the editor.

Posted in reply to Astounding

06-22-2016 12:21 AM

Thank you Astounding and Reeza. So glad I asked, and you were so available. I tried experimenting with other formats, but failed. Great to have the answer.

Nicholas

Posted in reply to NicholasKormanik

06-22-2016 12:20 AM

Moving KEEP= to the SET statement only works if you're not calculating new variables, which it appears you aren't in this case.
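To illustrate the caveat: KEEP= on the SET statement only limits what is read, so a variable created inside the step still lands in the output. The sketch below is hypothetical. Diff is an invented variable, the output names are made up, and nValue1 and nValue2 are assumed to be numeric.

```sas
/* KEEP= on SET limits the input only; Diff is created in the step,
   so the output holds Combo_Plus, Label1, nValue1, nValue2 and Diff */
data work.N_with_diff;
    set Moments_Combined (where = (Label1='N')
                          keep  = Combo_Plus Label1 nValue1 nValue2);
    Diff = nValue1 - nValue2;   /* hypothetical new variable */
run;

/* To control the output as well, combine both: read only what the
   calculation needs, and write only what you want to keep */
data work.N_diff_only (keep = Combo_Plus Diff);
    set Moments_Combined (where = (Label1='N')
                          keep  = Combo_Plus Label1 nValue1 nValue2);
    Diff = nValue1 - nValue2;
run;
```

In a step like this, the input KEEP= has to list every variable the calculation uses, and the output KEEP= decides what is finally stored.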

Another way to make this faster, if required, may be to use PROC COPY because it doesn't read the data set line by line but can copy it over in blocks. It doesn't look like you're doing any processing in the step.
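For reference, a PROC COPY call might look like the sketch below. It assumes a libref named ARCHIVE (hypothetical) has already been assigned with a LIBNAME statement and that Moments_Combined lives in WORK.

```sas
/* Assumption: ARCHIVE is a hypothetical libref assigned earlier with LIBNAME */
proc copy in=work out=archive memtype=data;
    select Moments_Combined;   /* copy only this member */
run;
```

Note that PROC COPY copies the whole member under its original name, with all variables and observations, so it suits a straight copy rather than the WHERE and KEEP= subset in the question.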