turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Is it possible to nest one step within another?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

11-11-2015 12:07 PM

I need to create a new dataset that is like an existing one, but with some additional new variables. I understand that this can be done in the DATA step. To calculate some of the new variables, I need a PROC SUMMARY step. But that obviously ends the DATA step, even though it has not yet computed everything I need to compute. The process is as follows:

Part 1. DATA (newdata set olddata)

*creates newdata from olddata, calculating some new variables in it along the way:*

Part 2. PROC SUMMARY

*continues the calcualtion begun in Part 1, summing up some column and storing the sum as a variable*

Part 3. This is where I wish the DATA step in Part 1 could "pick up where it left off" and finish computing the new variables in NEWDATA, using the variable from Part 2 when needed. What is the easiest way of doing this?

Could I perhaps somehow open the dataset created in Part 1 and do the rest of the calculations on it with the variable from Part 2?

What I am doing now is having Part 1 create an intermediate dataset (Int_Data), from which I then compute the variable in Part 2, and then in Part 3 I am thinking of having another DATA step do the remaining calculations, with Int_Data now in the role of "old data" and the final dataset I want as the "new data". Is there some more elegant way of doing this, without intermediate datasets?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dvtarasov

11-11-2015 12:28 PM

Sort of, but you will still need to read the data twice in the same DATA step. For example:

data want;

do until (done1);

set have end=done1;

** compute variables and sums;

end;

do until (done2);

set have end=done2;

** repeat computations for same variables as above, and use sums in additional calculations;

output;

end;

run;

Other strategies might exist where you may be able to run PROC SUMMARY first, before creating new variables. For example, if you have this as one of your computations:

liquids = 2* pints + 4*quarts;

If you are intending to run PROC SUMMARY to get the total for LIQUIDS, you can work around that. You can run an initial PROC SUMMARY to get the total of PINTS and the total of QUARTS. Then use that in a single DATA step. That sort of a strategy really depends on what the calculations are.

Good luck.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dvtarasov

11-11-2015 02:10 PM

One *elegant* technique to do this kind of operation is **automatic remerging **within** proc SQL**. Summaries can be used directly in computations involving the original variables. For example, using sashelp.class, calculate the age difference relative to the median age for boys and girls, and keep only those within 1.5 years of the median. All this in a single query :

```
proc sql;
select *, age - median(age) as ageDiff
from sashelp.class
group by sex
having abs(age - median(age)) <= 1.5
order by sex, ageDiff, name;
quit;
```

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dvtarasov

11-11-2015 03:04 PM

It probably is just a matter of thinking of the problem in a different way. Unless your data is huge it is probably easier to create a simple solution that might process the same data multiple times, but be clearer to understand.

PROC SQL can also help since you can express your calculations in terms of set operators and let the PROC optimize the way that it combines the sets for you.