BookmarkSubscribeRSS Feed
dvtarasov
Obsidian | Level 7

I need to create a new dataset that is like an existing one, but with some additional new variables. I understand that this can be done in the DATA step. To calculate some of the new variables, I need a PROC SUMMARY step. But that obviously ends the DATA step, even though it has not yet computed everything I need to compute. The process is as follows:

 

Part 1. DATA (newdata set olddata)

creates newdata from olddata, calculating some new variables in it along the way:

 

Part 2. PROC SUMMARY 

continues the calcualtion begun in Part 1, summing up some column and storing the sum as a variable

 

Part 3. This is where I wish the DATA step in Part 1 could "pick up where it left off" and finish computing the new variables in NEWDATA, using the variable from Part 2 when needed. What is the easiest way of doing this? 

 

Could I perhaps somehow open the dataset created in Part 1 and do the rest of the calculations on it with the variable from Part 2? 

What I am doing now is having Part 1 create an intermediate dataset (Int_Data), from which I then compute the variable in Part 2, and then in Part 3 I am thinking of having another DATA step do the remaining calculations, with Int_Data now in the role of "old data" and the final dataset I want as the "new data". Is there some more elegant way of doing this, without intermediate datasets?

3 REPLIES 3
Astounding
PROC Star

Sort of, but you will still need to read the data twice in the same DATA step.  For example:

 

data want;

do until (done1);

   set have end=done1;

   ** compute variables and sums;

 end;

 do until (done2);

      set have end=done2;

      ** repeat computations for same variables as above, and use sums in additional calculations;

      output;

  end;

run;

 

Other strategies might exist where you may be able to run PROC SUMMARY first, before creating new variables.  For example, if you have this as one of your computations:

 

liquids = 2* pints + 4*quarts;

 

If you are intending to run PROC SUMMARY to get the total for LIQUIDS, you can work around that.  You can run an initial PROC SUMMARY to get the total of PINTS and the total of QUARTS.  Then use that in a single DATA step.  That sort of a strategy really depends on what the calculations are.

 

Good luck.

PGStats
Opal | Level 21

One elegant technique to do this kind of operation is automatic remerging within proc SQL. Summaries can be used directly in computations involving the original variables. For example, using sashelp.class, calculate the age difference relative to the median age for boys and girls, and keep only those within 1.5 years of the median. All this in a single query :

 

proc sql;
select *, age - median(age) as ageDiff
from sashelp.class
group by sex
having abs(age - median(age)) <= 1.5
order by sex, ageDiff, name;
quit;
PG
Tom
Super User Tom
Super User

It probably is just a matter of thinking of the problem in a different way.  Unless your data is huge it is probably easier to create a simple solution that might process the same data multiple times, but be clearer to understand.

 

PROC SQL can also help since you can express your calculations in terms of set operators and let the PROC optimize the way that it combines the sets for you.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1283 views
  • 0 likes
  • 4 in conversation