BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

Any help to combine the below two steps into one? I'm trying to understand these two steps which was written by other programmer

 

DATA WORK.BASELINE;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT;
RUN;

DATA WORK.BASELINE;
     SET WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
RUN;
3 REPLIES 3
benjamin_2018
Fluorite | Level 6

First, it's not a good idea to work with the same dataset "Baseline" so many times. 

For the merge to work, the datasets Baseline and POR must be sorted by PLANT and PO, for example:

 

proc sort data=work.baseline;

by PLANT PO;

proc sort data=work.por;

by PLANT PO; run; 

The first data step says to merge BASELINE and PO by matching PLANT and PO. If a unique combination of those variables is in BASELINE but not in POR, then those observations are output to BASELINE. (BASELINE is overwritten). To avoid overwriting you can change the first line to anything other than BASELINE, such as data work.baseline1. The first data step is more like a filter, I guess. If unique combinations of PLANT and PO are found in both datasets, then these are not output to BASELINE. 

The second data step requires A=1 and B=1 implicitly, so only matching combinations of PLANT and PO in each dataset are written to the output dataset BASELINE. If these are run sequentially, the second data step will find no matching observations because the output from the first data step will have removed any matching combinations of PLANT and PO from BASELINE. 

 

PaigeMiller
Diamond | Level 26

Adding to the comment from @benjamin_2018 , suppose you want a new data set named BASELINE2 and another data set named BASELINE3 (instead of two data sets named BASELINE which is impossible, and a bad idea to overwrite the original BASELINE with a new BASELINE, which also won't work the way it is programmed)

 

DATA BASELINE2 baseline3;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT baseline2;
     output baseline3;
RUN;

 

--
Paige Miller
Astounding
PROC Star

It is highly unlikely (but remotely possible) that the first step is needed.  If it is needed, here is the scenario that it addresses.

 

Suppose that there is a combination of PLANT/PO that appears in both data sets.  Also suppose that the particular combination has more observations in A than in B.  Then both steps would be needed to remove the observations in A for that PLANT/PO combination and replace them with the observation(s) in B.

 

For all other scenarios, the first step is useless and can be removed ... unless you actually removed pieces of the DATA step(s) before posting them.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1197 views
  • 0 likes
  • 4 in conversation