BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

Any help to combine the below two steps into one? I'm trying to understand these two steps which was written by other programmer

 

DATA WORK.BASELINE;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT;
RUN;

DATA WORK.BASELINE;
     SET WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
RUN;
3 REPLIES 3
benjamin_2018
Fluorite | Level 6

First, it's not a good idea to work with the same dataset "Baseline" so many times. 

For the merge to work, the datasets Baseline and POR must be sorted by PLANT and PO, for example:

 

proc sort data=work.baseline;

by PLANT PO;

proc sort data=work.por;

by PLANT PO; run; 

The first data step says to merge BASELINE and PO by matching PLANT and PO. If a unique combination of those variables is in BASELINE but not in POR, then those observations are output to BASELINE. (BASELINE is overwritten). To avoid overwriting you can change the first line to anything other than BASELINE, such as data work.baseline1. The first data step is more like a filter, I guess. If unique combinations of PLANT and PO are found in both datasets, then these are not output to BASELINE. 

The second data step requires A=1 and B=1 implicitly, so only matching combinations of PLANT and PO in each dataset are written to the output dataset BASELINE. If these are run sequentially, the second data step will find no matching observations because the output from the first data step will have removed any matching combinations of PLANT and PO from BASELINE. 

 

PaigeMiller
Diamond | Level 26

Adding to the comment from @benjamin_2018 , suppose you want a new data set named BASELINE2 and another data set named BASELINE3 (instead of two data sets named BASELINE which is impossible, and a bad idea to overwrite the original BASELINE with a new BASELINE, which also won't work the way it is programmed)

 

DATA BASELINE2 baseline3;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT baseline2;
     output baseline3;
RUN;

 

--
Paige Miller
Astounding
PROC Star

It is highly unlikely (but remotely possible) that the first step is needed.  If it is needed, here is the scenario that it addresses.

 

Suppose that there is a combination of PLANT/PO that appears in both data sets.  Also suppose that the particular combination has more observations in A than in B.  Then both steps would be needed to remove the observations in A for that PLANT/PO combination and replace them with the observation(s) in B.

 

For all other scenarios, the first step is useless and can be removed ... unless you actually removed pieces of the DATA step(s) before posting them.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 522 views
  • 0 likes
  • 4 in conversation