Re: Combine Steps

David_Billa · Posted 04-18-2022 05:32 AM

Any help to combine the below two steps into one? I'm trying to understand these two steps which was written by other programmer

DATA WORK.BASELINE;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT;
RUN;

DATA WORK.BASELINE;
     SET WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
RUN;

benjamin_2018 · Posted 04-18-2022 05:50 AM

First, it's not a good idea to work with the same dataset "Baseline" so many times.

For the merge to work, the datasets Baseline and POR must be sorted by PLANT and PO, for example:

proc sort data=work.baseline;

by PLANT PO;

proc sort data=work.por;

by PLANT PO; run;

The first data step says to merge BASELINE and PO by matching PLANT and PO. If a unique combination of those variables is in BASELINE but not in POR, then those observations are output to BASELINE. (BASELINE is overwritten). To avoid overwriting you can change the first line to anything other than BASELINE, such as data work.baseline1. The first data step is more like a filter, I guess. If unique combinations of PLANT and PO are found in both datasets, then these are not output to BASELINE.

The second data step requires A=1 and B=1 implicitly, so only matching combinations of PLANT and PO in each dataset are written to the output dataset BASELINE. If these are run sequentially, the second data step will find no matching observations because the output from the first data step will have removed any matching combinations of PLANT and PO from BASELINE.

PaigeMiller · Posted 04-18-2022 06:17 AM

Adding to the comment from @benjamin_2018 , suppose you want a new data set named BASELINE2 and another data set named BASELINE3 (instead of two data sets named BASELINE which is impossible, and a bad idea to overwrite the original BASELINE with a new BASELINE, which also won't work the way it is programmed)

DATA BASELINE2 baseline3;
     MERGE WORK.BASELINE (in=A) WORK.POR (in=B);
     BY PLANT PO;
     IF A=1 AND B=0 THEN OUTPUT baseline2;
     output baseline3;
RUN;

--
Paige Miller

Astounding · Posted 04-18-2022 08:46 AM

It is highly unlikely (but remotely possible) that the first step is needed. If it is needed, here is the scenario that it addresses.

Suppose that there is a combination of PLANT/PO that appears in both data sets. Also suppose that the particular combination has more observations in A than in B. Then both steps would be needed to remove the observations in A for that PLANT/PO combination and replace them with the observation(s) in B.

For all other scenarios, the first step is useless and can be removed ... unless you actually removed pieces of the DATA step(s) before posting them.

Combine Steps

Re: Combine Steps

Re: Combine Steps

Re: Combine Steps

Registration is open

SAS Training: Just a Click Away