Hi all, I'm trying to figure out a way to combine a dataset back onto itself, while merging on rows only when the ID variable doesn't match. For instance, let's say the dataset HAVE has 30 rows, with ID = 1 to 30, and a bunch of other variables. I want to merge HAVE back onto itself, so that the row with ID = 1 now becomes 29 rows, with each of the 29 taking on the values from ID = 2 to 30 for the other variables. Then, the row that originally had ID = 2 would take on the values from ID = 1 and then ID = 3 to 30. And so on.
Initially, I was able to do this with hashing:
data WANT;
if _N_=1 then do;
declare hash h(dataset: 'HAVE(rename=(ID=CompID A=CompA B=CompB))') ;
declare hiter iter('h') ;
h.defineKey('CompID') ;
h.defineData('CompID','CompA','CompB') ;
h.defineDone() ;
call missing(CompID, CompA, CompB);
end;
set HAVE ;
by ID ;
HashCheck = iter.first();
do while (HashCheck = 0);
if ID^=CompID then do;
output;
end;
HashCheck = iter.next();
end;
run;
That worked when ID and A and B were all numeric. However, then I needed to also bring on C and D (as CompC and CompD), which were character variables, and it no longer worked, with the error message:
ERROR: Type mismatch for data variable CompC at line 1237 column 5.
ERROR: Hash data set load failed at line 1237 column 5.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
I thought I might be able to do it with something kind of like a DoW-loop, or something similar where there was a second SET statement within a DO loop, but that wasn't working either. It would output all 29 observations for the first value of ID from the base dataset, but then nothing else.
I might just end up doing a CROSS JOIN or something like that in PROC SQL. However, it's been bugging me that I can't figure out a way to do it in a DATA step. So, I thought I would ask if anyone had a suggestion for how it could be done in a DATA step.
Thanks!
... View more