Hi there
Please assist -I need to achieve the below:
I have 2datasets. I need to check if ID B in dataset1 matches ID_A in dataset 2
and then do a grouping of the HOUR.
EG: If hour
matches or is an hour later than these should be grouped as '1-2 Hours' and another category 'Other Hours'
DATASET 1
ID_A ID_B HOUR
45 12 12
45 12 11
56 89 00
DATASET 2
ID_A ID_B HOUR
12 45 13
12 45 12
45 12 05
89 56 01
DESIRED OUTPUT:
ID_A ID_B HOUR GROUPING
12 45 13 1-2 Hours
12 45 12 1-2 Hours
45 12 05 Other Hours
89 56 01 1-2 Hours
Sorry, I find your desired output data to be confusing. You have both IDA and B in both datasets, and to my understanding, the first two rows of the second table would match to row 1 of dataset, hence giving 5 output rows?
As for code, it should be simple, something like:
proc sql;
create table WANT as
select A.ID as ID_A,
B.ID as ID_B,
A.HOUR,
from DATASET1 A
full join DATASET2 B
on (A.HOUR-1 <= B.HOUR or A.HOUR+1 >= B.HOUR);
quit;
Thank you RW9.
I have corrected dataset2. What I want is basically to check how many times ID_A(dataset2) HOUR matches ID_B(dataset1) HOUR with a +1 hour difference and than how many times it fell out of that category.
DATASET 1
ID_A ID_B HOUR
45 12 12
45 12 11
56 89 00
DATASET 2
ID_A ID_B HOUR
12 45 13
12 45 12
45 12 05
89 56 01
DESIRED OUTPUT:
ID_A ID_B HOUR GROUPING
12 45 13 1-2 Hours
12 45 12 1-2 Hours
12 45 05 Other Hours
89 56 01 1-2 Hours
Yes, its the merge part I am a bit unsure of as the logic which applies to one record also applies in other ways. If your run the below you will see what I mean as there a five output records:
data dataset1;
input ID_A ID_B HOUR;
datalines;
45 12 12
45 12 11
56 89 00
;
run;
data dataset2;
input ID_A ID_B HOUR;
datalines;
12 45 13
12 45 12
45 12 05
89 56 01
;
run;
proc sql;
create table WANT as
select COALESCE(MIN(A.ID_A,A.ID_B),MIN(B.ID_A,B.ID_B)) as ID_A,
COALESCE(MAX(A.ID_A,A.ID_B),MAX(B.ID_A,B.ID_B)) as ID_B,
A.HOUR as HOUR_A,
B.HOUR as HOUR_B,
case when A.HOUR is null then "Other Hours"
else "1-2 Hours" end as GROUPING
from DATASET1 A
right join DATASET2 B
on MIN(A.ID_A,A.ID_B)=MIN(B.ID_A,B.ID_B)
and MAX(A.ID_A,A.ID_B)=MAX(B.ID_A,B.ID_B)
and A.HOUR - 1 <= B.HOUR <= A.HOUR + 1;
qui
Are you doing Multi-Multi matched ?
data dataset1; input ID_A ID_B HOUR; datalines; 45 12 12 45 12 11 56 89 00 ; run; data dataset2; input ID_A ID_B HOUR; datalines; 12 45 13 12 45 12 45 12 05 89 56 01 ; run; data want; merge dataset2(in=ina) dataset1(keep=ID_B HOUR rename=(ID_B=ID_A HOUR=_HOUR)); by ID_A; group=ifc(. lt abs(hour-_hour) lt 3,'1-2 hours ','other hours'); drop _hour; run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.