Hi there
Please assist -I need to achieve the below:
I have 2datasets. I need to check if ID B in dataset1 matches ID_A in dataset 2
and then do a grouping of the HOUR.
EG: If hour
matches or is an hour later than these should be grouped as '1-2 Hours' and another category 'Other Hours'
DATASET 1
ID_A ID_B HOUR
45 12 12
45 12 11
56 89 00
DATASET 2
ID_A ID_B HOUR
12 45 13
12 45 12
45 12 05
89 56 01
DESIRED OUTPUT:
ID_A ID_B HOUR GROUPING
12 45 13 1-2 Hours
12 45 12 1-2 Hours
45 12 05 Other Hours
89 56 01 1-2 Hours
Sorry, I find your desired output data to be confusing. You have both IDA and B in both datasets, and to my understanding, the first two rows of the second table would match to row 1 of dataset, hence giving 5 output rows?
As for code, it should be simple, something like:
proc sql;
create table WANT as
select A.ID as ID_A,
B.ID as ID_B,
A.HOUR,
from DATASET1 A
full join DATASET2 B
on (A.HOUR-1 <= B.HOUR or A.HOUR+1 >= B.HOUR);
quit;
Thank you RW9.
I have corrected dataset2. What I want is basically to check how many times ID_A(dataset2) HOUR matches ID_B(dataset1) HOUR with a +1 hour difference and than how many times it fell out of that category.
DATASET 1
ID_A ID_B HOUR
45 12 12
45 12 11
56 89 00
DATASET 2
ID_A ID_B HOUR
12 45 13
12 45 12
45 12 05
89 56 01
DESIRED OUTPUT:
ID_A ID_B HOUR GROUPING
12 45 13 1-2 Hours
12 45 12 1-2 Hours
12 45 05 Other Hours
89 56 01 1-2 Hours
Yes, its the merge part I am a bit unsure of as the logic which applies to one record also applies in other ways. If your run the below you will see what I mean as there a five output records:
data dataset1;
input ID_A ID_B HOUR;
datalines;
45 12 12
45 12 11
56 89 00
;
run;
data dataset2;
input ID_A ID_B HOUR;
datalines;
12 45 13
12 45 12
45 12 05
89 56 01
;
run;
proc sql;
create table WANT as
select COALESCE(MIN(A.ID_A,A.ID_B),MIN(B.ID_A,B.ID_B)) as ID_A,
COALESCE(MAX(A.ID_A,A.ID_B),MAX(B.ID_A,B.ID_B)) as ID_B,
A.HOUR as HOUR_A,
B.HOUR as HOUR_B,
case when A.HOUR is null then "Other Hours"
else "1-2 Hours" end as GROUPING
from DATASET1 A
right join DATASET2 B
on MIN(A.ID_A,A.ID_B)=MIN(B.ID_A,B.ID_B)
and MAX(A.ID_A,A.ID_B)=MAX(B.ID_A,B.ID_B)
and A.HOUR - 1 <= B.HOUR <= A.HOUR + 1;
qui
Are you doing Multi-Multi matched ?
data dataset1; input ID_A ID_B HOUR; datalines; 45 12 12 45 12 11 56 89 00 ; run; data dataset2; input ID_A ID_B HOUR; datalines; 12 45 13 12 45 12 45 12 05 89 56 01 ; run; data want; merge dataset2(in=ina) dataset1(keep=ID_B HOUR rename=(ID_B=ID_A HOUR=_HOUR)); by ID_A; group=ifc(. lt abs(hour-_hour) lt 3,'1-2 hours ','other hours'); drop _hour; run;
Xia Keshan
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.