DATA Step, Macro, Functions and more

Find matching records and group

Reply
Contributor
Posts: 36

Find matching records and group

Hi there

Please assist -I need to achieve the below:

I have 2datasets. I need to check if ID B in dataset1 matches ID_A in dataset 2
and then do a grouping of the
HOUR.

EG: If hour
matches or is an hour later than these should be grouped as
'1-2 Hours' and another category 'Other Hours'

  

DATASET 1

ID_A   ID_B  HOUR

45          12        12

45          12        11

56          89        00

  

DATASET 2

ID_A  ID_B  HOUR

  

12          45        13

12          45        12

45          12        05

89          56        01

DESIRED OUTPUT:

 

ID_A   ID_B  HOUR  GROUPING

12          45        13         1-2 Hours

12          45        12         1-2 Hours

45          12        05    Other Hours

89          56        01         1-2 Hours

Super User
Super User
Posts: 7,392

Re: Find matching records and group

Sorry, I find your desired output data to be confusing.  You have both IDA and B in both datasets, and to my understanding, the first two rows of the second table would match to row 1 of dataset, hence giving 5 output rows?

As for code, it should be simple, something like:

proc sql;

     create table WANT as

     select     A.ID as ID_A,

                   B.ID as ID_B,

                   A.HOUR,

     from       DATASET1 A

     full join   DATASET2 B

     on          (A.HOUR-1 <= B.HOUR or A.HOUR+1 >= B.HOUR);

quit;

Contributor
Posts: 36

Re: Find matching records and group

Thank you RW9.

I have corrected dataset2. What I want is basically to check how many times ID_A(dataset2) HOUR matches ID_B(dataset1) HOUR with a +1 hour difference and than how many times it fell out of that category.

DATASET 1

ID_A   ID_B  HOUR

45          12        12

45          12        11

56          89        00

  

DATASET 2

ID_A  ID_B  HOUR

  

12          45        13

12          45        12

45          12        05

89          56        01

DESIRED OUTPUT:

 

ID_A   ID_B  HOUR  GROUPING

12          45        13         1-2 Hours

12          45        12         1-2 Hours

12          45        05    Other Hours

89          56        01         1-2 Hours

Super User
Super User
Posts: 7,392

Re: Find matching records and group

Yes, its the merge part I am a bit unsure of as the logic which applies to one record also applies in other ways.  If your run the below you will see what I mean as there a five output records:

data dataset1;
  input ID_A ID_B HOUR;
datalines;
45          12        12
45          12        11
56          89        00
;
run;

data dataset2;
  input ID_A ID_B HOUR;
datalines;
12          45        13
12          45        12
45          12        05
89          56        01
;
run;

proc sql;
  create table WANT as
  select  COALESCE(MIN(A.ID_A,A.ID_B),MIN(B.ID_A,B.ID_B)) as ID_A,
          COALESCE(MAX(A.ID_A,A.ID_B),MAX(B.ID_A,B.ID_B)) as ID_B,
          A.HOUR as HOUR_A,
          B.HOUR as HOUR_B,
          case  when A.HOUR is null then "Other Hours"
                else "1-2 Hours" end as GROUPING
  from    DATASET1 A
  right join DATASET2 B
  on      MIN(A.ID_A,A.ID_B)=MIN(B.ID_A,B.ID_B)
  and     MAX(A.ID_A,A.ID_B)=MAX(B.ID_A,B.ID_B)
  and     A.HOUR - 1 <= B.HOUR <= A.HOUR + 1;
qui

Super User
Posts: 9,671

Re: Find matching records and group

Are you doing Multi-Multi matched ?


data dataset1;
  input ID_A ID_B HOUR;
datalines;
45          12        12
45          12        11
56          89        00
;
run;
data dataset2;
  input ID_A ID_B HOUR;
datalines;
12          45        13
12          45        12
45          12        05
89          56        01
;
run;
data want;
 merge dataset2(in=ina) dataset1(keep=ID_B HOUR rename=(ID_B=ID_A HOUR=_HOUR));
 by ID_A;
 group=ifc(. lt abs(hour-_hour) lt 3,'1-2 hours ','other hours');
 drop _hour;
run;

Xia Keshan

Ask a Question
Discussion stats
  • 4 replies
  • 174 views
  • 0 likes
  • 3 in conversation