Re: compare two tables and find the participants in common

knighsson · Posted 04-07-2021 10:44 AM

Hello,

I have two tables that both contain variables "participant sequence number" (value: 1-10000) and "participant group number" (value: 1-4). The sample of two tables are the same, so "participant sequence number" are same in two tables, but the methods of grouping are different, so their corresponding "participant group number" may be different between two tables.

I want to know how many participants that are included in the same group are in both tables? (for example， how many participants that are included in group 1 in the first table, and also included in group 1 in the second table) .

Could you tell me how to do this?

Thanks in advance.

sbxkoenk · Posted 04-07-2021 12:00 PM

Maybe a PROC FREQ?

Submit this as an example:

data have1;
 call streaminit(123);
 do participant_sequence_number = 1 to 10000;
  participant_group_number=floor(rand('uniform')*5);
  output;
 end;
run;
data have2;
 call streaminit(456);
 do participant_sequence_number = 1 to 10000;
  participant_group_number=floor(rand('uniform')*5);
  output;
 end;
run;
data want;
 merge have1(rename=(participant_group_number=participant_group_number1)) 
       have2(rename=(participant_group_number=participant_group_number2)); 
 by participant_sequence_number /* participant_group_number */;
run;

PROC FREQ data=want;
 tables participant_group_number1 * participant_group_number2;
run;
/* end of program */

Cheers,

Koen

ballardw · Posted 04-07-2021 12:03 PM

@sbxkoenk wrote:

Maybe a PROC FREQ?

Submit this as an example:

data have1;
 call streaminit(123);
 do participant_sequence_number = 1 to 10000;
  participant_group_number=floor(rand('uniform')*5);
  output;
 end;
run;
data have2;
 call streaminit(456);
 do participant_sequence_number = 1 to 10000;
  participant_group_number=floor(rand('uniform')*5);
  output;
 end;
run;
data want;
 merge have1(rename=(participant_group_number=participant_group_number1)) 
       have2(rename=(participant_group_number=participant_group_number2)); 
 by participant_sequence_number /* participant_group_number */;
run;

PROC FREQ data=want;
 tables participant_group_number1 * participant_group_number2;
run;
/* end of program */

Cheers,

Koen

I think the LIST option may make the output easier to read for the original question:

PROC FREQ data=want;
 tables participant_group_number1 * participant_group_number2/ list ;
run;

knighsson · Posted 04-07-2021 03:11 PM

Thank you!

knighsson · Posted 04-07-2021 03:11 PM

Thank you! this is helpful!

PGStats · Posted 04-07-2021 01:46 PM

Or use SQL:

proc sql;
select 
    a.participant_group_number, 
    count(*) as nbInBothTables
from
    have1 as a inner join
    have2 as b 
        on  a.participant_sequence_number = b.participant_sequence_number and 
            a.participant_group_number = b.participant_group_number
group by a.participant_group_number;
quit;

Using simulated data from @sbxkoenk ,

PG

knighsson · Posted 04-07-2021 03:12 PM

Thank you! This is helpful!

compare two tables and find the participants in common