Hello,
I have two tables that both contain variables "participant sequence number" (value: 1-10000) and "participant group number" (value: 1-4). The sample of two tables are the same, so "participant sequence number" are same in two tables, but the methods of grouping are different, so their corresponding "participant group number" may be different between two tables.
I want to know how many participants that are included in the same group are in both tables? (for example, how many participants that are included in group 1 in the first table, and also included in group 1 in the second table) .
Could you tell me how to do this?
Thanks in advance.
Maybe a PROC FREQ?
Submit this as an example:
data have1;
call streaminit(123);
do participant_sequence_number = 1 to 10000;
participant_group_number=floor(rand('uniform')*5);
output;
end;
run;
data have2;
call streaminit(456);
do participant_sequence_number = 1 to 10000;
participant_group_number=floor(rand('uniform')*5);
output;
end;
run;
data want;
merge have1(rename=(participant_group_number=participant_group_number1))
have2(rename=(participant_group_number=participant_group_number2));
by participant_sequence_number /* participant_group_number */;
run;
PROC FREQ data=want;
tables participant_group_number1 * participant_group_number2;
run;
/* end of program */
Cheers,
Koen
@sbxkoenk wrote:
Maybe a PROC FREQ?
Submit this as an example:
data have1; call streaminit(123); do participant_sequence_number = 1 to 10000; participant_group_number=floor(rand('uniform')*5); output; end; run; data have2; call streaminit(456); do participant_sequence_number = 1 to 10000; participant_group_number=floor(rand('uniform')*5); output; end; run; data want; merge have1(rename=(participant_group_number=participant_group_number1)) have2(rename=(participant_group_number=participant_group_number2)); by participant_sequence_number /* participant_group_number */; run; PROC FREQ data=want; tables participant_group_number1 * participant_group_number2; run; /* end of program */
Cheers,
Koen
I think the LIST option may make the output easier to read for the original question:
PROC FREQ data=want; tables participant_group_number1 * participant_group_number2/ list ; run;
Or use SQL:
proc sql;
select
a.participant_group_number,
count(*) as nbInBothTables
from
have1 as a inner join
have2 as b
on a.participant_sequence_number = b.participant_sequence_number and
a.participant_group_number = b.participant_group_number
group by a.participant_group_number;
quit;
Using simulated data from @sbxkoenk ,
Thank you! This is helpful!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.