Hi! I used data merge to match two tables with a ratio of 1:4, but the merged table presented a ratio of 1:6 or even more. I would like to only keep the first 4 rows for each case ID. How could I do it?
Here are the codes that I used for data merge:
proc sql ;
create view _controls
as select control_ID,agegrp, sex, A, B, C, rand('uniform') as random1, year
from &controls
order by year,agegrp, sex, A, B, C,random1;
quit;
data _cases;
set &cases;
do rep=1 to 4;
random2=rand('uniform');
output;
end;
keep case_id year agegrp sex A B C rep random2;
run;
proc sort data=_cases;
by year agegrp sex A B C rep random2;
run;
data &dataout &missing;
merge _cases(in=in1) _controls(in=in2);
by year agegrp sex A B C;
if in1=1 and in2=1 then output &dataout;
else if in1=1 and in2=0 then output &missing;
run;
proc sort data=&dataout;
by case_id;
run;
then the merged table looks like this:
control_id case_id rep
0001 1 1
0002 1 2
0003 1 3
0004 1 4
0005 1 4
0006 1 4
0007 2 1
0008 2 2
0009 2 3
0010 2 4
0011 2 4
0012 3 1
I only want the first 4 rows for each unique case_id.
Thanks!
data want;
set &dataout.;
by case_id;
if first.case_id
then count = 1;
else count + 1;
if count le 4;
drop count;
run;
data want;
set &dataout.;
by case_id;
if first.case_id
then count = 1;
else count + 1;
if count le 4;
drop count;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.