Good afternoon,
I have data with multiple entries per participant, but certain variables only have values at the last visit. Unfortunately that last visit has a different name and date for each participant. Additionally there is a different number of visits for each participant.
I've ranked the visits and would like to carry the values for sex and study arm from the last visit across all the visits. How can I do this?
In the past I've carried values for dates using something like this:
data C;
set B;
Keep ID event_name DOB var4 var5 var6;
merge B(where=(event_name= 'delivery') rename=(DOB=DOB1)) A;
by ID;
run;
I can't figure out how to make this work without a constant value for the where statement.
For clarity, what I have looks something like this:
ID | count | Sex | Study Arm |
1 | 1 | . | . |
1 | … | . | . |
1 | 7 | 2 | 2 |
4 | 1 | . | . |
4 | 2 | . | . |
4 | 3 | . | . |
4 | … | . | . |
4 | 18 | 1 | 1 |
8 | 1 | . | . |
8 | 2 | . | . |
8 | 3 | . | . |
8 | … | . | . |
8 | 18 | 2 | 1 |
13 | 1 | . | . |
13 | 2 | . | . |
13 | 3 | . | . |
13 | … | . | . |
13 | 14 | 2 | 2 |
15 | 1 | . | . |
15 | 2 | . | . |
15 | 3 | . | . |
15 | 4 | . | . |
15 | 5 | 1 | 2 |
And I'd like it to look like this:
ID | count | Sex | Study Arm |
1 | 1 | 2 | 2 |
1 | … | 2 | 2 |
1 | 7 | 2 | 2 |
4 | 1 | 1 | 1 |
4 | 2 | 1 | 1 |
4 | 3 | 1 | 1 |
4 | … | 1 | 1 |
4 | 18 | 1 | 1 |
8 | 1 | 2 | 1 |
8 | 2 | 2 | 1 |
8 | 3 | 2 | 1 |
8 | … | 2 | 1 |
8 | 18 | 2 | 1 |
13 | 1 | 2 | 2 |
13 | 2 | 2 | 2 |
13 | 3 | 2 | 2 |
13 | … | 2 | 2 |
13 | 14 | 2 | 2 |
15 | 1 | 1 | 2 |
15 | 2 | 1 | 2 |
15 | 3 | 1 | 2 |
15 | 4 | 1 | 2 |
15 | 5 | 1 | 2 |
Try this
data have;
input ID count Sex StudyArm;
datalines;
1 1 . .
1 7 2 2
4 1 . .
4 2 . .
4 3 . .
4 18 1 1
8 1 . .
8 2 . .
8 3 . .
8 18 2 1
13 1 . .
13 2 . .
13 3 . .
13 14 2 2
15 1 . .
15 2 . .
15 3 . .
15 4 . .
15 5 1 2
;
data want(drop=_:);
do until (last.ID);
set have;
by ID;
if Sex > _Sex then do;
_Sex = Sex;
_StudyArm = StudyArm;
end;
end;
do until (last.ID);
set have;
by ID;
Sex = _Sex;
StudyArm = _StudyArm;
output;
end;
run;
Try this
data have;
input ID count Sex StudyArm;
datalines;
1 1 . .
1 7 2 2
4 1 . .
4 2 . .
4 3 . .
4 18 1 1
8 1 . .
8 2 . .
8 3 . .
8 18 2 1
13 1 . .
13 2 . .
13 3 . .
13 14 2 2
15 1 . .
15 2 . .
15 3 . .
15 4 . .
15 5 1 2
;
data want(drop=_:);
do until (last.ID);
set have;
by ID;
if Sex > _Sex then do;
_Sex = Sex;
_StudyArm = StudyArm;
end;
end;
do until (last.ID);
set have;
by ID;
Sex = _Sex;
StudyArm = _StudyArm;
output;
end;
run;
SQL let's you cheat here quite easily.
proc sql;
create table want as
select *, max(sex) as Sex_filled, max(study_arm) as study_arm_filled
from have
group by ID
order by ID, count;
quit;
The variables with _filled will have the values you want in it.
@EpiNovice wrote:
Good afternoon,
I have data with multiple entries per participant, but certain variables only have values at the last visit. Unfortunately that last visit has a different name and date for each participant. Additionally there is a different number of visits for each participant.
I've ranked the visits and would like to carry the values for sex and study arm from the last visit across all the visits. How can I do this?
In the past I've carried values for dates using something like this:
data C;
set B;
Keep ID event_name DOB var4 var5 var6;
merge B(where=(event_name= 'delivery') rename=(DOB=DOB1)) A;
by ID;
run;
I can't figure out how to make this work without a constant value for the where statement.
For clarity, what I have looks something like this:
ID count Sex Study Arm 1 1 . . 1 … . . 1 7 2 2 4 1 . . 4 2 . . 4 3 . . 4 … . . 4 18 1 1 8 1 . . 8 2 . . 8 3 . . 8 … . . 8 18 2 1 13 1 . . 13 2 . . 13 3 . . 13 … . . 13 14 2 2 15 1 . . 15 2 . . 15 3 . . 15 4 . . 15 5 1 2
And I'd like it to look like this:
ID count Sex Study Arm 1 1 2 2 1 … 2 2 1 7 2 2 4 1 1 1 4 2 1 1 4 3 1 1 4 … 1 1 4 18 1 1 8 1 2 1 8 2 2 1 8 3 2 1 8 … 2 1 8 18 2 1 13 1 2 2 13 2 2 2 13 3 2 2 13 … 2 2 13 14 2 2 15 1 1 2 15 2 1 2 15 3 1 2 15 4 1 2 15 5 1 2
data have;
input ID count Sex StudyArm;
datalines;
1 1 . .
1 7 2 2
4 1 . .
4 2 . .
4 3 . .
4 18 1 1
8 1 . .
8 2 . .
8 3 . .
8 18 2 1
13 1 . .
13 2 . .
13 3 . .
13 14 2 2
15 1 . .
15 2 . .
15 3 . .
15 4 . .
15 5 1 2
;
run;
data want;
if _n_=0 then set have;
if _n_=1 then do;
declare hash h(dataset:"have(where=(StudyArm ne .))",ordered:'y');
h.definekey('ID');
h.definedata('StudyArm','Sex');
h.definedone();
call missing(StudyArm,Sex);
end;
set have(drop=Sex StudyArm);
rc=h.find();
drop rc;
run;
Hi @EpiNovice How about tweaking your original idea --merge
data have;
input ID count Sex StudyArm;
datalines;
1 1 . .
1 7 2 2
4 1 . .
4 2 . .
4 3 . .
4 18 1 1
8 1 . .
8 2 . .
8 3 . .
8 18 2 1
13 1 . .
13 2 . .
13 3 . .
13 14 2 2
15 1 . .
15 2 . .
15 3 . .
15 4 . .
15 5 1 2
;
data want;
merge have(keep=id count) have(drop=count where=(StudyArm and sex));
by id;
run;
And the same can also be written as
data want;
merge have(where=(StudyArm and sex)) have(keep=id count);
by id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.