Hi,
I have a longitudinal dataset of 800 subjects who were measured at different time points. I want to create a dataset that includes only each subject's first non-missing measurement (baseline). The data looks like this:
ID VISIT feature
1 2 .
1 3 3.77
1 4 8.88
1 5 5.22
1 6 2.21
2 1 .
2 3 .
2 4 6.88
2 5 7.77
2 6 8.54
I am using the following code:
data want;
set have;
by ID VISIT;
if first.ID and first.VISIT;
run;
However, it doesn't seem to work. The data should look like this:
ID VISIT feature
1 3 3.77
2 4 6.88
Any advice would be appreciated.
Thanks!
Because you want to ignore missing values, you should drop them as well.
data want;
set have (where=(not missing(feature)));
by ID VISIT;
if first.ID;
run;
Thank you Reeza. This was very helpful.
Quick question: Is there also a way to pick only the third or fourth observation, and not just the first?
In other words is there a third.ID command?
Thanks.
No, there are only first. and last. variables. But you can use first. to reset a counter and then use the counter as desired.
if first.ID then counter = 1;
else counter + 1;
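To make the counter idea concrete, here is a minimal sketch of a full step that keeps the third non-missing observation per ID. It assumes HAVE is already sorted by ID and VISIT; the output name THIRD and the variable COUNTER are just illustrative names.

```sas
data third;
  /* drop missing values first so they are not counted */
  set have (where=(not missing(feature)));
  by ID VISIT;
  /* the sum statement retains COUNTER across rows; reset it at each new ID */
  if first.ID then counter = 1;
  else counter + 1;
  /* keep only the third non-missing record for each ID */
  if counter = 3;
  drop counter;
run;
```

With the sample data above this should keep VISIT 5 for ID 1 and VISIT 6 for ID 2. Change the 3 to 4 to get the fourth observation; IDs with fewer records than that simply drop out.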
You need to eliminate the "and first.VISIT". The sort requires using the VISIT variable to order the records so the first visit is the first record for the ID. But for the actual selection, you only need first.ID.
Also note that first.VISIT is always true when first.ID is true, so
if first.ID and first.VISIT;
is the same as
if first.ID;
To summarise the suggestions:
data WANT;
set HAVE;
where not missing(FEATURE); /* a bare "where FEATURE" would also drop rows where FEATURE is 0 */
by ID VISIT;
if first.ID ;
run;