Hello,
I have a repeated-measures dataset in long format with each person represented by the numeric variable "ID." Each ID has a date for each row, with the number of rows/dates per person ranging from 3 to 16. The dates per person are mostly consecutive, but there is some data missingness, such that a date for a person may skip from "9/1/2014" to "9/3/2014" for the next row. I would like to create a variable that is "1" for the very first date for each person, and then the following rows represent the number of days from that first date (plus 1). Below is the sample code for what I would like, with the variable "want" representing the variable I want to create:
data data;
input ID date : mmddyy10. want;
format date mmddyy10.;
datalines;
1000007 10/26/2014 1
1000007 10/27/2014 2
1000007 10/28/2014 3
1000007 10/30/2014 5
1000007 10/31/2014 6
1000007 11/1/2014 7
1000007 11/2/2014 8
1000011 9/1/2014 1
1000011 9/3/2014 3
1000011 9/4/2014 4
1000011 9/7/2014 7
1000011 9/8/2014 8
1000011 9/9/2014 9
1000055 9/23/2015 1
1000055 9/24/2015 2
1000055 9/25/2015 3
1000055 9/28/2015 6
;
run;
As you can see, for ID 1000007
, the first date (10/26/2014
) is given a 1, followed by 2 for 10/27/2014
and 3 for 10/28/2014
. Since 10/28/2014
skips to 10/30/2014
for the next row, 10/30/2014
is given a 5 (4 days from the original first date, +1). And so on. The next ID (1000011
) again begins with 1 for the "want" variable and follows this pattern.
Thanks for your assistance.
Hi Reeza,
Thanks, almost perfect! Just needed to add a "1 + " to the calculation of the want variable:
want = 1+(date-start_date);
Best,
confooseddesi90
Create a new variable called START_DATE that you initialize at the beginning of each ID. Use RETAIN to hold that value across the rows until it resets at the next ID. Subtract the date from the start_date to get the difference. This assumes your data is sorted and ordered by ID and DATE. If not, sort it ahead of time.
data want;
set data;
by ID date;
retain start_date;
if first.ID then start_date = date;
want = date-start_date;
run;
@confooseddesi89 wrote:
Hello,
I have a repeated-measures dataset in long format with each person represented by the numeric variable "ID." Each ID has a date for each row, with the number of rows/dates per person ranging from 3 to 16. The dates per person are mostly consecutive, but there is some data missingness, such that a date for a person may skip from "9/1/2014" to "9/3/2014" for the next row. I would like to create a variable that is "1" for the very first date for each person, and then the following rows represent the number of days from that first date (plus 1). Below is the sample code for what I would like, with the variable "want" representing the variable I want to create:
data data; input ID date : mmddyy10. want; format date mmddyy10.; datalines; 1000007 10/26/2014 1 1000007 10/27/2014 2 1000007 10/28/2014 3 1000007 10/30/2014 5 1000007 10/31/2014 6 1000007 11/1/2014 7 1000007 11/2/2014 8 1000011 9/1/2014 1 1000011 9/3/2014 3 1000011 9/4/2014 4 1000011 9/7/2014 7 1000011 9/8/2014 8 1000011 9/9/2014 9 1000055 9/23/2015 1 1000055 9/24/2015 2 1000055 9/25/2015 3 1000055 9/28/2015 6 ; run;
As you can see, for ID
1000007
, the first date (10/26/2014
) is given a 1, followed by 2 for10/27/2014
and 3 for10/28/2014
. Since10/28/2014
skips to10/30/2014
for the next row,10/30/2014
is given a 5 (4 days from the original first date, +1). And so on. The next ID (1000011
) again begins with 1 for the "want" variable and follows this pattern.
Thanks for your assistance.
Hi Reeza,
Thanks, almost perfect! Just needed to add a "1 + " to the calculation of the want variable:
want = 1+(date-start_date);
Best,
confooseddesi90
Not recommending, but fun-
data data;
input ID date : mmddyy10. want;
format date mmddyy10.;
datalines;
1000007 10/26/2014 1
1000007 10/27/2014 2
1000007 10/28/2014 3
1000007 10/30/2014 5
1000007 10/31/2014 6
1000007 11/1/2014 7
1000007 11/2/2014 8
1000011 9/1/2014 1
1000011 9/3/2014 3
1000011 9/4/2014 4
1000011 9/7/2014 7
1000011 9/8/2014 8
1000011 9/9/2014 9
1000055 9/23/2015 1
1000055 9/24/2015 2
1000055 9/25/2015 3
1000055 9/28/2015 6
;
run;
proc sql;
create table want as
select *, date-min(date)+1 as want2
from data
group by id
order by id, date;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.