Hello,
I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:
Subject_ID Date Obs
10 01/02/21 1
10 01/03/21 1
10 01/05/21 2
11 02/05/21 4
11 02/06/21 3
11 02/07/21 2
12 03/05/21 1
12 03/06/21 5
How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks.
Thanks.
So you want to insure that each subject has at least 14 days/observations?
You need to know the first/last day that was actually collected. Or the first expected day if it is possible the first day is missing. For now let's just compute that from your existing dataset.
So first let's convert your listing into an actual SAS dataset so we have something to test our program against.
data have ;
input Subject_ID Date :yymmdd. Var1 ;
format date yymmdd10.;
cards;
10 2021-01-02 1
10 2021-01-03 1
10 2021-01-05 2
11 2021-02-05 4
11 2021-02-06 3
11 2021-02-07 2
12 2021-03-05 1
12 2021-03-06 5
;
Note: To avoid confusing half of your audience do not display dates in either MDY or DMY order. Use YMD order or the DATE9. format instead.
Now calculate the min/max date per subject and then generate a dataset with all of the observations for each subject. Then combine it with your actual data.
proc means data=have;
by subject_id;
var date;
output out=skeleton min=start max=end;
run;
data skeleton;
set skeleton;
end=max(end,start+13);
do date=start to end;
day=date-start+1;
output;
end;
format date yymmdd10.;
keep subject_id date day ;
run;
data want;
merge skeleton have;
by subject_id date;
run;
Result:
Subject_ Obs ID date day Var1 1 10 2021-01-02 1 1 2 10 2021-01-03 2 1 3 10 2021-01-04 3 . 4 10 2021-01-05 4 2 5 10 2021-01-06 5 . 6 10 2021-01-07 6 . 7 10 2021-01-08 7 . 8 10 2021-01-09 8 . 9 10 2021-01-10 9 . 10 10 2021-01-11 10 . 11 10 2021-01-12 11 . 12 10 2021-01-13 12 . 13 10 2021-01-14 13 . 14 10 2021-01-15 14 . 15 11 2021-02-05 1 4 16 11 2021-02-06 2 3 17 11 2021-02-07 3 2 18 11 2021-02-08 4 . 19 11 2021-02-09 5 . 20 11 2021-02-10 6 . 21 11 2021-02-11 7 . 22 11 2021-02-12 8 . 23 11 2021-02-13 9 . 24 11 2021-02-14 10 . 25 11 2021-02-15 11 . 26 11 2021-02-16 12 . 27 11 2021-02-17 13 . 28 11 2021-02-18 14 . 29 12 2021-03-05 1 1 30 12 2021-03-06 2 5 31 12 2021-03-07 3 . 32 12 2021-03-08 4 . 33 12 2021-03-09 5 . 34 12 2021-03-10 6 . ...
Your have data is monthly but in your narrative you're talking about two week periods.
Can you please post the desired result based on the sample data you've shared? That should reduce most of the ambiguity.
Why is 1 repeated for two different dates for the same subject?
@ppl wrote:
Hello,
I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:
Subject_ID Date Obs
10 01/02/21 1
10 01/03/21 1
10 01/05/21 2
11 02/05/21 4
11 02/06/21 3
11 02/07/21 2
12 03/05/21 1
12 03/06/21 5
How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks.
Thanks.
I suspect your question is related to this persons?
@ppl wrote:
Hello,
I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:
Subject_ID Date Obs
10 01/02/21 1
10 01/03/21 1
10 01/05/21 2
11 02/05/21 4
11 02/06/21 3
11 02/07/21 2
12 03/05/21 1
12 03/06/21 5
How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks.
Thanks.
An observation is recorded daily.
The obs you have posted seem to be monthly, or are the dates in the obscure month-day-year format? Posting data as working data step always helps to avoid such confusion. Also add an explanation for the variable "obs" it is not clear whether it is just additional information in the data or should be used to solve the task.
There are some missing values in there too
You don't have a single missing value in the data you have posted, maybe obs are missing.
How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks.
Not clear what want actually. Please add a table showing the expected results.
After some mind-wrangling this may or may not be want you expect:
data want;
set have;
by Subject_ID;
retain expected_date first_date day;
if first.Subject_Id then do;
day = 1;
expected_date = date;
first_date = date;
end;
else do;
day = day + 1;
end;
if expected_date = date then do;
output;
end;
else do;
do i = 0 to date - expected_date;
date = expected_date + i;
day = day + i;
output;
end;
end;
if last.Subject_Id then do;
d = 14 - intck('day', first_date, date) - 1;
if d > 0 then do i = 1 to d;
date = date + 1;
day = day + 1;
output;
end;
end;
expected_date = date + 1;
drop i d expected_date first_date ;
run;
Thanks, I think this is almost what is expected. For those asking why I have data for different months, the data was collected over several months, but all subjects have data for 2 weeks only. It is not practical to have all subjects have the same start date. Some will be recruited this month, others next month and so on.
Regarding the date format it was meant to be in mmddyy10. format. I also meant to say Var instead of obs.
There is a lot of missing data in the table as the observations are meant to be taken daily for two weeks only.
Expected table:
Subject_ID Date Obs Day
10 01/02/21 1 Day1
10 01/03/21 1 Day2
10 01/04/21 - Day 3
10 01/05/21 2 Day 4
10 01/06/21 - Day 5
10 01/07/21 - Day 6
10 01/08/21 - Day 7
10 01/09/21 - Day 8
etc upto Day 14
11 02/05/21 4 Day1
11 02/06/21 3 Day2
11 02/07/21 2 Day3
etc upto Day 14
12 03/05/21 1 Day 1
12 03/06/21 5 Day 2
I hope this helps.
Hi @ppl,
I assume that you have a table (work.intervals) with the start and end days per subject, e.g. with the following three columns:
Subject_ID Start_Date Enddate
You could create a new table (work.day_order) for the day order (i.e. day1, day2.....day14) per subject with the following three columns via a DO loop in a data step
Subject_ID Day_order Date
Then you could merge the day order table with your observation table (work.observations) via the column “Date”.
But that is just one of several possibilites.
A result like this
Subject_ID | date | day_order | Obs |
10 | 01/02/2021 | Day 1 | 1 |
10 | 01/03/2021 | Day 2 | 1 |
10 | 01/04/2021 | Day 3 | . |
10 | 01/05/2021 | Day 4 | 2 |
10 | 01/06/2021 | Day 5 | . |
10 | 01/07/2021 | Day 6 | . |
10 | 01/08/2021 | Day 7 | . |
10 | 01/09/2021 | Day 8 | . |
can be produced with the following code (I created the input data in excel):
libname Intval xlsx "/home/u59608141/sasuser.v94/Intervals.xlsx";
libname Observat xlsx "/home/u59608141/sasuser.v94/Observations.xlsx";
data day_order;
set intval.Tabelle1;
date=start_date-1;
Do i=1 to 14;
day_order = cat("Day ",i);
date=date+1;
Format date MMDDYY10.;
output;
end;
keep Subject_ID day_order date;
run;
data Observat;
set Observat.Observations;
Datenum=input(Date,MMDDYY8.);
drop date;
run;
data result;
merge day_order observat (rename=(datenum=date));
by Subject_ID date;
run;
proc print data=result noobs;
run;
Thanks for your response. Unfortunately, I do not have a start or end date column. Just dates arranged in chronological order with a variable collected at that date.
I just noticed, that you don't need an enddate, if you assume that the intervall is always 14 days.
For the start date you might use the first. Column as above.
data have;
input Subject_ID Date : mmddyy10. Obs;
format Date mmddyy10.;
cards;
10 01/02/21 1
10 01/03/21 1
10 01/05/21 2
11 02/05/21 4
11 02/06/21 3
11 02/07/21 2
12 03/05/21 1
12 03/06/21 5
;
data want;
merge have have(keep=Subject_ID Date rename=(Subject_ID=_Subject_ID Date=_Date) firstobs=2);
output;
if Subject_ID=_Subject_ID then do;
do date=date+1 to _date-1;
obs=.;output;
end;
end;
drop _:;
run;
So you want to insure that each subject has at least 14 days/observations?
You need to know the first/last day that was actually collected. Or the first expected day if it is possible the first day is missing. For now let's just compute that from your existing dataset.
So first let's convert your listing into an actual SAS dataset so we have something to test our program against.
data have ;
input Subject_ID Date :yymmdd. Var1 ;
format date yymmdd10.;
cards;
10 2021-01-02 1
10 2021-01-03 1
10 2021-01-05 2
11 2021-02-05 4
11 2021-02-06 3
11 2021-02-07 2
12 2021-03-05 1
12 2021-03-06 5
;
Note: To avoid confusing half of your audience do not display dates in either MDY or DMY order. Use YMD order or the DATE9. format instead.
Now calculate the min/max date per subject and then generate a dataset with all of the observations for each subject. Then combine it with your actual data.
proc means data=have;
by subject_id;
var date;
output out=skeleton min=start max=end;
run;
data skeleton;
set skeleton;
end=max(end,start+13);
do date=start to end;
day=date-start+1;
output;
end;
format date yymmdd10.;
keep subject_id date day ;
run;
data want;
merge skeleton have;
by subject_id date;
run;
Result:
Subject_ Obs ID date day Var1 1 10 2021-01-02 1 1 2 10 2021-01-03 2 1 3 10 2021-01-04 3 . 4 10 2021-01-05 4 2 5 10 2021-01-06 5 . 6 10 2021-01-07 6 . 7 10 2021-01-08 7 . 8 10 2021-01-09 8 . 9 10 2021-01-10 9 . 10 10 2021-01-11 10 . 11 10 2021-01-12 11 . 12 10 2021-01-13 12 . 13 10 2021-01-14 13 . 14 10 2021-01-15 14 . 15 11 2021-02-05 1 4 16 11 2021-02-06 2 3 17 11 2021-02-07 3 2 18 11 2021-02-08 4 . 19 11 2021-02-09 5 . 20 11 2021-02-10 6 . 21 11 2021-02-11 7 . 22 11 2021-02-12 8 . 23 11 2021-02-13 9 . 24 11 2021-02-14 10 . 25 11 2021-02-15 11 . 26 11 2021-02-16 12 . 27 11 2021-02-17 13 . 28 11 2021-02-18 14 . 29 12 2021-03-05 1 1 30 12 2021-03-06 2 5 31 12 2021-03-07 3 . 32 12 2021-03-08 4 . 33 12 2021-03-09 5 . 34 12 2021-03-10 6 . ...
Thank you so much. Appreciate your help in teaching me this one.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.