BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ppl
Fluorite | Level 6 ppl
Fluorite | Level 6

Hello,

I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:

Subject_ID   Date    Obs

10       01/02/21    1

10      01/03/21     1

10      01/05/21     2

11      02/05/21     4

11      02/06/21     3

11      02/07/21     2

12      03/05/21    1

12      03/06/21    5         

 

How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks. 

 

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So you want to insure that each subject has at least 14 days/observations?

You need to know the first/last day that was actually collected. Or the first expected day if it is possible the first day is missing.  For now let's just compute that from your existing dataset.

So first let's convert your listing into an actual SAS dataset so we have something to test our program against.

data have ;
  input Subject_ID  Date :yymmdd. Var1 ;
  format date yymmdd10.;
cards;
10 2021-01-02 1
10 2021-01-03 1
10 2021-01-05 2
11 2021-02-05 4
11 2021-02-06 3
11 2021-02-07 2
12 2021-03-05 1
12 2021-03-06 5 
;

Note: To avoid confusing half of your audience do not display dates in either MDY or DMY order. Use YMD order or the DATE9. format instead.

 

Now calculate the min/max date per subject and then generate a dataset with all of the observations for each subject. Then combine it with your actual data.

proc means data=have;
  by subject_id;
  var date;
  output out=skeleton min=start max=end;
run;

data skeleton;
  set skeleton;
  end=max(end,start+13);
  do date=start to end;
    day=date-start+1;
    output;
  end;
  format date yymmdd10.;
  keep subject_id date day ;
run;

data want;
  merge skeleton have;
  by subject_id date;
run;

Result:

      Subject_
Obs       ID             date    day    Var1

  1       10       2021-01-02      1      1
  2       10       2021-01-03      2      1
  3       10       2021-01-04      3      .
  4       10       2021-01-05      4      2
  5       10       2021-01-06      5      .
  6       10       2021-01-07      6      .
  7       10       2021-01-08      7      .
  8       10       2021-01-09      8      .
  9       10       2021-01-10      9      .
 10       10       2021-01-11     10      .
 11       10       2021-01-12     11      .
 12       10       2021-01-13     12      .
 13       10       2021-01-14     13      .
 14       10       2021-01-15     14      .
 15       11       2021-02-05      1      4
 16       11       2021-02-06      2      3
 17       11       2021-02-07      3      2
 18       11       2021-02-08      4      .
 19       11       2021-02-09      5      .
 20       11       2021-02-10      6      .
 21       11       2021-02-11      7      .
 22       11       2021-02-12      8      .
 23       11       2021-02-13      9      .
 24       11       2021-02-14     10      .
 25       11       2021-02-15     11      .
 26       11       2021-02-16     12      .
 27       11       2021-02-17     13      .
 28       11       2021-02-18     14      .
 29       12       2021-03-05      1      1
 30       12       2021-03-06      2      5
 31       12       2021-03-07      3      .
 32       12       2021-03-08      4      .
 33       12       2021-03-09      5      .
 34       12       2021-03-10      6      .
...

 

View solution in original post

12 REPLIES 12
Patrick
Opal | Level 21

Your have data is monthly but in your narrative you're talking about two week periods. 

Can you please post the desired result based on the sample data you've shared? That should reduce most of the ambiguity. 

Reeza
Super User

Why is 1 repeated for two different dates for the same subject?

 


@ppl wrote:

Hello,

I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:

Subject_ID   Date    Obs

10       01/02/21    1

10      01/03/21     1

10      01/05/21     2

11      02/05/21     4

11      02/06/21     3

11      02/07/21     2

12      03/05/21    1

12      03/06/21    5         

 

How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks. 

 

Thanks.



I suspect your question is related to this persons? 

https://communities.sas.com/t5/SAS-Programming/Creating-a-variable-indicating-day-number-per-person/...

 

 

andreas_lds
Jade | Level 19

@ppl wrote:

Hello,

I have a longitudinal dataset, where subjects are followed up for 2 weeks. An observation is recorded daily. The subjects each have a start and end date that is different. There are some missing values in there too. Data set example:

Subject_ID   Date    Obs

10       01/02/21    1

10      01/03/21     1

10      01/05/21     2

11      02/05/21     4

11      02/06/21     3

11      02/07/21     2

12      03/05/21    1

12      03/06/21    5         

 

How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks. 

 

Thanks.


An observation is recorded daily.

The obs you have posted seem to be monthly, or are the dates in the obscure month-day-year format? Posting data as working data step always helps to avoid such confusion. Also add an explanation for the variable "obs" it is not clear whether it is just additional information in the data or should be used to solve the task.

 

There are some missing values in there too

You don't have a single missing value in the data you have posted, maybe obs are missing.

 

How do I add a column for the day order (i.e. day1, day2.....day14) which includes a missing values e.g. subject 10 is missing 01/04/21 and the rest upto two weeks. 

Not clear what want actually. Please add a table showing the expected results.

 

After some mind-wrangling this may or may not be want you expect:

data want;
   set have;
   by Subject_ID;
   
   retain expected_date first_date day;
   
   if first.Subject_Id then do;
      day = 1;
      expected_date = date;
      first_date = date;
   end;
   else do;
      day = day + 1;
   end;
   
   if expected_date = date then do;
      output;
   end;
   else do;
      do i = 0 to date - expected_date;
         date = expected_date + i;
         day = day + i;
         output;
      end;
   end;
   
   if last.Subject_Id then do;
      d = 14 - intck('day', first_date, date) - 1;
      
      if d > 0 then do i = 1 to d;
         date = date + 1;
         day = day + 1;
         output;
      end;
   end;

   expected_date = date + 1;
   
   drop i d expected_date first_date ;
run;
ppl
Fluorite | Level 6 ppl
Fluorite | Level 6

Thanks, I think this is almost what is expected. For those asking why I have data for different months, the data was collected over several months, but all subjects have data for 2 weeks only. It is not practical to have all subjects have the same start date. Some will be recruited this month, others next month and so on. 

Regarding the date format it was meant to be in mmddyy10. format. I also meant to say Var instead of obs.

There is a lot of missing data in the table as the observations are meant to be taken daily for two weeks only.

Expected table:

Subject_ID   Date    Obs    Day

10       01/02/21    1     Day1

10      01/03/21     1    Day2

10      01/04/21    -      Day 3

10      01/05/21     2    Day 4

10     01/06/21   -       Day 5

10     01/07/21   -      Day 6

10     01/08/21   -      Day 7

10     01/09/21   -     Day 8

etc    upto Day 14 

11      02/05/21     4    Day1

11      02/06/21     3    Day2

11      02/07/21     2    Day3

etc upto Day 14

12      03/05/21    1    Day 1

12      03/06/21    5   Day 2

 

I hope this helps.

MarkusWeick
Barite | Level 11

Hi @ppl,

I assume that you have a table (work.intervals) with the start and end days per subject, e.g. with the following three columns:

Subject_ID   Start_Date   Enddate

You could create a new table (work.day_order) for the day order (i.e. day1, day2.....day14) per subject with the following three columns via a DO loop in a data step

Subject_ID   Day_order   Date

Then you could merge the day order table with your observation table (work.observations) via the column “Date”.

But that is just one of several possibilites.

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
MarkusWeick
Barite | Level 11

A result like this

Subject_ID

date

day_order

Obs

10

01/02/2021

Day 1

1

10

01/03/2021

Day 2

1

10

01/04/2021

Day 3

.

10

01/05/2021

Day 4

2

10

01/06/2021

Day 5

.

10

01/07/2021

Day 6

.

10

01/08/2021

Day 7

.

10

01/09/2021

Day 8

.

 

can be produced with the following code (I created the input data in excel):

libname Intval xlsx "/home/u59608141/sasuser.v94/Intervals.xlsx";

libname Observat xlsx "/home/u59608141/sasuser.v94/Observations.xlsx";

 

data day_order;

set intval.Tabelle1;

date=start_date-1;

Do i=1 to 14;

day_order = cat("Day ",i);

date=date+1;

Format date MMDDYY10.;

output;

end;

keep Subject_ID day_order date;

run;

 

data Observat;

set Observat.Observations;

Datenum=input(Date,MMDDYY8.);

drop date;

run;

 

data result;

merge day_order observat (rename=(datenum=date));

by Subject_ID date;

run;

 

proc print data=result noobs;

run;

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
ppl
Fluorite | Level 6 ppl
Fluorite | Level 6

Thanks for your response. Unfortunately, I do not have a start or end date column. Just dates arranged in chronological order with a variable collected at that date. 

MarkusWeick
Barite | Level 11

I just noticed, that you don't need an enddate, if you assume that the intervall is always 14 days.

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
MarkusWeick
Barite | Level 11

For the start date you might use the first. Column as above.

 

Please keep the community friendly.
Like posts you agree with or like. Mark helpful answers as “accepted solutions”. Generally have a look at https://communities.sas.com/t5/Getting-Started/tkb-p/community_articles
Ksharp
Super User
data have;
input Subject_ID   Date   : mmddyy10. Obs;
format Date  mmddyy10.;
cards;
10       01/02/21    1
10      01/03/21     1
10      01/05/21     2
11      02/05/21     4
11      02/06/21     3
11      02/07/21     2
12      03/05/21    1
12      03/06/21    5
;
 
data want;
merge have have(keep=Subject_ID   Date rename=(Subject_ID=_Subject_ID   Date=_Date) firstobs=2);
output;
if Subject_ID=_Subject_ID then do;
do date=date+1 to _date-1;
  obs=.;output;
end;
end;
drop _:;
run;
Tom
Super User Tom
Super User

So you want to insure that each subject has at least 14 days/observations?

You need to know the first/last day that was actually collected. Or the first expected day if it is possible the first day is missing.  For now let's just compute that from your existing dataset.

So first let's convert your listing into an actual SAS dataset so we have something to test our program against.

data have ;
  input Subject_ID  Date :yymmdd. Var1 ;
  format date yymmdd10.;
cards;
10 2021-01-02 1
10 2021-01-03 1
10 2021-01-05 2
11 2021-02-05 4
11 2021-02-06 3
11 2021-02-07 2
12 2021-03-05 1
12 2021-03-06 5 
;

Note: To avoid confusing half of your audience do not display dates in either MDY or DMY order. Use YMD order or the DATE9. format instead.

 

Now calculate the min/max date per subject and then generate a dataset with all of the observations for each subject. Then combine it with your actual data.

proc means data=have;
  by subject_id;
  var date;
  output out=skeleton min=start max=end;
run;

data skeleton;
  set skeleton;
  end=max(end,start+13);
  do date=start to end;
    day=date-start+1;
    output;
  end;
  format date yymmdd10.;
  keep subject_id date day ;
run;

data want;
  merge skeleton have;
  by subject_id date;
run;

Result:

      Subject_
Obs       ID             date    day    Var1

  1       10       2021-01-02      1      1
  2       10       2021-01-03      2      1
  3       10       2021-01-04      3      .
  4       10       2021-01-05      4      2
  5       10       2021-01-06      5      .
  6       10       2021-01-07      6      .
  7       10       2021-01-08      7      .
  8       10       2021-01-09      8      .
  9       10       2021-01-10      9      .
 10       10       2021-01-11     10      .
 11       10       2021-01-12     11      .
 12       10       2021-01-13     12      .
 13       10       2021-01-14     13      .
 14       10       2021-01-15     14      .
 15       11       2021-02-05      1      4
 16       11       2021-02-06      2      3
 17       11       2021-02-07      3      2
 18       11       2021-02-08      4      .
 19       11       2021-02-09      5      .
 20       11       2021-02-10      6      .
 21       11       2021-02-11      7      .
 22       11       2021-02-12      8      .
 23       11       2021-02-13      9      .
 24       11       2021-02-14     10      .
 25       11       2021-02-15     11      .
 26       11       2021-02-16     12      .
 27       11       2021-02-17     13      .
 28       11       2021-02-18     14      .
 29       12       2021-03-05      1      1
 30       12       2021-03-06      2      5
 31       12       2021-03-07      3      .
 32       12       2021-03-08      4      .
 33       12       2021-03-09      5      .
 34       12       2021-03-10      6      .
...

 

ppl
Fluorite | Level 6 ppl
Fluorite | Level 6

Thank you so much. Appreciate your help in teaching me this one.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1356 views
  • 2 likes
  • 7 in conversation