DATA Step, Macro, Functions and more

First observation carry backward

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 103
Accepted Solution

First observation carry backward

[ Edited ]

Have:

 

data have;
input
ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .

; run;


 

Want:

IDDATEvisitVISITNUMValue
15/6/1998 .2
15/6/1998Baseline12
15/6/1998Baseline12
15/9/1998 .2
15/13/1998 ..
15/13/1998 Week 132
15/13/1998 Week 132
15/15/1998 ..
15/19/1998 Week 243
15/19/1998 Week 243
15/27/1998 Week 353
15/27/1998 Week 353
16/2/1998 ..
16/3/1998 Week 462
16/3/1998 Week 462
16/5/1998 ..
16/9/1998 Week 57.
16/9/1998 Week 57.
16/9/1998 ..
16/12/1998 ..
16/16/1998 ..
16/17/1998 Week 682
16/17/1998 Week 682
16/24/1998 Week 493
16/24/1998 Week 493
17/1/1998 Week 4101
17/1/1998 Week 4101
17/7/1998 Week 411.
17/7/1998 Week 411.
17/15/1998 Week 412.
17/15/1998 Week 412.
17/25/1998 Week 713.
17/25/1998 Week 713.
17/29/1998 Week 714.
17/29/1998 Week 714.
18/5/1998 Week 715.
18/5/1998 Week 715.

Trying to impute the missing value by carrying the observation reported at visit = "baseline"  backward. Any leads to achieve this in sas?

 

Thank you


Accepted Solutions
Solution
‎02-09-2017 03:09 AM
Trusted Advisor
Posts: 1,022

Re: First observation carry backward

[ Edited ]
  1. Your data is apparently sorted by ID/DATE, and
  2. the first date is the only date with any observations having visit="BASELINE". 
  3. For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '),  with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

 

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

 

Notes:

  1. The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date.  Each of these streams has an "in dummy".
  2. If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
  3. but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
  4. further processing only for the INKEEP records.
  5. replace missinug VALUE with BASEVALUE.  But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

 

This is not exactly a "look ahead" with a subsequent carry back.  Instead it interleave two separate streams of the sorted data.  One stream to get the baseline value, but is otherwise ignored.  The other stream has every record output, and is conditionally modified by the baseline value.

View solution in original post


All Replies
Super User
Super User
Posts: 7,977

Re: First observation carry backward

Post test data in the form of a datastep - this code isn't tested as I am not typing in test data:

proc sql;
  create table WANT as
  select  A.ID,
            A.VISIT,
            A.VISITNUM,
            COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID;
quit;
Frequent Contributor
Frequent Contributor
Posts: 103

Re: First observation carry backward

@RW9: Thanks, I have posted the test data in datastep.

              Actual dataset contains several visits and the test code does not seem to work properly if there are several visits than the baseline.

 

Thanks,

Super User
Super User
Posts: 7,977

Re: First observation carry backward

You want dataset doesn't seem to match the test data.  See 5/9/1988 - Visitnum=2 in test, nothing in output.  First I would suggest sorting out Visit and Visitnumber.  These should be present in all records - this is standard across the industry and I am suprised you have data with no identifier.  It looks like you want to base it on date (as above I would recommend sorting out visit first) which you could do with:

This works, however as your date is not really a date (its just a character according to your test data) it wont sort properly.  Simple answer make date variables actual date values.

data have;
  infile datalines missover;
  input ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .
; run;

proc sql;
  create table WANT as
  select  A.ID,
          A.DATE,
          A.VISIT,
          A.VISITNUM,
          COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,DATE,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID
  and     A.DATE=B.DATE
  order by A.ID,B.DATE;
quit;
Solution
‎02-09-2017 03:09 AM
Trusted Advisor
Posts: 1,022

Re: First observation carry backward

[ Edited ]
  1. Your data is apparently sorted by ID/DATE, and
  2. the first date is the only date with any observations having visit="BASELINE". 
  3. For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '),  with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

 

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

 

Notes:

  1. The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date.  Each of these streams has an "in dummy".
  2. If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
  3. but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
  4. further processing only for the INKEEP records.
  5. replace missinug VALUE with BASEVALUE.  But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

 

This is not exactly a "look ahead" with a subsequent carry back.  Instead it interleave two separate streams of the sorted data.  One stream to get the baseline value, but is otherwise ignored.  The other stream has every record output, and is conditionally modified by the baseline value.

Frequent Contributor
Frequent Contributor
Posts: 103

Re: First observation carry backward

@mkeintz: Thank you.  Appreciate a  lot for the explanation about how does the code work. Very helpful to apply on different scenarios.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 217 views
  • 1 like
  • 3 in conversation