Solved: First observation carry backward

ari · Posted 02-08-2017 06:41 AM

Have:

data have;
input
ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .

; run;

Want:

ID	DATE	visit	VISITNUM	Value
1	5/6/1998		.	2
1	5/6/1998	Baseline	1	2
1	5/6/1998	Baseline	1	2
1	5/9/1998		.	2
1	5/13/1998		.	.
1	5/13/1998	Week 1	3	2
1	5/13/1998	Week 1	3	2
1	5/15/1998		.	.
1	5/19/1998	Week 2	4	3
1	5/19/1998	Week 2	4	3
1	5/27/1998	Week 3	5	3
1	5/27/1998	Week 3	5	3
1	6/2/1998		.	.
1	6/3/1998	Week 4	6	2
1	6/3/1998	Week 4	6	2
1	6/5/1998		.	.
1	6/9/1998	Week 5	7	.
1	6/9/1998	Week 5	7	.
1	6/9/1998		.	.
1	6/12/1998		.	.
1	6/16/1998		.	.
1	6/17/1998	Week 6	8	2
1	6/17/1998	Week 6	8	2
1	6/24/1998	Week 4	9	3
1	6/24/1998	Week 4	9	3
1	7/1/1998	Week 4	10	1
1	7/1/1998	Week 4	10	1
1	7/7/1998	Week 4	11	.
1	7/7/1998	Week 4	11	.
1	7/15/1998	Week 4	12	.
1	7/15/1998	Week 4	12	.
1	7/25/1998	Week 7	13	.
1	7/25/1998	Week 7	13	.
1	7/29/1998	Week 7	14	.
1	7/29/1998	Week 7	14	.
1	8/5/1998	Week 7	15	.
1	8/5/1998	Week 7	15	.

Trying to impute the missing value by carrying the observation reported at visit = "baseline" backward. Any leads to achieve this in sas?

Thank you

mkeintz · Posted 02-08-2017 10:56 AM

Your data is apparently sorted by ID/DATE, and
the first date is the only date with any observations having visit="BASELINE".
For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '), with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

Notes:

The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date. Each of these streams has an "in dummy".
If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
further processing only for the INKEEP records.
replace missinug VALUE with BASEVALUE. But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

This is not exactly a "look ahead" with a subsequent carry back. Instead it interleave two separate streams of the sorted data. One stream to get the baseline value, but is otherwise ignored. The other stream has every record output, and is conditionally modified by the baseline value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

RW9 · Posted 02-08-2017 06:45 AM

Post test data in the form of a datastep - this code isn't tested as I am not typing in test data:

proc sql;
  create table WANT as
  select  A.ID,
            A.VISIT,
            A.VISITNUM,
            COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID;
quit;

ari · Posted 02-08-2017 07:23 AM

@RW9: Thanks, I have posted the test data in datastep.

Actual dataset contains several visits and the test code does not seem to work properly if there are several visits than the baseline.

Thanks,

RW9 · Posted 02-08-2017 08:00 AM

You want dataset doesn't seem to match the test data. See 5/9/1988 - Visitnum=2 in test, nothing in output. First I would suggest sorting out Visit and Visitnumber. These should be present in all records - this is standard across the industry and I am suprised you have data with no identifier. It looks like you want to base it on date (as above I would recommend sorting out visit first) which you could do with:

This works, however as your date is not really a date (its just a character according to your test data) it wont sort properly. Simple answer make date variables actual date values.

data have;
  infile datalines missover;
  input ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .
; run;

proc sql;
  create table WANT as
  select  A.ID,
          A.DATE,
          A.VISIT,
          A.VISITNUM,
          COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,DATE,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID
  and     A.DATE=B.DATE
  order by A.ID,B.DATE;
quit;

mkeintz · Posted 02-08-2017 10:56 AM

Your data is apparently sorted by ID/DATE, and
the first date is the only date with any observations having visit="BASELINE".
For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '), with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

Notes:

The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date. Each of these streams has an "in dummy".
If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
further processing only for the INKEEP records.
replace missinug VALUE with BASEVALUE. But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

This is not exactly a "look ahead" with a subsequent carry back. Instead it interleave two separate streams of the sorted data. One stream to get the baseline value, but is otherwise ignored. The other stream has every record output, and is conditionally modified by the baseline value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ari · Posted 02-09-2017 03:09 AM

@mkeintz: Thank you. Appreciate a lot for the explanation about how does the code work. Very helpful to apply on different scenarios.

First observation carry backward

Re: First observation carry backward

Re: First observation carry backward

Re: First observation carry backward

Re: First observation carry backward

Re: First observation carry backward

Re: First observation carry backward

Classroom Training Available!