BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ari
Quartz | Level 8 ari
Quartz | Level 8

Have:

 

data have;
input
ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .

; run;


 

Want:

IDDATEvisitVISITNUMValue
15/6/1998 .2
15/6/1998Baseline12
15/6/1998Baseline12
15/9/1998 .2
15/13/1998 ..
15/13/1998 Week 132
15/13/1998 Week 132
15/15/1998 ..
15/19/1998 Week 243
15/19/1998 Week 243
15/27/1998 Week 353
15/27/1998 Week 353
16/2/1998 ..
16/3/1998 Week 462
16/3/1998 Week 462
16/5/1998 ..
16/9/1998 Week 57.
16/9/1998 Week 57.
16/9/1998 ..
16/12/1998 ..
16/16/1998 ..
16/17/1998 Week 682
16/17/1998 Week 682
16/24/1998 Week 493
16/24/1998 Week 493
17/1/1998 Week 4101
17/1/1998 Week 4101
17/7/1998 Week 411.
17/7/1998 Week 411.
17/15/1998 Week 412.
17/15/1998 Week 412.
17/25/1998 Week 713.
17/25/1998 Week 713.
17/29/1998 Week 714.
17/29/1998 Week 714.
18/5/1998 Week 715.
18/5/1998 Week 715.

Trying to impute the missing value by carrying the observation reported at visit = "baseline"  backward. Any leads to achieve this in sas?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
  1. Your data is apparently sorted by ID/DATE, and
  2. the first date is the only date with any observations having visit="BASELINE". 
  3. For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '),  with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

 

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

 

Notes:

  1. The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date.  Each of these streams has an "in dummy".
  2. If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
  3. but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
  4. further processing only for the INKEEP records.
  5. replace missinug VALUE with BASEVALUE.  But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

 

This is not exactly a "look ahead" with a subsequent carry back.  Instead it interleave two separate streams of the sorted data.  One stream to get the baseline value, but is otherwise ignored.  The other stream has every record output, and is conditionally modified by the baseline value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep - this code isn't tested as I am not typing in test data:

proc sql;
  create table WANT as
  select  A.ID,
            A.VISIT,
            A.VISITNUM,
            COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID;
quit;
ari
Quartz | Level 8 ari
Quartz | Level 8

@RW9: Thanks, I have posted the test data in datastep.

              Actual dataset contains several visits and the test code does not seem to work properly if there are several visits than the baseline.

 

Thanks,

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You want dataset doesn't seem to match the test data.  See 5/9/1988 - Visitnum=2 in test, nothing in output.  First I would suggest sorting out Visit and Visitnumber.  These should be present in all records - this is standard across the industry and I am suprised you have data with no identifier.  It looks like you want to base it on date (as above I would recommend sorting out visit first) which you could do with:

This works, however as your date is not really a date (its just a character according to your test data) it wont sort properly.  Simple answer make date variables actual date values.

data have;
  infile datalines missover;
  input ID date$ visit $ visitnum value;
datalines;
1 5/6/1998 . .
1 5/6/1998 Baseline 1 2
1 5/6/1998 Baseline 1 2
1 5/9/1998 . 2
1 5/13/1998 . .
1 5/13/1998 Week 1 3 2
1 5/13/1998 Week 1 3 2
1 5/15/1998 . .
1 5/19/1998 Week 2 4 3
1 5/19/1998 Week 2 4 3
1 5/27/1998 Week 3 5 3
1 5/27/1998 Week 3 5 3
1 6/2/1998 . .
1 6/3/1998 Week 4 6 2
1 6/3/1998 Week 4 6 2
1 6/5/1998 . .
1 6/9/1998 Week 5 7 .
1 6/9/1998 Week 5 7 .
1 6/9/1998 . .
1 6/12/1998 . .
1 6/16/1998 . .
1 6/17/1998 Week 6 8 2
1 6/17/1998 Week 6 8 2
1 6/24/1998 Week 4 9 3
1 6/24/1998 Week 4 9 3
1 7/1/1998 Week 4 10 1
1 7/1/1998 Week 4 10 1
1 7/7/1998 Week 4 11 .
1 7/7/1998 Week 4 11 .
1 7/15/1998 Week 4 12 .
1 7/15/1998 Week 4 12 .
1 7/25/1998 Week 7 13 .
1 7/25/1998 Week 7 13 .
1 7/29/1998 Week 7 14 .
1 7/29/1998 Week 7 14 .
1 8/5/1998 Week 7 15 .
1 8/5/1998 Week 7 15 .
; run;

proc sql;
  create table WANT as
  select  A.ID,
          A.DATE,
          A.VISIT,
          A.VISITNUM,
          COALESCE(A.VALUE,B.VALUE) as VALUE
  from    HAVE A
  left join (select distinct ID,DATE,VALUE from HAVE where VISIT="Baseline") B
  on       A.ID=B.ID
  and     A.DATE=B.DATE
  order by A.ID,B.DATE;
quit;
mkeintz
PROC Star
  1. Your data is apparently sorted by ID/DATE, and
  2. the first date is the only date with any observations having visit="BASELINE". 
  3. For that date, and that date only, you want to replace all instances of VALUE=. (which also have visit=' '),  with the non-missing value found in the baseline record:

I revised your data HAVE to properly read the date as a true data value, and the visit value:

 

data have;
input ID date :mmddyy10. 
      @13 visit $char8. visitnum value;
format date date9.;
datalines;
1 5/6/1998           . .
1 5/6/1998  Baseline 1 2
1 5/6/1998  Baseline 1 2
1 5/9/1998           . 2
1 5/13/1998          . .
1 5/13/1998 Week 1   3 2
1 5/13/1998 Week 1   3 2
1 5/15/1998          . .
1 5/19/1998 Week 2   4 3
1 5/19/1998 Week 2   4 3
1 5/27/1998 Week 3   5 3
1 5/27/1998 Week 3   5 3
1 6/2/1998           . .
1 6/3/1998  Week 4   6 2
1 6/3/1998  Week 4   6 2
1 6/5/1998           . .
1 6/9/1998  Week 5   7 .
1 6/9/1998  Week 5   7 .
1 6/9/1998           . .
1 6/12/1998          . .
1 6/16/1998          . .
1 6/17/1998 Week 6   8 2
1 6/17/1998 Week 6   8 2
1 6/24/1998 Week 4   9 3
1 6/24/1998 Week 4   9 3
1 7/1/1998  Week 4   10 1
1 7/1/1998  Week 4   10 1
1 7/7/1998  Week 4   11 .
1 7/7/1998  Week 4   11 .
1 7/15/1998 Week 4   12 .
1 7/15/1998 Week 4   12 .
1 7/25/1998 Week 7   13 .
1 7/25/1998 Week 7   13 .
1 7/29/1998 Week 7   14 .
1 7/29/1998 Week 7   14 .
1 8/5/1998  Week 7   15 .
1 8/5/1998  Week 7   15 .
; run;


data want;
  set have (in=inbase where=(visit='Baseline'))
      have (in=inkeep);
  by id date;
  retain basevalue;
  if first.id and inbase then basevalue=value;
  else if first.date then basevalue=.;
  if inkeep;
  if value=. then value=basevalue;
run;

 

Notes:

  1. The SET has two input streams: (a) all the baseline records for a given id/date, followed by all the records for the same id/date.  Each of these streams has an "in dummy".
  2. If that SET statement produces a first record for the ID that is a baseline, it save the value in basevalue.
  3. but if a subsequent date has been encountered, then reset basevalue to missing, so that it does not prompt an override of subsequent VALUE values
  4. further processing only for the INKEEP records.
  5. replace missinug VALUE with BASEVALUE.  But note that basevalue is non-missing only for the first date, so it does not erroneously overwrite VALUE for other dates.

 

This is not exactly a "look ahead" with a subsequent carry back.  Instead it interleave two separate streams of the sorted data.  One stream to get the baseline value, but is otherwise ignored.  The other stream has every record output, and is conditionally modified by the baseline value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ari
Quartz | Level 8 ari
Quartz | Level 8

@mkeintz: Thank you.  Appreciate a  lot for the explanation about how does the code work. Very helpful to apply on different scenarios.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1853 views
  • 1 like
  • 3 in conversation