michtka
Fluorite | Level 6


Hi everyone,

I have the following dataset:

  data new;
     length subjid $8 day $10 value 8;
     input subjid day value;
     datalines;
     1   baseline 10
     1   week2    12
     1   week4    14
     1   week8    16
     1   week12   12
     2   baseline 10
     2   week2    12
     2   week4    10
     3   baseline 10
     3   week2     3
     3   week8     4
     ;
     run;

I need to impute the missing data using last observation carried forward (LOCF) to obtain the following LOCF dataset:

     1   baseline 10
     1   week2    12
     1   week4    14
     1   week8    16
     1   week12   12
     2   baseline 10
     2   week2    12
     2   week4    10
     2   week8    10
     2   week12   10
     3   baseline 10
     3   week2     3
     3   week4     3
     3   week8     4
     3   week12    4

Could you please use some kind of loop, because my real problem has 200 subjects?

Thanks.

V.

ACCEPTED SOLUTION
Haikuo
Onyx | Level 15

I am sure there are slicker approaches, but for now I suggest the following:

1. Use a format to give the visits a proper order for the loop, since the character values in your original data ('baseline', 'week2', ...) do not sort in visit order.

2. Use a 'lead' or 'look-ahead' technique to define the upper boundary of the loop.

 

proc format;
	value $seq
		'baseline'=1
		'week2'=2
		'week4'=3
		'week8'=4
		'week12'=5
	;
	value seq
		1='baseline'
		2='week2'
		3='week4'
		4='week8'
		5='week12'
	;
run;

data new;
	length subjid $8 day $10 value 8;
	input subjid day value;
	datalines;
1   baseline 10
1   week2 12
1   week4 14
1   week8 16
1   week12 12
2   baseline 10
2   week2 12
2   week4 10
3   baseline 10
3   week2 3
3   week8 4
;

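data new1;
	set new;
	/* numeric visit sequence: 1=baseline ... 5=week12 */
	_day=input(put(day,$seq.),8.);
run;

data want;
	set new1;
	by subjid notsorted;
	/* look ahead: __day is the visit sequence of the next record */
	set new1(firstobs=2 keep=_day rename=(_day=__day)) new1(obs=1 drop=_all_);
	if not last.subjid then
		/* fill from this visit up to the one before the next observed visit */
		do _i=_day to __day-1;
			day=put(_i,seq.);
			output;
		end;
	else
		/* last record of a subject: carry forward through week12 */
		do _i=_day to 5;
			day=put(_i,seq.);
			output;
		end;
	drop _:;
run;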

proc print;
run;
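
For readers new to the look-ahead idiom from step 2, here is a minimal sketch of it in isolation. The data set `visits` and the variable `next_day` are hypothetical names chosen for illustration; they are not part of the solution above.

```sas
/* Hypothetical three-row data set, used only to illustrate the idiom */
data visits;
	length subjid $8 day $10;
	input subjid day;
	datalines;
1 baseline
1 week4
2 baseline
;

data lookahead;
	set visits;
	/* The second SET reads the same data starting one row later, so
	   next_day holds the visit of the following row. The extra
	   visits(obs=1 drop=_all_) supplies one more (empty) read so the
	   step does not stop before the last row of the first SET is output. */
	set visits(firstobs=2 keep=day rename=(day=next_day))
	    visits(obs=1 drop=_all_);
run;

proc print data=lookahead;
run;
```

The solution above uses the same pattern, with the visit sequence number in place of `day`, to find where each fill loop should stop, and it relies on `last.subjid` rather than on the look-ahead value for the final record of each subject.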

Haikuo

 

@mkeintz writes (on 31Oct2016):

Here it is four years later, and I am looking for some comments on LOCF and came across this interesting thread.   Data Null's suggestion reminded me of the flexibility and general usefulness of PROC SUMMARY.

 

But it also got me to ask whether there is a good one-step solution.  The code below is my answer.  I think it can be called a merge-with-offset-record approach:

 

data new;
     length subjid $8 day $10 value 8;
     input subjid day value;
     datalines;
     1   baseline 10
     1   week2    12
     1   week4    14
     1   week8    16
     1   week12   12
     2   baseline 10
     2   week2    12
     2   week4    10
     3   baseline 10
     3   week2     3
     3   week8     4 
run;

data want (drop=_:);
  merge new 
        new (firstobs=2 keep=subjid day rename=(subjid=_nextsub day=_nextday) );

  do _f=findw("baseline week2 week4 week8 week12",trim(day),' ','e')
        to
        ifn(subjid^=_nextsub,5,findw("baseline week2 week4 week8 week12",
          trim(_nextday),' ','e')-1);
    output; 
    /* do not carry forward baseline values*/
    if day="baseline" then call missing(of value); 
    day=scan("baseline week2 week4 week8 week12",_f+1);
  end; 
run;
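
As a small aside on how the loop bounds are computed: with the 'e' modifier, FINDW returns the number of the matching word rather than its character position. The sketch below only illustrates that behaviour; the variable name `pos` is an assumption made for the example.

```sas
data _null_;
  length day $10;
  /* With the 'e' modifier FINDW counts words, so each visit maps to its
     position in the list, and SCAN maps that position back to the name */
  do day='baseline','week4','week12';
    pos=findw("baseline week2 week4 week8 week12",trim(day),' ','e');
    back=scan("baseline week2 week4 week8 week12",pos);
    put day= pos= back=;
  end;
run;
```

The log should show positions 1, 3 and 5 for the three visits, which is what lets the DO loop step from one visit to the next.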

 

 


9 REPLIES
michtka
Fluorite | Level 6

Very clever, Haikuo. It worked. Thanks.

michtka
Fluorite | Level 6

Hi Haikuo, the baseline data should not be carried forward, i.e., if week2 is missing, it should stay missing. How can we modify your code to handle this condition? Thanks.

My new dataset is:

data new;
     length subjid $8 day $10 value 8;
     input subjid day value;
     datalines;
     1   baseline 10
     1   week2    12
     1   week4    14
     1   week8    16
     1   week12   12
     2   baseline 10
     2   week4     8
     3   baseline 10
     3   week2     3
     3   week8     4
     ;
     run;

I want:

     1   baseline 10
     1   week2    12
     1   week4    14
     1   week8    16
     1   week12   12
     2   baseline 10
     2   week4     8
     2   week8     8
     2   week12    8
     3   baseline 10
     3   week2     3
     3   week4     3
     3   week8     4
     3   week12    4

data_null__
Jade | Level 19

Consider a method that gets SAS to do most of the work. The only real work here is creating the CLASSDATA data set, which may already exist in some form. The OUT(vars) list in the IDGROUP option of the OUTPUT statement can be expanded to include other variables that need to be LOCFed.

data new;
   length subjid $8 day $10 value 8;
   input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
;;;;
   run;
data classdata;
   input day $10.;
   cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
   run;
data new2;
   update new2(obs=0) new2;
   by subjid;
   output;
   run;
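
To illustrate the remark about expanding the OUT(vars) list, here is a minimal sketch with a second measurement. The variable `score`, its values, and the output data set name `locf` are hypothetical and only serve the example; everything else follows the steps above.

```sas
data new;
   length subjid $8 day $10 value 8 score 8;
   /* score is a hypothetical second variable to be LOCFed */
   input subjid day value score;
   datalines;
1  baseline 10 5
1  week2    12 6
1  week8    16 7
2  baseline 10 4
2  week4     8 3
;;;;
   run;
data classdata;
   input day $10.;
   cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   /* both variables are listed in OUT(), so both appear in NEW2 */
   output out=new2(drop=_freq_ _type_) idgroup(out(value score)=);
   run;
data locf;
   /* UPDATE ignores missing transaction values, which carries the
      last non-missing value and score forward within each subject */
   update new2(obs=0) new2;
   by subjid;
   output;
   run;
```

Because UPDATE handles each variable separately, `value` and `score` are each carried forward from their own last non-missing observation.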

Edit: Baseline is not carried forward.
data new;
   length subjid $8 day $10 value 8;
   input subjid day value;
   datalines;
1  baseline 10
1  week2    12
1  week4    14
1  week8    16
1  week12   12
2  baseline 10
2  week2    12
2  week4    10
3  baseline 10
3  week2     3
3  week8     4
4  baseline 10
4  week8     4
;;;;
   run;
data classdata;
   input day $10.;
   cards;
baseline
week2
week4
week8
week12
;;;;
   run;
proc summary nway data=new classdata=classdata order=data;
   by subjid;
   class day;
   output out=new2(drop=_freq_ _type_) idgroup(out(value)=);
   run;
data new3;
   update new2(obs=0) new2;
   by subjid;
   output;
   if first(day) eq 'b' then call missing(of _all_);
   run;

Message was edited by: data _null_

Haikuo
Onyx | Level 15

'idgroup'! Thanks for sharing, DN!

And being a slacker, I still haven't finished your paper:

http://support.sas.com/resources/papers/proceedings10/102-2010.pdf

Haikuo

art297
Opal | Level 21

Thanks for making my day! Nice! And if, for some reason, a variable list is difficult and one can live with a couple of warning statements, one can always use:

   output out=new2(drop=_freq_ _type_) idgroup(out(_all_)=);


Methinks your latest use for proc summary will end up getting even more exposure than the proc transpose alternative.


data_null__
Jade | Level 19

You will get WARNINGs. In pharma, WARNINGs can never be tolerated, even if they are harmless. This even applies to some NOTEs.

WARNING: Variable subjid already exists on file WORK.NEW2.

WARNING: Variable day already exists on file WORK.NEW2.

WARNING: The duplicate variables will not be included in the output data set of the output statement number 1

Of course there are other SAS variable lists that could be used or you could generate the specific list with TRANSPOSE and put it in a macro variable with SQL.
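
A minimal sketch of that last idea, under stated assumptions: the data set name `locfnames` and the macro variable `locfvars` are hypothetical, and the small `new` data set is repeated here only so the example runs on its own.

```sas
data new;
   length subjid $8 day $10 value 8;
   input subjid day value;
   datalines;
1  baseline 10
1  week2    12
2  baseline 10
2  week4     8
;;;;
   run;

/* OBS=0 reads no data and DROP= removes the BY and CLASS variables,
   so the output has one row per remaining variable name in _NAME_ */
proc transpose data=new(obs=0 drop=subjid day) out=locfnames;
   var _all_;
   run;

/* Collect the names into a space-separated macro variable */
proc sql noprint;
   select _name_ into :locfvars separated by ' '
      from locfnames;
quit;

%put LOCF variables: &locfvars;
```

The list could then be used as `idgroup(out(&locfvars)=)` in the OUTPUT statement, which names the variables explicitly and so should avoid the duplicate-variable WARNINGs shown above.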

prakash
Calcite | Level 5

data sda;
input ptno visit weight;
format ptno z3. ;
cards;
1 1 122
1 2 .
1 3 .
1 4 123
2 1 156
2 3 .
3 1 112
3 2 .
4 1 .
4 2 123
4 3 .
;
run;

data all;
format ptno z3.;
do i=1 to 4;
do j=1 to 4;
ptno=i;
visit=j;
output;
end;
end;
drop i j;
run;

proc sort data=sda; by ptno visit; run;
proc sort data=all; by ptno visit; run;

data final (drop=tempval);
retain tempval;
merge sda(in=b) all (in=val);
by ptno visit ;
/* reset at each new patient so a value is never carried across patients */
if first.ptno then tempval=.;
if val;
if weight eq . then weight=tempval;
else tempval=weight;
run;

 

Output is as below (patient 4 has no weight at visit 1, so that visit stays missing):

 

001 1 122
001 2 122
001 3 122
001 4 123
002 1 156
002 2 156
002 3 156
002 4 156
003 1 112
003 2 112
003 3 112
003 4 112
004 1 .
004 2 123
004 3 123
004 4 123
