- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have dataset by visit in which I am trying to use the observations for the previous record for the current visit. Below are example data (HAVE) for three subid. When correctly done it should look like data WANT. That is, the records for visit=0 is carried over to visit=1; the record for visit=1 is carried over to visit=2, etc. Because the data for visit=0 will be missing when carried over to the visit=1, the data for visit=0 will retain its value and it is also used for visit=1. The last record within each subjid will vanish.
This is needed for all variables in the dataset. The LAG function is not giving the correct output as the LAG function is not pushing the observations by subjid.
data have;
input subjid $ vis var1 var2;
datalines;
001-001 0 4 5
001-001 1 3 6
001-001 2 2 9
001-001 3 5 7
001-002 0 5 7
001-002 1 8 9
001-002 2 6 2
001-002 3 9 4
001-003 0 2 7
001-003 1 6 9
001-003 2 5 2
;
data want;
input subjid $ vis var1 var2 var1new var2new;
datalines;
001-001 0 4 5 4 5
001-001 1 3 6 4 5
001-001 2 2 9 3 6
001-001 3 5 7 2 9
001-002 0 5 7 5 7
001-002 1 8 9 5 7
001-002 2 6 2 8 9
001-002 3 9 4 8 2
001-003 0 2 7 2 7
001-003 1 6 9 2 7
001-003 2 5 2 6 9
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/* UNTESTED CODE */
data want;
set have;
by subjid;
prev_var1=lag(var1);
prev_var2=lag(var2);
if first.subjid then do;
var1new=var1;
var2new=var2;
end;
else do;
var1new=prev_var1;
var2new=prev_var2;
end;
drop prev_:;
run;
As a comment, I think a better naming scheme is varnew1 and varnew2 or newvar1 and newvar2. When you use this naming scheme, you can easily refer to all of them via a list or via varnew: or newvar: and if you leave the naming scheme as you stated it, then you have to type out the variable names, you can't use a list or the colon to refer to them.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a million @PaigeMiller.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is a good example for using the lag function embedded in an IFN function:
data want;
set have;
by subjid;
var1new=ifn(first.subjid,var1,lag(var1));
var2new=ifn(first.subjid,var2,lag(var2));
run;
Because the lag function is inside an IFN function, it is always evaluated, even though it is not always selected based on the first argument of the ifn. That is a good thing, since you want the queue of values underlying the lag to always be synchronized with each new observation - i.e. the beginning of each new id get the lagged value from the end of the prior id. But the IFN will ignore that instance.
BTW, I think you have an erroneous value for var1new in the 8th record. You have 8, but I think it should be 6.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Many thanks @mkeintz