I'm trying to replace missing values with the values of the previous row before the row with missing. The example data is as below: the data set "have" is the data I have, and "want" is the target data I want to generate. The idea is that when the fields ID and var1 have missing values, the row with missing takes the value from the row with value before the row with missing value.
Thank you very much!
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $;
datalines;
1,aa,aa1
,,aa2
2,bb,bb
3,cc,cc
4,dd,dd
5,ee,ee
6,ff,ff
7,gg,gg1
,,gg2
,,gg3
8,hh,hh1
,,hh2
,,hh3
;
run;
data want;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $;
datalines;
1,aa,aa1
1,aa,aa2
2,bb,bb
3,cc,cc
4,dd,dd
5,ee,ee
6,ff,ff
7,gg,gg1
7,gg,gg2
7,gg,gg3
8,hh,hh1
8,hh,hh2
8,hh,hh3
;
run;
Here you go.
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $ var3 $ var4;
datalines;
1,aa,aa1,aaa,
,,aa2,,10
2,bb,bb,bbb,6
3,cc,cc,ccc,8
4,dd,dd,ddd,
5,ee,ee,eee,7
6,ff,ff,fff,
7,gg,gg1,ggg,3
,,gg2,,5
,,gg3,,6
8,hh,hh1,hhh,
,,hh2,,
,,hh3,,
;
run;
data have2;
retain dummy 1;
set have;
run;
proc print; run;
data want;
if 0 then set have2;
update have2(keep=dummy obs=0) have2(keep=dummy Person_id var1-var2);
by dummy;
set have2(keep=var3-var4);
output;
drop dummy;
run;
proc print;
run;
Try the update trick.
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $;
datalines;
1,aa,aa1
,,aa2
2,bb,bb
3,cc,cc
4,dd,dd
5,ee,ee
6,ff,ff
7,gg,gg1
,,gg2
,,gg3
8,hh,hh1
,,hh2
,,hh3
;
run;
data have2;
retain dummy 1;
set have;
run;
data want;
update have2(obs=0) have2;
by dummy;
output;
drop dummy;
run;
proc print; run;
Use the UPDATE/BY statements and keep (data set options) only the variables that you want to carry forward. Then use SET dropping the variables you kept in the UPDATE statement.
If you supply some sample data I will make example.
if 0 then set have2;
update have2(keep=dummy obs=0) have2(keep=dummy x y z);
by dummy;
set have2(drop=dummy x y z);
The dummy data would look like below. Basically only Person_ID and var1 need to be populated (as from data have to data want), but not var3 or var4 even thought they have missing values as well.
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $ var3 $ var4;
datalines;
ID,var1,var2,var3,var4
1,aa,aa1,aaa,
,,aa2,,10
2,bb,bb,bbb,6
3,cc,cc,ccc,8
4,dd,dd,ddd,
5,ee,ee,eee,7
6,ff,ff,fff,
7,gg,gg1,ggg,3
,,gg2,,5
,,gg3,,6
8,hh,hh1,hhh,
,,hh2,,
,,hh3,,
;
run;
data want;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $ var3 $ var4;
datalines;
1,aa,aa1,aaa,
1,aa,aa2,,10
2,bb,bb,bbb,6
3,cc,cc,ccc,8
4,dd,dd,ddd,
5,ee,ee,eee,7
6,ff,ff,fff,
7,gg,gg1,ggg,3
7,gg,gg2,,5
7,gg,gg3,,6
8,hh,hh1,hhh,
8,hh,hh2,,
8,hh,hh3,,
;
run;
Sorry! Data have should be as below:
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $ var3 $ var4;
datalines;
1,aa,aa1,aaa,
,,aa2,,10
2,bb,bb,bbb,6
3,cc,cc,ccc,8
4,dd,dd,ddd,
5,ee,ee,eee,7
6,ff,ff,fff,
7,gg,gg1,ggg,3
,,gg2,,5
,,gg3,,6
8,hh,hh1,hhh,
,,hh2,,
,,hh3,,
;
run;
Thank you!
Here you go.
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $ var3 $ var4;
datalines;
1,aa,aa1,aaa,
,,aa2,,10
2,bb,bb,bbb,6
3,cc,cc,ccc,8
4,dd,dd,ddd,
5,ee,ee,eee,7
6,ff,ff,fff,
7,gg,gg1,ggg,3
,,gg2,,5
,,gg3,,6
8,hh,hh1,hhh,
,,hh2,,
,,hh3,,
;
run;
data have2;
retain dummy 1;
set have;
run;
proc print; run;
data want;
if 0 then set have2;
update have2(keep=dummy obs=0) have2(keep=dummy Person_id var1-var2);
by dummy;
set have2(keep=var3-var4);
output;
drop dummy;
run;
proc print;
run;
Here is the solution to retain values on multiple variables and assign the values on the basis of different conditions. The variables may have different or the same conditions.
In this example I've used following conditions: Person_ID eq .
var1 eq''
var2 eq ''
But it can be easily changed and adjusted according to your requirement. @data_null__
data have;
infile datalines truncover dsd;
input Person_ID var1 $ var2 $;
datalines;
1,aa,aa1
,,aa2
2,bb,bb
3,cc,cc
4,dd,dd
5,ee,ee
6,ff,ff
7,gg,gg1
,,gg2
,,gg3
8,hh,hh1
,,hh2
,,hh3
;
run;
proc print;
data want;
set have;
retain temp1 temp2 temp3;
if _N_=1 then do;
temp1=Person_ID;
temp2=var1;
temp3=var2;
output;
end;
else do;
/* for Person_ID */
if Person_ID eq . then do;
temp1=temp1;
Person_ID=temp1;
end;
else do;
temp1=Person_ID;
Person_ID=Person_ID;
end;
/* for var1 */
if var1 eq '' then do;
temp2=temp2;
var1=temp2;
end;
else do;
temp2=var1;
var1=var1;
end;
/* for var2 */
if var2 eq '' then do;
temp3=temp3;
var2=temp3;
end;
else do;
temp3=var2;
var2=var2;
end;
output;
temp1=Person_ID;
temp2=var1;
temp3=var2;
end;
drop temp1 temp2 temp3;
run;
proc print;
This method should work! Looking at your input dataset you just need to add var4 with condition:
if var4 eq . then do
It's a bit long code but simple and easy to understand. It would be possible to further optimize this code, possibly using ARRAY or other sas procedures.
But this method is good enough to achieve desired output.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.