Hi there,
I really need help in filling the missing value.
I want to fill the missing value of value1, value2 and value3 by ''firm'' with last observation.
But I just want to fill the variable of each firm that with the missing value that less than 5.
data test;
input firm $2. year Month value1 value2 value3;
cards;
A 2006 03 7 7 5
A 2006 03 8 8 .
A 2006 03 . 9 1
A 2006 06 5 . .
A 2006 06 8 7 4
A 2006 06 . 4 3
A 2006 09 . . 2
A 2006 09 9 7 1
A 2006 09 . . 5
A 2007 03 . 7 .
A 2007 03 . 7 8
A 2007 03 . 7 .
A 2007 06 . 3 2
A 2007 06 . 7 6
A 2007 06 . 8 .
B 2006 03 . 2 1
B 2006 03 12 7 .
B 2006 03 10 . .
B 2007 03 9 1 2
B 2007 03 1 . .
B 2007 03 . 3 4
B 2007 03 . . .
B 2007 03 . 3 4
;
run;
data want:
A 2006 03 7 7 5
A 2006 03 8 8 5
A 2006 03 . 9 1
A 2006 06 5 9 1
A 2006 06 8 7 4
A 2006 06 . 4 3
A 2006 09 . 4 2
A 2006 09 9 7 1
A 2006 09 . 7 5
A 2007 03 . 7 5
A 2007 03 . 7 8
A 2007 03 . 7 8
A 2007 06 . 3 2
A 2007 06 . 7 6
A 2007 06 . 8 6
B 2006 03 . 2 1
B 2006 03 12 7 1
B 2006 03 10 7 1
B 2007 03 9 1 2
B 2007 03 1 1 2
B 2007 03 1 3 4
B 2007 03 1 3 4
B 2007 03 1 3 4
;
run;
So there are two requirements when filling the missing value:
1. Fill the missing value of each variable by 'firm' with previous data.
2. If the number of missing value > 5, then we do not fill that variable. In this case, because the number of the missing value of value1 of firm A > 5, then it will not be filled by the lasted data.
Thank you so much.
Something like this should work (untested, I don't have SAS handy) :
data WANT;
do until(last.FIRM);
set HAVE;
by FIRM;
VAL1NMISS=sum(VAL1NMISS, missing(VALUE1));
VAL2NMISS=sum(VAL2NMISS, missing(VALUE2));
VAL3NMISS=sum(VAL3NMISS, missing(VALUE3));
end;
do until(last.FIRM);
set HAVE;
by FIRM;
VALUE1=ifn(VAL1NMISS<5 & VALUE1=. & ^first.FIRM, lag(VALUE1), VALUE1);
VALUE2=ifn(VAL2NMISS<5 & VALUE2=. & ^first.FIRM, lag(VALUE2), VALUE2);
VALUE3=ifn(VAL3NMISS<5 & VALUE3=. & ^first.FIRM, lag(VALUE3), VALUE3);
output;
end;
run;
Hi @Maomao_Hui
You don't specify it, byt it seems from your want-data that rule 2 applies only to value1. The following code gives your wanted result:
data test;
input firm $2. year Month value1 value2 value3;
cards;
A 2006 03 7 7 5
A 2006 03 8 8 .
A 2006 03 . 9 1
A 2006 06 5 . .
A 2006 06 8 7 4
A 2006 06 . 4 3
A 2006 09 . . 2
A 2006 09 9 7 1
A 2006 09 . . 5
A 2007 03 . 7 .
A 2007 03 . 7 8
A 2007 03 . 7 .
A 2007 06 . 3 2
A 2007 06 . 7 6
A 2007 06 . 8 .
B 2006 03 . 2 1
B 2006 03 12 7 .
B 2006 03 10 . .
B 2007 03 9 1 2
B 2007 03 1 . .
B 2007 03 . 3 4
B 2007 03 . . .
B 2007 03 . 3 4
;
run;
data want; set test; by firm;
retain ovalue1 ovalue2 ovalue3;
drop ovalue1 ovalue2 ovalue3;
if value1 = . then value1 = ovalue1;
if value2 = . then value2 = ovalue2;
if value3 = . then value3 = ovalue3;
if value1 < 5 then ovalue1 = value1; else ovalue1 = .;
if value2 > . then ovalue2 = value2; else ovalue2 = .;
if value3 > . then ovalue3 = value3; else ovalue3 = .;
run;
Hi @ErikLund_Jensen ,
Thank you so much for your kind reply. I am so sorry that I am not clear enough. Actually, I just want to fill the variables with few missing values. For example, the number of the missing value of value1 of firm A is 10, so I ignore it.
Firm A, the number of the missing value of value1 is 10, not fill it. The number of the missing value of value 2 is 3, fill it. The number of the missing value of value 3 of firm A is 5, so should also not be filled.
Firm B, the number of the missing value of value1 is 3, fill it. The number of the missing value of value 2 is 4, fill it. The number of the missing value of value 3 is 4, fill it.
Hope this time my explanation would be clearer, and thanks again for your quick reply.
Thank you,
Hui
data test;
input firm $2. year Month value1 value2 value3;
cards;A 2006 03 7 7 5
A 2006 03 8 8 .
A 2006 03 . 9 1
A 2006 06 5 . .
A 2006 06 8 7 4
A 2006 06 . 4 3
A 2006 09 . . 2
A 2006 09 9 7 1
A 2006 09 . . 5
A 2007 03 . 7 .
A 2007 03 . 7 8
A 2007 03 . 7 .
A 2007 06 . 3 2
A 2007 06 . 7 6
A 2007 06 . 8 .
B 2006 03 . 2 1
B 2006 03 12 7 .
B 2006 03 10 . .
B 2007 03 9 1 2
B 2007 03 1 . .
B 2007 03 . 3 4
B 2007 03 . . .
B 2007 03 . 3 4;
run;
data want;
A 2006 03 7 7 5
A 2006 03 8 8 .
A 2006 03 . 9 1
A 2006 06 5 9 .
A 2006 06 8 7 4
A 2006 06 . 4 3
A 2006 09 . 4 2
A 2006 09 9 7 1
A 2006 09 . 7 5
A 2007 03 . 7 .
A 2007 03 . 7 8
A 2007 03 . 7 .
A 2007 06 . 3 2
A 2007 06 . 7 6
A 2007 06 . 8 .
B 2006 03 . 2 1
B 2006 03 12 7 1
B 2006 03 10 7 1
B 2007 03 9 1 2
B 2007 03 1 1 2
B 2007 03 1 3 4
B 2007 03 1 3 4
B 2007 03 1 3 4
Something like this should work (untested, I don't have SAS handy) :
data WANT;
do until(last.FIRM);
set HAVE;
by FIRM;
VAL1NMISS=sum(VAL1NMISS, missing(VALUE1));
VAL2NMISS=sum(VAL2NMISS, missing(VALUE2));
VAL3NMISS=sum(VAL3NMISS, missing(VALUE3));
end;
do until(last.FIRM);
set HAVE;
by FIRM;
VALUE1=ifn(VAL1NMISS<5 & VALUE1=. & ^first.FIRM, lag(VALUE1), VALUE1);
VALUE2=ifn(VAL2NMISS<5 & VALUE2=. & ^first.FIRM, lag(VALUE2), VALUE2);
VALUE3=ifn(VAL3NMISS<5 & VALUE3=. & ^first.FIRM, lag(VALUE3), VALUE3);
output;
end;
run;
Thank you so much. I also made a bit adjustment as follow:
data WANT;
do until(last.FIRM);
set HAVE;
by FIRM;
VAL1NMISS=sum(VAL1NMISS, missing(VALUE1));
VAL2NMISS=sum(VAL2NMISS, missing(VALUE2));
VAL3NMISS=sum(VAL3NMISS, missing(VALUE3));
end;
do until(last.FIRM);
set HAVE;
by FIRM;
VALUE1=ifn(VAL1NMISS<5 & VALUE1=. & ^first.FIRM, coalesce(VALUE1,n1), VALUE1);n1=VALUE1;
VALUE2=ifn(VAL2NMISS<5 & VALUE2=. & ^first.FIRM, coalesce(VALUE2,n2), VALUE2);n2=VALUE2;
VALUE3=ifn(VAL3NMISS<5 & VALUE3=. & ^first.FIRM, coalesce(VALUE3,n3), VALUE3);n3=VALUE3;
output;
end;
run;
Hi @Maomao_Hui
I think the code I supplied will create your new want-data set, if the rule is applied to all 3 variables:
data want; set test; by firm;
retain ovalue1 ovalue2 ovalue3;
drop ovalue1 ovalue2 ovalue3;
if value1 = . then value1 = ovalue1;
if value2 = . then value2 = ovalue2;
if value3 = . then value3 = ovalue3;
if value1 < 5 then ovalue1 = value1; else ovalue1 = .;
if value2 < 5 then ovalue2 = value2; else ovalue2 = .;
if value3 < 5 then ovalue3 = value3; else ovalue3 = .;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.