Hello all,
I could use your help with some coding.
I have data that I have organized into a panel type dataset. There are multiple organizations in the dataset that are observed each year. I also have a variable in this dataset that has a lot of missing values. For this variable, if it is missing in a given year, I would like to fill in the current year's value with the previous year's value.
However, the previous year's value may also be missing. It may even be missing for several years back when the firm first entered the dataset (e.g., firm enters dataset in 2001, but does not have an observed value on said variable until 2007), in which case I would want the values for 2001-2006 to be changed from missing to zeros. Additionally, a value could appear on the variable when it first enters the dataset, but then be missing for two subsequent years, the value could appear again, and then be missing for next year, and so on. So the missingness is random. To sum up, I would like my variable to take on the value of the last year in which it had a value. if it does not have a value in any of the previous years, I would like it to take on a zero up until it has a value (if ever). I have created a small example below.
Firm ID Year Variable Transformed to ---> Firm ID Year Variable
1 2002 . 1 2002 0
1 2003 . 1 2003 0
1 2004 1 1 2004 1
1 2005 . 1 2005 1
2 2002 2 2 2002 2
2 2003 . 2 2003 2
2 2004 3 2 2004 3
2 2005 . 2 2005 3
3 1996 . 3 1996 0
3 1997 . 3 1997 0
3 1998 . 3 1998 0
If I could get some guidance with how to code this, I would really appreciate it! To be honest, I'm not sure where to start
with this.
Thanks in advance.
hi @r4321 Very straightforward update
data have;
input FirmID Year Variable ;
cards;
1 2002 .
1 2003 .
1 2004 1
1 2005 .
2 2002 2
2 2003 .
2 2004 3
2 2005 .
3 1996 .
3 1997 .
3 1998 .
;
data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;
hi @r4321 Very straightforward update
data have;
input FirmID Year Variable ;
cards;
1 2002 .
1 2003 .
1 2004 1
1 2005 .
2 2002 2
2 2003 .
2 2004 3
2 2005 .
3 1996 .
3 1997 .
3 1998 .
;
data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;
data want;
set have;
by firmID;
retain new_var;
if first.firmID and missing(variable) then new_var = 0;
else if not missing(variable) then new_var = variable;
run;
Use RETAIN and FIRST logic here. I think above is all you need, but if I"m missing some conditions feel free to post back.
@r4321 wrote:
Hello all,
I could use your help with some coding.
I have data that I have organized into a panel type dataset. There are multiple organizations in the dataset that are observed each year. I also have a variable in this dataset that has a lot of missing values. For this variable, if it is missing in a given year, I would like to fill in the current year's value with the previous year's value.
However, the previous year's value may also be missing. It may even be missing for several years back when the firm first entered the dataset (e.g., firm enters dataset in 2001, but does not have an observed value on said variable until 2007), in which case I would want the values for 2001-2006 to be changed from missing to zeros. Additionally, a value could appear on the variable when it first enters the dataset, but then be missing for two subsequent years, the value could appear again, and then be missing for next year, and so on. So the missingness is random. To sum up, I would like my variable to take on the value of the last year in which it had a value. if it does not have a value in any of the previous years, I would like it to take on a zero up until it has a value (if ever). I have created a small example below.
Firm ID Year Variable Transformed to ---> Firm ID Year Variable
1 2002 . 1 2002 0
1 2003 . 1 2003 0
1 2004 1 1 2004 1
1 2005 . 1 2005 1
2 2002 2 2 2002 2
2 2003 . 2 2003 2
2 2004 3 2 2004 3
2 2005 . 2 2005 3
3 1996 . 3 1996 0
3 1997 . 3 1997 0
3 1998 . 3 1998 0
If I could get some guidance with how to code this, I would really appreciate it! To be honest, I'm not sure where to start
with this.
Thanks in advance.
Hello all,
Thank you for your help so far and quick replies.
I have that the code you provided is effective at producing zeros if there are missing values... However, it does not carry over the previous years value into the next year if the previous year is not missing and the current year is missing. In the case that there is a value in a given year on the variable... I want it to carry over to subsequent years that are missing until there is again a value. See example data:
Firm ID Year Variable Transformed to ---> Firm ID Year Variable
1 2002 . 1 2002 0
1 2003 . 1 2003 0
1 2004 100 1 2004 100
1 2005 . 1 2005 100
1 2006 . 1 2006 100
1 2007 . 1 2007 100
2 2002 22 2 2002 22
2 2003 . 2 2003 22
2 2004 33 2 2004 33
2 2005 . 2 2005 33
2 2006 . 44 2 2006 44
3 1994 3 1994 0
3 1995 . 3 1995 0
3 1996 3 1996 0
3 1997 . 3 1997 0
3 1998 . 3 1998 0
I really appreciate your help in figuring this out!
Thanks again.
Hello @r4321 Have you really tested the code I gave you
data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;
Interesting ---
So I ran it again and it works.
I think the problem was, I had multiple variables in the dataset and I was comparing the new variable that had been created (with the values filled in) to the old variable it was being transformed from (with the missing values) and it was confusing me because the old variable was being transformed by the code as well despite not being specified in the code (not sure why this was happening). Given your push to try it again, I transformed the variable and then merge in the old version (with the missing values) to compare and it looks good!
data act.stage27;
update act.stage26(obs=0) act.stage26;
by cusip;
if missing(jfreducedlast) then jfreducedlast=0;
output;
run;
proc sql;
create table act.stage28 as select a.*, b.jfreduced
from act.stage27 as a left join act.stage25 as b
on a.year_cusip = b.year_cusip;
quit;
Thanks for your help and patience!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.