BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello all, 

 

I could use your help with some coding. 

 

I have data that I have organized into a panel type dataset. There are multiple organizations in the dataset that are observed each year. I also have a variable in this dataset that has a lot of missing values. For this variable, if it is missing in a given year, I would like to fill in the current year's value with the previous year's value.

 

However, the previous year's value may also be missing. It may even be missing for several years back when the firm first entered the dataset (e.g., firm enters dataset in 2001, but does not have an observed value on said variable until 2007), in which case I would want the values for 2001-2006 to be changed from missing to zeros. Additionally, a value could appear on the variable when it first enters the dataset, but then be missing for two subsequent years, the value could appear again, and then be missing for next year, and so on. So the missingness is random. To sum up, I would like my variable to take on the value of the last year in which it had a value. if it does not have a value in any of the previous years, I would like it to take on a zero up until it has a value (if ever). I have created a small example below. 

 

Firm ID   Year     Variable                  Transformed to --->        Firm ID   Year     Variable   

1             2002        .                                                                    1             2002        0

1             2003        .                                                                    1             2003        0

1             2004        1                                                                   1             2004        1

1             2005        .                                                                    1             2005        1

2             2002        2                                                                   2             2002        2

2             2003        .                                                                    2             2003        2

2             2004        3                                                                   2             2004        3

2             2005        .                                                                    2             2005        3  

3             1996        .                                                                    3             1996        0

3             1997        .                                                                    3             1997        0

3             1998        .                                                                    3             1998        0

 

 

If I could get some guidance with how to code this, I would really appreciate it! To be honest, I'm not sure where to start

with this. 

 

Thanks in advance. 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

hi @r4321   Very straightforward update

 

 
data have;
input FirmID   Year     Variable  ;          
cards;
1             2002        .                                                                  
1             2003        .                                                                    
1             2004        1                                                               
1             2005        .                                                                 
2             2002        2                                                                  
2             2003        .                                                                 
2             2004        3                                                                   
2             2005        .                                                            
3             1996        .                                                                  
3             1997        .                                                                   
3             1998        .                                                                   
;
 
data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;

 

 

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

hi @r4321   Very straightforward update

 

 
data have;
input FirmID   Year     Variable  ;          
cards;
1             2002        .                                                                  
1             2003        .                                                                    
1             2004        1                                                               
1             2005        .                                                                 
2             2002        2                                                                  
2             2003        .                                                                 
2             2004        3                                                                   
2             2005        .                                                            
3             1996        .                                                                  
3             1997        .                                                                   
3             1998        .                                                                   
;
 
data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;

 

 

Reeza
Super User
data want;
set have;
by firmID;

retain new_var;

if first.firmID and missing(variable) then new_var = 0;
else if not missing(variable) then new_var = variable;

run;

Use RETAIN and FIRST logic here. I think above is all you need, but if I"m missing some conditions feel free to post back. 

 


@r4321 wrote:

Hello all, 

 

I could use your help with some coding. 

 

I have data that I have organized into a panel type dataset. There are multiple organizations in the dataset that are observed each year. I also have a variable in this dataset that has a lot of missing values. For this variable, if it is missing in a given year, I would like to fill in the current year's value with the previous year's value.

 

However, the previous year's value may also be missing. It may even be missing for several years back when the firm first entered the dataset (e.g., firm enters dataset in 2001, but does not have an observed value on said variable until 2007), in which case I would want the values for 2001-2006 to be changed from missing to zeros. Additionally, a value could appear on the variable when it first enters the dataset, but then be missing for two subsequent years, the value could appear again, and then be missing for next year, and so on. So the missingness is random. To sum up, I would like my variable to take on the value of the last year in which it had a value. if it does not have a value in any of the previous years, I would like it to take on a zero up until it has a value (if ever). I have created a small example below. 

 

Firm ID   Year     Variable                  Transformed to --->        Firm ID   Year     Variable   

1             2002        .                                                                    1             2002        0

1             2003        .                                                                    1             2003        0

1             2004        1                                                                   1             2004        1

1             2005        .                                                                    1             2005        1

2             2002        2                                                                   2             2002        2

2             2003        .                                                                    2             2003        2

2             2004        3                                                                   2             2004        3

2             2005        .                                                                    2             2005        3  

3             1996        .                                                                    3             1996        0

3             1997        .                                                                    3             1997        0

3             1998        .                                                                    3             1998        0

 

 

If I could get some guidance with how to code this, I would really appreciate it! To be honest, I'm not sure where to start

with this. 

 

Thanks in advance. 

 

 

 


 

r4321
Pyrite | Level 9

Hello all, 

 

Thank you for your help so far and quick replies. 

 

I have that the code you provided is effective at producing zeros if there are missing values... However, it does not carry over the previous years value into the next year if the previous year is not missing and the current year is missing. In the case that there is a value in a given year on the variable... I want it to carry over to subsequent years that are missing until there is again a value. See example data:

 

Firm ID   Year     Variable                  Transformed to --->        Firm ID   Year     Variable   

1             2002        .                                                                    1             2002        0

1             2003        .                                                                    1             2003        0

1             2004        100                                                               1             2004        100

1             2005        .                                                                    1             2005        100

1             2006        .                                                                    1             2006        100

1             2007        .                                                                    1             2007        100

2             2002        22                                                                  2             2002       22

2             2003        .                                                                    2             2003        22

2             2004        33                                                                 2             2004        33

2             2005        .                                                                    2             2005        33 

2             2006     .  44                                                                 2             2006        44

3             1994                                                                             3             1994        0

3             1995        .                                                                    3            1995         0

3             1996                                                                             3             1996        0

3             1997        .                                                                    3             1997        0 

3             1998     .                                                                       3             1998        0

 

 

 

I really appreciate your help in figuring this out!

 

Thanks again. 

 

 

 

novinosrin
Tourmaline | Level 20

Hello @r4321    Have you really tested the code I gave you

 

data want;
update have(obs=0) have;
by firmid;
if missing(variable) then variable=0;
output;
run;

 

r4321
Pyrite | Level 9

Interesting --- 

 

So I ran it again and it works. 

 

I think the problem was, I had multiple variables in the dataset and I was comparing the new variable that had been created (with the values filled in) to the old variable it was being transformed from (with the missing values) and it was confusing me because the old variable was being transformed by the code as well despite not being specified in the code (not sure why this was happening). Given your push to try it again, I transformed the variable and then merge in the old version (with the missing values) to compare and it looks good! 

 

data act.stage27;
update act.stage26(obs=0) act.stage26;
by cusip;
if missing(jfreducedlast) then jfreducedlast=0;
output;
run;


proc sql;
create table act.stage28 as select a.*, b.jfreduced 
from act.stage27 as a left join act.stage25 as b
on a.year_cusip = b.year_cusip;
quit;

 

Thanks for your help and patience!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1604 views
  • 3 likes
  • 3 in conversation