Hello everyone.
I have a dataset with certain variables for each firm. each firm has multiple year observations i.e. 1990, 1993, 1995,1998, 2000, 2002 and 2004.
I want to create the missing values not only for the missing years between these years but also want to add 2005 to these years. the values of other variables for the years created using SAS needs to be the value of previous years. For example 1990 exists originally in the dataset so the missing years which i create i.e. 1991 and 1992 should have same observations as 1990.
1994 should have same observations as 1993.
1996 and 1997 should have same observations as 1995
and so on until 2005 where the observations should be same as 2004.
If the firm observation ends before year 2004 lets say at year 2000, then still I would like to create the observations until 2005 where the variables will have values of 2000 in 2001 to 2005.
I am providing tables for examples
Have
Year | Cusip | Index | Dual class dummy |
1990 | XXXYY12 | 3 | 1 |
1993 | XXXYY12 | 5 | 1 |
1995 | XXXYY12 | 7 | 1 |
1998 | XXXYY12 | 9 | 1 |
2000 | XXXYY12 | 6 | 1 |
2002 | XXXYY12 | 5 | 1 |
2004 | XXXYY12 | 4 | 1 |
1993 | ABCD220 | 2 | 0 |
1995 | ABCD220 | 8 | 0 |
1998 | ABCD220 | 6 | 0 |
2000 | ABCD220 | 7 | 0 |
1995 | DECO921 | 7 | 0 |
1998 | DECO921 | 8 | 0 |
2000 | DECO921 | 9 | 0 |
2002 | DECO921 | 5 | 0 |
want
Year | Cusip | Index | Dual class dummy |
1990 | XXXYY12 | 3 | 1 |
1991 | XXXYY12 | 3 | 1 |
1992 | XXXYY12 | 3 | 1 |
1993 | XXXYY12 | 5 | 1 |
1994 | XXXYY12 | 5 | 1 |
1995 | XXXYY12 | 7 | 1 |
1996 | XXXYY12 | 7 | 1 |
1997 | XXXYY12 | 7 | 1 |
1998 | XXXYY12 | 9 | 1 |
1999 | XXXYY12 | 9 | 1 |
2000 | XXXYY12 | 6 | 1 |
2001 | XXXYY12 | 6 | 1 |
2002 | XXXYY12 | 5 | 1 |
2003 | XXXYY12 | 5 | 1 |
2004 | XXXYY12 | 4 | 1 |
2005 | XXXYY12 | 4 | 1 |
1993 | ABCD220 | 2 | 0 |
1994 | ABCD220 | 2 | 0 |
1995 | ABCD220 | 8 | 0 |
1996 | ABCD220 | 8 | 0 |
1997 | ABCD220 | 8 | 0 |
1998 | ABCD220 | 6 | 0 |
1999 | ABCD220 | 6 | 0 |
2000 | ABCD220 | 6 | 0 |
2001 | ABCD220 | 6 | 0 |
2002 | ABCD220 | 6 | 0 |
2003 | ABCD220 | 6 | 0 |
2004 | ABCD220 | 6 | 0 |
2005 | ABCD220 | 6 | 0 |
Thanking you in anticipation
I will provide UNTESTED CODE to get you most of the way there. If you want TESTED CODE, or code which does all of what you want, you must provide the input data according to these instructions and not using any other method.
/* UNTESTED CODE */
data years;
do year=1990 to 2005;
output;
end;
run;
proc sql;
create table cartesian as select a.cusip,b.year from (select distinct cusip from have) as a,years as b
order by a.cusip, b.year;
quit;
data want;
merge have cartesian;
by cusip year;
retain index1 dual_class_dummy1;
if missing(index) then index1=index;
if missing(dual_class_dummy) then dual_class_dummy1=dual_class_dummy;
run;
I will provide UNTESTED CODE to get you most of the way there. If you want TESTED CODE, or code which does all of what you want, you must provide the input data according to these instructions and not using any other method.
/* UNTESTED CODE */
data years;
do year=1990 to 2005;
output;
end;
run;
proc sql;
create table cartesian as select a.cusip,b.year from (select distinct cusip from have) as a,years as b
order by a.cusip, b.year;
quit;
data want;
merge have cartesian;
by cusip year;
retain index1 dual_class_dummy1;
if missing(index) then index1=index;
if missing(dual_class_dummy) then dual_class_dummy1=dual_class_dummy;
run;
Hello PaigeMiller,
Thank you for your response.
First of all my apologies for the format of the data provided. I am new to the community and this was my first post. However I will be careful about this next time and thank you for the link provided.
The codes worked well in creating the missing years and the range from 1990 to 2005. However the values of the variables except Cusip could not be created as previous observations and instead they were blank for every observation in the new variable created.
But because the years are created by Cusip, I think I might be able to fill the missing values by the previous observation by some other codes.
@Deepak_More wrote:
The codes worked well in creating the missing years and the range from 1990 to 2005. However the values of the variables except Cusip could not be created as previous observations and instead they were blank for every observation in the new variable created.
But because the years are created by Cusip, I think I might be able to fill the missing values by the previous observation by some other codes.
I don't know what this means, please show us. In any event, please provide the data in the requested format.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.