Hi, I have some missing datadate. How to fill in the missing date
Gvkey fyear datadate
1001 2007 20071231
1001 2008 .....
1001 2009 20091231
1002 2005 20060131
1002 2006 .....
1002 2007 20080131
data want;
update have(obs=0) have;
by gvkey fyear;
output;
run;
The above code doesn't work
I want the following result
Gvkey fyear datadate
1001 2007 20071231
1001 2008 20081231.....
1001 2009 20091231
1002 2005 20060131
1002 2006 .20070131....
1002 2007 20080131
So, It is basically the year will change, not the month and day. Datadate is a numeric value
Hi @abdulla ,
Please respond to each point and number your three responses accordingly:
1) I understand the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this is correct, if not then please explain.
2) You are showing dots ('.') in your input data and output data, is that really what your input data is like (with dots, e.g. ".....") and is that really what you want your output data to look like (with dots, e.g. ".20070131..")?
3) I removed the dots and used the data in the following code and got the results shown further below. Please copy and paste the below code and run it in SAS. Check the "have" data set to confirm it looks correct or not. If "have" is correct, then let us know if the results are what you want and share your results - all rows, including headers.
data have;
infile datalines truncover;
input
gvkey : $char4.
fyear : $char4.
datadate : $char8.
;
datalines;
1001 2007 20071231
1001 2008
1001 2009 20091231
1002 2005 20060131
1002 2006
1002 2007 20080131
;
data want(drop = previous_datadate);
set have;
length previous_datadate $ 8;
retain previous_datadate;
if missing(datadate) then
datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));
previous_datadate = datadate;
run;
proc print data = want;
run;
Results:
Obs gvkey fyear datadate 1 1001 2007 20071231 2 1001 2008 20081231 3 1001 2009 20091231 4 1002 2005 20060131 5 1002 2006 20070131 6 1002 2007 20080131
Kind regards,
Amir.
What should be the value for 2008?
Hi @abdulla ,
Please show what the data set should look like after the values are populated.
Is datadate a numeric variable (SAS date) or a character variable?
What are the rules / logic on how to decide what value should be used for datadate?
Kind regards,
Amir.
Hi Amir, It should fill the gap based on the missing observations between two date variables or gap can be in the last or first observations or in any place by gvkey and fyear(datadate only)
Hi @abdulla ,
If I've understood correctly, the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this not the correct understanding.
Ideally, dates are better stored as numeric in SAS, as they are easier to deal with.
However based on what you have, you can try the following code.
The first data step sets up the data, I have added some more rows at the end so that there are multiple consecutive blank datadate values.
The second data step has the logic to fill in the gaps.
You have to consider what should happen if the first row has a blank datadate value.
data have;
infile datalines truncover;
input
gvkey : $char4.
fyear : $char4.
datadate : $char8.
;
datalines;
1001 2007 20071231
1001 2008
1001 2009 20091231
1002 2005 20060131
1001 2006
1001 2007 20080131
1001 2007
1001 2007
1001 2007
;
data want(drop = previous_datadate);
set have;
length previous_datadate $ 8;
retain previous_datadate;
if missing(datadate) then
datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));
previous_datadate = datadate;
run;
Kind regards,
Amir.
Hi @abdulla ,
I cannot see your new data.
Please show your new data with the new results you want.
Please show the log with any errors using the "Insert Code" icon when posting it.
Kind regards,
Amir.
Hi @abdulla ,
Please respond to each point and number your three responses accordingly:
1) I understand the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this is correct, if not then please explain.
2) You are showing dots ('.') in your input data and output data, is that really what your input data is like (with dots, e.g. ".....") and is that really what you want your output data to look like (with dots, e.g. ".20070131..")?
3) I removed the dots and used the data in the following code and got the results shown further below. Please copy and paste the below code and run it in SAS. Check the "have" data set to confirm it looks correct or not. If "have" is correct, then let us know if the results are what you want and share your results - all rows, including headers.
data have;
infile datalines truncover;
input
gvkey : $char4.
fyear : $char4.
datadate : $char8.
;
datalines;
1001 2007 20071231
1001 2008
1001 2009 20091231
1002 2005 20060131
1002 2006
1002 2007 20080131
;
data want(drop = previous_datadate);
set have;
length previous_datadate $ 8;
retain previous_datadate;
if missing(datadate) then
datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));
previous_datadate = datadate;
run;
proc print data = want;
run;
Results:
Obs gvkey fyear datadate 1 1001 2007 20071231 2 1001 2008 20081231 3 1001 2009 20091231 4 1002 2005 20060131 5 1002 2006 20070131 6 1002 2007 20080131
Kind regards,
Amir.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.