BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abdulla
Pyrite | Level 9

Hi, I have some missing datadate. How to fill in the missing date

Gvkey  fyear  datadate

1001    2007   20071231

1001    2008    .....

1001    2009    20091231

1002    2005   20060131

1002    2006    .....

1002    2007    20080131

 

data want;
update have(obs=0) have;
by gvkey fyear;
output;
run;

 

The above code doesn't work

 

I want the following result

Gvkey  fyear  datadate

1001    2007   20071231

1001    2008    20081231.....

1001    2009    20091231

1002    2005   20060131

1002    2006    .20070131....

1002    2007    20080131

 

So, It is basically the year will change, not the month and day. Datadate is a numeric value

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

Hi @abdulla ,

 

Please respond to each point and number your three responses accordingly:

 

1) I understand the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this is correct, if not then please explain.

 

2) You are showing dots ('.') in your input data and output data, is that really what your input data is like (with dots, e.g. ".....") and is that really what you want your output data to look like (with dots, e.g. ".20070131..")?

 

3) I removed the dots and used the data in the following code and got the results shown further below. Please copy and paste the below code and run it in SAS. Check the "have" data set to confirm it looks correct or not. If "have" is correct, then let us know if the results are what you want and share your results - all rows, including headers.

 

 

data have;
   infile datalines truncover;

   input
      gvkey    : $char4.
      fyear    : $char4.
      datadate : $char8.
   ;

   datalines;
1001    2007   20071231
1001    2008   
1001    2009   20091231
1002    2005   20060131
1002    2006   
1002    2007   20080131
;


data want(drop = previous_datadate);
   set have;

   length previous_datadate $ 8;
   retain previous_datadate;

   if missing(datadate) then
      datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));

   previous_datadate = datadate;
run;

proc print data =  want;
run;

Results:

 

Obs    gvkey    fyear    datadate

 1     1001     2007     20071231
 2     1001     2008     20081231
 3     1001     2009     20091231
 4     1002     2005     20060131
 5     1002     2006     20070131
 6     1002     2007     20080131

 

Kind regards,

Amir.

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

What should be the value for 2008?

Amir
PROC Star

Hi @abdulla ,

 

Please show what the data set should look like after the values are populated.

 

Is datadate a numeric variable (SAS date) or a character variable?

 

What are the rules / logic on how to decide what value should be used for datadate?

 

Kind regards,

Amir.

abdulla
Pyrite | Level 9

Hi Amir, It should fill the gap based on the missing observations between two date variables or gap can be in the last or first observations or in any place by gvkey and fyear(datadate only)

Amir
PROC Star

Hi @abdulla ,

 

If I've understood correctly, the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this not the correct understanding.

 

Ideally, dates are better stored as numeric in SAS, as they are easier to deal with.

 

However based on what you have, you can try the following code.

The first data step sets up the data, I have added some more rows at the end so that there are multiple consecutive blank datadate values.

The second data step has the logic to fill in the gaps.

You have to consider what should happen if the first row has a blank datadate value.

 

data have;
   infile datalines truncover;

   input
      gvkey    : $char4.
      fyear    : $char4.
      datadate : $char8.
   ;

   datalines;
1001    2007   20071231
1001    2008
1001    2009   20091231
1002    2005   20060131
1001    2006   
1001    2007   20080131
1001    2007   
1001    2007   
1001    2007   
;


data want(drop = previous_datadate);
   set have;

   length previous_datadate $ 8;
   retain previous_datadate;

   if missing(datadate) then
      datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));

   previous_datadate = datadate;
run;

 

Kind regards,

Amir.

abdulla
Pyrite | Level 9
Amir,
I have made some changes in my data. Could you please check again? The above code is not giving me the correct answer.
Amir
PROC Star

Hi @abdulla ,

 

I cannot see your new data.

 

Please show your new data with the new results you want.

 

Please show the log with any errors using the "Insert Code" icon when posting it.

 

 

Kind regards,

Amir.

abdulla
Pyrite | Level 9
I have edited the main questions. By using the code, I am getting missing values filled but not the way I want. For example
I want something like this
1002 2005 20060131
1002 2006 .20070131..
But I am getting
1002 2005 20060131
1002 2006 .20340131..
Amir
PROC Star

Hi @abdulla ,

 

Please respond to each point and number your three responses accordingly:

 

1) I understand the value in datadate should be one more year than the last datadate; it does not need any other column. Please let me know if this is correct, if not then please explain.

 

2) You are showing dots ('.') in your input data and output data, is that really what your input data is like (with dots, e.g. ".....") and is that really what you want your output data to look like (with dots, e.g. ".20070131..")?

 

3) I removed the dots and used the data in the following code and got the results shown further below. Please copy and paste the below code and run it in SAS. Check the "have" data set to confirm it looks correct or not. If "have" is correct, then let us know if the results are what you want and share your results - all rows, including headers.

 

 

data have;
   infile datalines truncover;

   input
      gvkey    : $char4.
      fyear    : $char4.
      datadate : $char8.
   ;

   datalines;
1001    2007   20071231
1001    2008   
1001    2009   20091231
1002    2005   20060131
1002    2006   
1002    2007   20080131
;


data want(drop = previous_datadate);
   set have;

   length previous_datadate $ 8;
   retain previous_datadate;

   if missing(datadate) then
      datadate = cats(input(substr(previous_datadate,1,4),8.) + 1, substr(previous_datadate,5));

   previous_datadate = datadate;
run;

proc print data =  want;
run;

Results:

 

Obs    gvkey    fyear    datadate

 1     1001     2007     20071231
 2     1001     2008     20081231
 3     1001     2009     20091231
 4     1002     2005     20060131
 5     1002     2006     20070131
 6     1002     2007     20080131

 

Kind regards,

Amir.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 935 views
  • 1 like
  • 3 in conversation