BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
georgel
Quartz | Level 8

Dear Mark, 

 

Thank you very much for paying attention to details!. This is a great support from you and  I highly appreciated it.

No, this is just a random sample non-real one!!  I have my real data. I havent notice this!. Basically to telll the story. I want to have benchmarks for each of my  own set of companies all of them from US  but I find that some of my benchmarks have only one missing return. The algortihm that I created it finds another benchmark-company only for the missing day and I dont want this. (real example from CRSP: WRDS:20030815    CUSIP: 88800230  Missing Return )

Regards

George

mkeintz
PROC Star

If your data are already sorted, then there is a somewhat cleaner way to replace internal missing values:

 

data retmeans (keep=cusip_id date_issue _midpoint) / view=retmeans;
  merge have (where=(ret^=.)                 keep=cusip_id ret date_issue)
        have (where=(_nxt_ret^=.) firstobs=2 keep=cusip_id ret rename=(cusip_id=_nxt_cusip ret=_nxt_ret));
  if cusip_id=_nxt_cusip then _midpoint=mean(ret,_nxt_ret);
run;

data want (drop=_:);
  merge have  retmeans (keep=cusip_id date_issue in=inmid);
  by cusip_id date_issue;
  if inmid then set retmeans ; /* Get _MIDPOINT */
  if ret=. then ret=_midpoint;
run;

Note the data set RETMEANS is a data set view, not a data set file.  That means it will not be written to disk, and that it will only be implemented when called for in a later data step.  Think of it as a synchronized process that is run only when used in the DATA WANT step.  So while the program is structed in two steps, there is only one step that actually causes data processing - i.e. it's efficient.

 

The retmeans data set will have one record per non-missing RET.  It will contain the id, date and the _MIDPOINT, the mean of the current RET and the following RET (note the FIRSTOBS=2).  Except for the last non-missing RET of each ID where _MIDPOINT will be left missing.

 

The second data step refers to RETMEANS once in a MERGE statement (excluding the _MIDPOINT variable) and once the SET statement.  Any variable in the (unconditional) MERGE statement will be replaced with ever iteration of the data step.  But any variable associated only with the (conditional) SET statement will not be replaced/reset until the next execution of the set statement.  So _MIDPOINT will be retained and available for all dates between two non-missing RET's.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
georgel
Quartz | Level 8

Dear Mark,

Thank you very much for your great support and explanations. 

Regards

George

Ksharp
Super User
data have;
input cusip_id $8. Name $6. date_issue :ddmmyy10. type $5. Ret ;
format date_issue ddmmyy10. ;
datalines;
10343452 Zilas 01/04/2016 main  3.2
10343452 Zilas 02/04/2014 main     4.5
10343452 Zilas 03/04/2012 main   6.6
10343452 Zilas 04/04/2011 main   .
10343452 Zilas 05/04/2017 main  7.7
10343452 Zilas 06/04/2017 main  .
10343452 Zilas 07/04/2017 main  .
10343452 Zilas 08/04/2017 main  .
10343453 Astra 01/04/2016 main  6.2
10343453 Astra 02/04/2014 main     5.5
10343453 Astra 03/04/2012 main   2.6
10343453 Astra 04/04/2011 main   .
10343453 Astra 05/04/2011 main   .
10343453 Astra 06/04/2017 main  4.7
10343453 Astra 07/04/2017 main  .
10343453 Astra 08/04/2017 main  8
10343453 Astra 09/04/2017 main  .
10343453 Astra 10/04/2017 main  .
10343453 Astra 11/04/2017 main  .
10343453 Astra 12/04/2017 main  .
10343453 Astra 13/04/2017 main  .
;
run;
data want;
if 0 then set have;
retain lag_ret .;
do until(last.cusip_id or not missing(ret));
 set have;
 by cusip_id;
 if first.cusip_id then call missing(lag_ret);
end;
mean=(lag_ret+ret)/2;
do until(last.cusip_id or not missing(ret));
 set have;
 by cusip_id;
 new_ret=coalesce(ret,mean);
 output;
end;
lag_ret=ret;
drop mean lag_ret ret;
run;
mkeintz
PROC Star

@Rick_SAS 

 

I think the OP is not really asking for a within-id-group mean.  The request is to get the mean of the nearest-preceding and nearest-following valid values.   I don't think PROC STDIZE can handle that.    

 

But I do like the comments on the advisability of mean-imputation.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Rick_SAS
SAS Super FREQ

Thanks. I did not read the thread very carefully, so I appreciate the correction. My apologies. Best wishes.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 21 replies
  • 2071 views
  • 10 likes
  • 6 in conversation