BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saurabh_latrobe
Fluorite | Level 6

Dear Friends,

 

I am quite new to SAS. I am struggling to write a code in SAS.

Objective - Collapse the daily data on return into average of daily return in each quarter end date like 31-Mar-2001, 30-Jun-2001,30-Sep-2001..... for each company.

Other Details - There are multiple returns in a single date for a company and there will be many dates which will not be there in the dataset. The program should take the average of all these returns for each company and in the output datafile, it should show only the quarter end dates with the average (daily) return in that quarter. The data runs into millions so it will be difficult to share (sample is in the table below). In the output, I need these 3 columns but dates should be only the end date of each quarter as shown in the output table below:

Required Output Table:

DateCompany IdReturn (Average of all Daily Returns across the given quarter)
31/03/2001xyz2%
30/06/2001xyz1.38%
30/09/2001xyz1.29%
31/12/2001xyz3.01%
...........
31/03/2010xyz1.45%
30/06/2010xyz3%
30/09/2010xyz2.12%
31/12/2010xyz2.07%
....  
31/03/2001abc0.99%
30/06/2001abc1.75%
30/09/2001abc1.67%
31/12/2001abc2.01%
....... 
31/03/2011abc3%
30/06/2011abc1.23%
30/09/2011abc2%
31/12/2011abc1.3%
.........

Input Data Sample -

DateCompany IdDaily Return
01/01/2001abc1.1%
01/01/2001abc1.23%
01/01/2001abc0.67%
02/01/2001abc.89%
02/01/2001abc1.02%
05/02/2001abc1%
06/02/2001abc0.99%
06/02/2001abc2%
.............
23/07/2010abc1.34%
.....  
10/01/2003xyz2.3%
10/01/2003xyz1.87%
.........
28/06/2009xyz3%
02/07/2009xyz1.2%
02/07/2009xyz0.59%
...........
30/09/2009xyz1.23%
04/10/2009xyz

2%

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

First the relatively easy part. When you have an actual SAS date value you can create groups that SAS procedures will recognize by using a Format. So if you use a format such as YYQ6 with a date value in proc means or summary you create calendar groups.

 

I don't have your data and this may not work if you have enough combinations of date and company:

 

Proc summary data=have nway;

   class date company;

   format date yyq6.;

   var return;

   output out=want (drop= _freq_ _type_) mean =;

run;

You would need a pass through a data step to ensure the date value was the end of the quarter as this approach tends to keep the earliest value actually encountered in the data for the group.

 

Alternatively you could create a new variable that holds the desired value and then summarize:

Data need;
    set have;
    endofquarter = intnx('quarter', date,0,'E');
    format endofquarter ddmmyy10.;
run;
proc sort data=need;
   by endofquarter company;
run;

proc summary data=need;
   by endofquarter company;
   var return;
   output out=want  mean=;
run;

Now for the bad news: averaging daily percentages is very likely not going to yield a true percentage unless the number of values is exactly the same for every single day of record per company.

 

An easy example of one rate and why you may want to reconsider, or redefine, the problem.

Suppose I drive a car and for 1 mile I am costing downhill and on only use 0.01 gallons of gas, the mpg=100. Then I drive the next 100 miles and use 5 gallons or mpg=20. So would the average mpg across that tripe be the mean (100+20)/2 = 60?

 

View solution in original post

2 REPLIES 2
ballardw
Super User

First the relatively easy part. When you have an actual SAS date value you can create groups that SAS procedures will recognize by using a Format. So if you use a format such as YYQ6 with a date value in proc means or summary you create calendar groups.

 

I don't have your data and this may not work if you have enough combinations of date and company:

 

Proc summary data=have nway;

   class date company;

   format date yyq6.;

   var return;

   output out=want (drop= _freq_ _type_) mean =;

run;

You would need a pass through a data step to ensure the date value was the end of the quarter as this approach tends to keep the earliest value actually encountered in the data for the group.

 

Alternatively you could create a new variable that holds the desired value and then summarize:

Data need;
    set have;
    endofquarter = intnx('quarter', date,0,'E');
    format endofquarter ddmmyy10.;
run;
proc sort data=need;
   by endofquarter company;
run;

proc summary data=need;
   by endofquarter company;
   var return;
   output out=want  mean=;
run;

Now for the bad news: averaging daily percentages is very likely not going to yield a true percentage unless the number of values is exactly the same for every single day of record per company.

 

An easy example of one rate and why you may want to reconsider, or redefine, the problem.

Suppose I drive a car and for 1 mile I am costing downhill and on only use 0.01 gallons of gas, the mpg=100. Then I drive the next 100 miles and use 5 gallons or mpg=20. So would the average mpg across that tripe be the mean (100+20)/2 = 60?

 

saurabh_latrobe
Fluorite | Level 6
Thanks a lot for the solution. I agree on your issue of average.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 299 views
  • 1 like
  • 2 in conversation