Help with a do loop

Reply
Frequent Contributor
Posts: 91

Help with a do loop

I have a data set for daily returns. (Attached)

I want to make a new data set such that I get the average returns for periods of 20 days.

For instance if I have 100 daily returns, the new data set should have five data points for returns..

Can you please help me with this code.

Thanks

Attachment
Respected Advisor
Posts: 3,894

Re: Help with a do loop

Not really sure when your 20 days periods should start and it appears a bit random that it just should be counted based on the first date in your data.

You have also missing dates in your time serie so what should be done if a 20 day period is on a date which is not in the data?

Below code gives you a rolling average over the last 20 days (current day counted as day 1). This should allow you to pick whatever days you need.

proc sql;

  create table want as

  select distinct l.date,l.return, avg(r.return) as avg_return_last20days

  from ftse l left join (select *, min(date) as min_date from ftse) r

    on l.date >= r.date and l.date-20<r.date and l.date>=min_date+19

  group by l.date

  ;

quit;

Occasional Contributor
Posts: 9

Re: Help with a do loop

This is not a general purpose solution.

It only works for a specific small set of data, and will not work for a dataset that spans more than 40 days.

But, you do raise a lot of good questions.

Those need to be answered before an appropriate solution can be arrived at.

One thing to look into would be the intnx function and its related functions.

I would also recommend using a single data step.

Trying to make proc sql work would be a mental exercise, and is contrary to the basic principles of SAS coding, use the simpliest method to minize coding effort.

Respected Advisor
Posts: 3,894

Re: Help with a do loop

If you regard a data step or a SQL as easier to implement will mainly depend on your familiarity with the two syntaxes. If the source is a database table then SQL is often preferable.

As to my knowledge there is no out-of-the-box 20 days inteval for SAS calendar functions so you would have to create a custom interval. Not that this is very difficult but I don't really see the value in it as it's so easy to just substract 20 days from a SAS date value.

PROC Star
Posts: 7,363

Re: Help with a do loop

Patrick: Such an interval is easily created with the out-of-the-box intervalds option.

Respected Advisor
Posts: 3,894

Re: Help with a do loop

Haven't I said that? It's not difficult but you still need this extra datastep creating the table for the custome intervals.

PROC Star
Posts: 7,363

Re: Help with a do loop

Patrick: Okay.  As I carefully read your response the phrase "custom interval" was definitely there.  However, like formats, the custom interval datasets can be made permanent, thus can be quite useful for repeating tasks.

Occasional Contributor
Posts: 9

Re: Help with a do loop

It is not true that proc sql versus data step is a matter or personal preference or what you learned first.

There are things you can do with a data step that you cannot do with proc sql, because a data step is procedural code and proc sql is not.

And, there are things you can do with proc sql that you cannot do with a data step, because sql is uses predicate algebra and is a subset of a cartesian product.

Many years ago, back in 1998 I was overjoyed to learn that SAS had proc sql to do a proper join with.  Trying to code it in a data step is a royal pain.

A merge is not a join, and a join is not a merge, although you can sort of simulate a merge with a special form of join in SAS now.

To create a proper general solution to the question, that does not require a bunch of convoluted coding, is a data step.

It would be much simpler and more elegant, and can work in a general case.

Your solution does not work for a general case because the "max" and "min" as you are using them do not alter as you step through the data.  They are static quantities.

A more general solution, using an assumption that missing dates are missing and not to be interpolated:

data results;

  set in_data;

  by datestamp;

  retain prior total cnt;

  if _n_ = 1 then do;

    prior = datestamp;

    total = 0;

    cnt = 0;

  end;

  if (datestamp - prior) <= 20  then do;

    total = sum(total,returns);

    cnt+1;

  end;

  else do;

    if cnt > 0 then average_return = total / cnt;

                   else average_return = . ;

    output;

    cnt=0;

    total=0;

  end;

run;

Another way;

proc sql;

  create table results as

  select *, floor(datestamp/20)*20 as base_date, mean(returns) as average_returns

     from in_data

    group by floor(datestamp/20)

    order by base_date

  ;

quit;

So, in this case, SQL actually is easier, but produces different results, and are the results close enough to the intent?

To make the two the same, you would have to have a subquery to introduce a bias to the base.

select *,floor((datestamp-least_date)/20)*20 as base_date, mean(returns) as average_returns

from ( select *, min(datestamp) as least_date from in_data)

Then there's the question of 20 day intervals, or 20 observations?

And do you really want 20 day intervals, or a rolling 20 day average?

The Data Step passes through the data once.

The SQL passes through the data 3 times.

So, which is best?

Depends.

Respected Advisor
Posts: 3,894

Re: Help with a do loop

One of the things I like about SAS is that there are always many ways for solving a problem. I agree of course that SAS data step and SQL are not the same.

As for the SQL I've posted:

I normally test my code if sample data is provided. The OP attached several years worth of data and it looks to me as if I'm getting correct results also after the first 40 days.

The min() function is used to derive the earliest day in the whole sample and its only purpose is to not calculate an average before there are 20 days of data available (as it's supposed to be a roling 20 days window).

You could easily replace the hard coded days with macro variables so the code I've posted looks "general" enough to me.

As for performance/passes through data:

It sure can be done less wasteful using a data step. But considering the volumes does it really matter? I've just run a test over 50 years of data and it took on my laptop around 14 seconds.

Ask a Question
Discussion stats
  • 8 replies
  • 456 views
  • 2 likes
  • 4 in conversation