It is not true that proc sql versus data step is a matter or personal preference or what you learned first. There are things you can do with a data step that you cannot do with proc sql, because a data step is procedural code and proc sql is not. And, there are things you can do with proc sql that you cannot do with a data step, because sql is uses predicate algebra and is a subset of a cartesian product. Many years ago, back in 1998 I was overjoyed to learn that SAS had proc sql to do a proper join with. Trying to code it in a data step is a royal pain. A merge is not a join, and a join is not a merge, although you can sort of simulate a merge with a special form of join in SAS now. To create a proper general solution to the question, that does not require a bunch of convoluted coding, is a data step. It would be much simpler and more elegant, and can work in a general case. Your solution does not work for a general case because the "max" and "min" as you are using them do not alter as you step through the data. They are static quantities. A more general solution, using an assumption that missing dates are missing and not to be interpolated: data results; set in_data; by datestamp; retain prior total cnt; if _n_ = 1 then do; prior = datestamp; total = 0; cnt = 0; end; if (datestamp - prior) <= 20 then do; total = sum(total,returns); cnt+1; end; else do; if cnt > 0 then average_return = total / cnt; else average_return = . ; output; cnt=0; total=0; end; run; Another way; proc sql; create table results as select *, floor(datestamp/20)*20 as base_date, mean(returns) as average_returns from in_data group by floor(datestamp/20) order by base_date ; quit; So, in this case, SQL actually is easier, but produces different results, and are the results close enough to the intent? To make the two the same, you would have to have a subquery to introduce a bias to the base. select *,floor((datestamp-least_date)/20)*20 as base_date, mean(returns) as average_returns from ( select *, min(datestamp) as least_date from in_data) Then there's the question of 20 day intervals, or 20 observations? And do you really want 20 day intervals, or a rolling 20 day average? The Data Step passes through the data once. The SQL passes through the data 3 times. So, which is best? Depends.
... View more