Hi all
I have a dataset that has a date and hour then 1000 columns of stock holdings by the hour that I want to sum by date. This is a reduced version of what my dataset looks like:
DATE HOUR HOLDING_AAPL HOLDING_MSF HOLDING_AMZN
2020/08/05 1 5 10 7
2020/08/05 2 3 8 8
2020/08/05 3 6 11 10
2020/08/06 1 12 10 7
2020/08/06 2 9 9 3
2020/08/06 3 7 15 8
And this is what I want it to look like:
DATE SUM_HOLDING_AAPL SUM_HOLDING_MSFT SUM_HOLDING_AMZN
2020/08/05 14 29 25
2020/08/06 28 34 18
I know this code:
proc summary data=have nway;
class date;
var holding_aapl;
output out=want(drop=_:) sum(holding_aapl)=sum_holding_aapl;
run;
but that only works for one variable at a time. Is there a way to do it so I can get all 1000 columns at the same time without having to manually put them into the var line?
proc summary data=have nway;
class date;
var holding_aapl--holding_amzn;
output out=want(drop=_:) sum= /autoname;
run;
Naturally, you could add in all the columns above in the VAR statement. If you have 1000 columns and the first one is named HOLDING_APPL and the last one is name HOLDING_1000, then you would use
var holding_aapl--holding_1000;
proc summary data=have nway;
class date;
var holding_aapl--holding_amzn;
output out=want(drop=_:) sum= /autoname;
run;
Naturally, you could add in all the columns above in the VAR statement. If you have 1000 columns and the first one is named HOLDING_APPL and the last one is name HOLDING_1000, then you would use
var holding_aapl--holding_1000;
@MellyJ13 wrote:
Hi all
I have a dataset that has a date and hour then 1000 columns of stock holdings by the hour that I want to sum by date. This is a reduced version of what my dataset looks like:
DATE HOUR HOLDING_AAPL HOLDING_MSF HOLDING_AMZN
2020/08/05 1 5 10 7
2020/08/05 2 3 8 8
2020/08/05 3 6 11 10
2020/08/06 1 12 10 7
2020/08/06 2 9 9 3
2020/08/06 3 7 15 8
And this is what I want it to look like:
DATE SUM_HOLDING_AAPL SUM_HOLDING_MSFT SUM_HOLDING_AMZN
2020/08/05 14 29 25
2020/08/06 28 34 18
I know this code:
proc summary data=have nway;
class date;
var holding_aapl;
output out=want(drop=_:) sum(holding_aapl)=sum_holding_aapl;
run;
but that only works for one variable at a time. Is there a way to do it so I can get all 1000 columns at the same time without having to manually put them into the var line?
Not sure where you got the idea that it only works for one variable at a time, I think this will work with PROC SUMMARY but I'm more familiar with PROC MEANS so this will work for sure.
proc means data=have nway noprint;
class date;
var holding:;
output out=want(drop=_:) sum = /autoname;
run;
Note the use of the colon operator to reference all holding variables. It will include all variables that start with the holding prefix.
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
To make the code fully dynamic, you have to transpose the data from wide to long. Having 1000 variables is - in most cases - a sign of design failure.
data have;
input DATE yymmdd10. HOUR HOLDING_AAPL HOLDING_MSF HOLDING_AMZN;
format date yymmdds10.;
datalines;
2020/08/05 1 5 10 7
2020/08/05 2 3 8 8
2020/08/05 3 6 11 10
2020/08/06 1 12 10 7
2020/08/06 2 9 9 3
2020/08/06 3 7 15 8
;
proc transpose data=have out=transposed(rename=(col1=value)) name=source;
by date hour;
var holding_aapl -- holding_amzn;
run;
proc summary data=transposed nway;
class date source;
var value;
output out=want(drop= _type_ _freq_) sum=;
run;
In general, the solution from @andreas_lds is better than mine, because if you start with the long data set (the one he calls TRANSPOSED) instead of the wide data set, you don't have to know the variable names.
Even if you have to start with a wide dataset (because that's how you received the data), then transposing is still the better method because you don't have to deal with variable names like HOLDING_AAPL_SUM and HOLDING_MSF_SUM and so on, and the output from PROC SUMMARY is nice and neat an usable.
Long beats wide. Maxim 19.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.