BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MellyJ13
Calcite | Level 5

Hi all

 

I have a dataset that has a date and hour then 1000 columns of stock holdings by the hour that I want to sum by date. This is a reduced version of what my dataset looks like:

DATE             HOUR            HOLDING_AAPL       HOLDING_MSF          HOLDING_AMZN

2020/08/05       1                     5                                   10                               7

2020/08/05       2                     3                                   8                                 8

2020/08/05       3                     6                                   11                               10

2020/08/06       1                     12                                  10                               7

2020/08/06       2                     9                                  9                                   3

2020/08/06       3                     7                                   15                               8

 

And this is what I want it to look like:

DATE               SUM_HOLDING_AAPL     SUM_HOLDING_MSFT    SUM_HOLDING_AMZN

2020/08/05           14                                              29                                     25

2020/08/06           28                                              34                                     18

 

I know this code:

 

proc summary data=have nway;
class date;
var holding_aapl;
output out=want(drop=_:) sum(holding_aapl)=sum_holding_aapl;
run;

 

but that only works for one variable at a time. Is there a way to do it so I can get all 1000 columns at the same time without having to manually put them into the var line?

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class date;
    var holding_aapl--holding_amzn;
    output out=want(drop=_:) sum= /autoname;
run;

Naturally, you could add in all the columns above in the VAR statement. If you have 1000 columns and the first one is named HOLDING_APPL and the last one is name HOLDING_1000, then you would use

 

var holding_aapl--holding_1000;

 

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class date;
    var holding_aapl--holding_amzn;
    output out=want(drop=_:) sum= /autoname;
run;

Naturally, you could add in all the columns above in the VAR statement. If you have 1000 columns and the first one is named HOLDING_APPL and the last one is name HOLDING_1000, then you would use

 

var holding_aapl--holding_1000;

 

--
Paige Miller
Reeza
Super User

@MellyJ13 wrote:

Hi all

 

I have a dataset that has a date and hour then 1000 columns of stock holdings by the hour that I want to sum by date. This is a reduced version of what my dataset looks like:

DATE             HOUR            HOLDING_AAPL       HOLDING_MSF          HOLDING_AMZN

2020/08/05       1                     5                                   10                               7

2020/08/05       2                     3                                   8                                 8

2020/08/05       3                     6                                   11                               10

2020/08/06       1                     12                                  10                               7

2020/08/06       2                     9                                  9                                   3

2020/08/06       3                     7                                   15                               8

 

And this is what I want it to look like:

DATE               SUM_HOLDING_AAPL     SUM_HOLDING_MSFT    SUM_HOLDING_AMZN

2020/08/05           14                                              29                                     25

2020/08/06           28                                              34                                     18

 

I know this code:

 

proc summary data=have nway;
class date;
var holding_aapl;
output out=want(drop=_:) sum(holding_aapl)=sum_holding_aapl;
run;

 

but that only works for one variable at a time. Is there a way to do it so I can get all 1000 columns at the same time without having to manually put them into the var line?

 


Not sure where you got the idea that it only works for one variable at a time, I think this will work with PROC SUMMARY but I'm more familiar with PROC MEANS so this will work for sure.

 

proc means data=have nway noprint;
class date;
var holding:;
output out=want(drop=_:) sum = /autoname;
run;


Note the use of the colon operator to reference all holding variables. It will include all variables that start with the holding prefix.

 

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

andreas_lds
Jade | Level 19

To make the code fully dynamic, you have to transpose the data from wide to long. Having 1000 variables is - in most cases - a sign of design failure.

data have;
   input DATE yymmdd10. HOUR HOLDING_AAPL HOLDING_MSF HOLDING_AMZN;
   format date yymmdds10.;
datalines;
2020/08/05 1 5 10 7
2020/08/05 2 3 8 8
2020/08/05 3 6 11 10
2020/08/06 1 12 10 7
2020/08/06 2 9 9 3
2020/08/06 3 7 15 8
;

proc transpose data=have out=transposed(rename=(col1=value)) name=source;
   by date hour;
   var holding_aapl -- holding_amzn;
run;

proc summary data=transposed nway;
   class date source;
   var value;
   output out=want(drop= _type_ _freq_) sum=;
run;
PaigeMiller
Diamond | Level 26

In general, the solution from @andreas_lds is better than mine, because if you start with the long data set (the one he calls TRANSPOSED) instead of the wide data set, you don't have to know the variable names.

 

Even if you have to start with a wide dataset (because that's how you received the data), then transposing is still the better method because you don't have to deal with variable names like HOLDING_AAPL_SUM and HOLDING_MSF_SUM and so on, and the output from PROC SUMMARY is nice and neat an usable.

 

Long beats wide. Maxim 19.

--
Paige Miller

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1611 views
  • 4 likes
  • 4 in conversation