BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
qqian
Calcite | Level 5

Hi,

 

I have a dataset with more than 20 columns and I want to get the sum of each column without writing sum for over 20 times.

 

I made a sample dataset as follows

(the actual dataset will have much more columns, and there's not a particular order of column names):

********************************************

data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

 

********************************************

what I want to get,

 

code1

code3

codeA

3

3

4

 

********************************************

I know I can do:

proc sql;

select sum(code1) as Code1, sum(code3) as Code3, sum(codeA) as CodeA

from sample;

quit;

 

But when there are so many columns, can I do some looping in sas in order to make it more efficient?

 

I also tried:

 

data OUT;
set sample;
array code code1 -- codeA;
do over code;
code = sum(code);
end;

run;

 

But it still displays every email, and give me the same dataset as sample. Don' know how I can modify the code to make it work.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

proc summary data=sample ;
var code:;
output out=want(keep=code:) sum=;
run;

The SAS recommendation is that, when the requirement is to summarize, the 1st place to look is procedures summary/freq etc 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

proc summary data=sample ;
var code:;
output out=want(keep=code:) sum=;
run;

The SAS recommendation is that, when the requirement is to summarize, the 1st place to look is procedures summary/freq etc 

qqian
Calcite | Level 5

Hi @novinosrin ,

 

Thanks for the reply!!!

 

My follow up question is, what if there's no common part of the columns? They don't start with 'code'. For example, column names are: AA, BB,SJ, EWK.

 

novinosrin
Tourmaline | Level 20

Hmm, that's a pain. If you do know the start to end sequence present in the dataset, you could list like

 

var aa--ewk;

 

But the above assumes the list is in continuous sequence from left to right in terms of position of the variable.

 

If there is no such a thing and all of these are completely random names and in any random order/position, i'm afraid there isn't much one can do about it. 😞

qqian
Calcite | Level 5
I c. thanks!
data_null__
Jade | Level 19

@novinosrin Why not use _NUMERIC_?

 


var _numeric_;
PaigeMiller
Diamond | Level 26

You could use _NUMERIC_, assuming you want all numeric variables to be summed.

 

If it is a huge data set, and you don't really need ALL of the numeric variables to be summed, for example there's no point in summing zip codes or dates, etc. then you are trading off execution time against programming/typing time.

--
Paige Miller
novinosrin
Tourmaline | Level 20

Guru @data_null__  Honestly did think of it. To be honest, I once made a terrible mistake at work by using _numeric_  without knowing there were some variables(numeric) of no interest, though rectified it later but that's the problem. Of course I agree, should you have all vars of interest and only those vars in your dataset, _numeric_ is very comfy.

PaigeMiller
Diamond | Level 26

Then you can modify the code as follows:

 

proc summary data=sample ;
var gorilla giraffe octopus cobra elephant oryx lion jackal;
output out=want sum=;
run;

where you type the variable names in the VAR statement.

 

If they are consecutive columns in the data set named SAMPLE, then it gets even easier

 

proc summary data=sample ;
var gorilla--jackal;
output out=want sum=;
run;
--
Paige Miller
qqian
Calcite | Level 5
thank you Paige! That's also helpful!
Reeza
Super User

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2096 views
  • 3 likes
  • 5 in conversation