Solved: Re: Looping through columns to get sum of each column

qqian · Posted 02-20-2020 04:00 PM

Hi,

I have a dataset with more than 20 columns and I want to get the sum of each column without writing sum for over 20 times.

I made a sample dataset as follows

(the actual dataset will have much more columns, and there's not a particular order of column names):

********************************************

data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

********************************************

what I want to get,

code1	code3	codeA
3	3	4

********************************************

I know I can do:

proc sql;

select sum(code1) as Code1, sum(code3) as Code3, sum(codeA) as CodeA

from sample;

quit;

But when there are so many columns, can I do some looping in sas in order to make it more efficient?

I also tried:

data OUT;
set sample;
array code code1 -- codeA;
do over code;
code = sum(code);
end;

run;

But it still displays every email, and give me the same dataset as sample. Don' know how I can modify the code to make it work.

novinosrin · Posted 02-20-2020 04:06 PM


data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

proc summary data=sample ;
var code:;
output out=want(keep=code:) sum=;
run;

The SAS recommendation is that, when the requirement is to summarize, the 1st place to look is procedures summary/freq etc

View solution in original post

novinosrin · Posted 02-20-2020 04:06 PM


data sample;
infile datalines delimiter=',';
input Email $ code1 code3 codeA;
datalines;
E1,1,0,1
E2,1,1,0
E3,0,1,1
E4,0,0,1
E5,0,0,0
E6,1,1,1
;
run;

proc summary data=sample ;
var code:;
output out=want(keep=code:) sum=;
run;

The SAS recommendation is that, when the requirement is to summarize, the 1st place to look is procedures summary/freq etc

qqian · Posted 02-20-2020 04:11 PM

Hi @novinosrin ,

Thanks for the reply!!!

My follow up question is, what if there's no common part of the columns? They don't start with 'code'. For example, column names are: AA, BB,SJ, EWK.

novinosrin · Posted 02-20-2020 04:16 PM

Hmm, that's a pain. If you do know the start to end sequence present in the dataset, you could list like

var aa--ewk;

But the above assumes the list is in continuous sequence from left to right in terms of position of the variable.

If there is no such a thing and all of these are completely random names and in any random order/position, i'm afraid there isn't much one can do about it. 😞

qqian · Posted 02-20-2020 04:18 PM

I c. thanks!

data_null__ · Posted 02-20-2020 04:20 PM

@novinosrin Why not use _NUMERIC_?


var _numeric_;

PaigeMiller · Posted 02-20-2020 04:25 PM

You could use _NUMERIC_, assuming you want all numeric variables to be summed.

If it is a huge data set, and you don't really need ALL of the numeric variables to be summed, for example there's no point in summing zip codes or dates, etc. then you are trading off execution time against programming/typing time.

--
Paige Miller

novinosrin · Posted 02-20-2020 04:31 PM

Guru @data_null__ Honestly did think of it. To be honest, I once made a terrible mistake at work by using _numeric_ without knowing there were some variables(numeric) of no interest, though rectified it later but that's the problem. Of course I agree, should you have all vars of interest and only those vars in your dataset, _numeric_ is very comfy.

PaigeMiller · Posted 02-20-2020 04:17 PM

Then you can modify the code as follows:

proc summary data=sample ;
var gorilla giraffe octopus cobra elephant oryx lion jackal;
output out=want sum=;
run;

where you type the variable names in the VAR statement.

If they are consecutive columns in the data set named SAMPLE, then it gets even easier

proc summary data=sample ;
var gorilla--jackal;
output out=want sum=;
run;

--
Paige Miller

qqian · Posted 02-20-2020 04:19 PM

thank you Paige! That's also helpful!

Reeza · Posted 02-20-2020 04:53 PM

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

Registration is open

SAS Training: Just a Click Away