Solved: sumif and a countif in sas

soham_sas · Posted 05-31-2018 02:15 PM

Hi ,

I have a table like this below , from this i have to do two things

1) i have to count all the one (1) that is present for each variable

2) i have to add the all probability scores of those variable where the value is one

i have given here a dummy data , in the original data set which i am using is having around 300 variables like that and for all the variable i have to do the same task

data have;
infile cards dlm='09'x;
input age1-age7 probability;
cards;
0 0 1 0 0 0 0 0.037316109
0 0 1 0 0 0 0 0.012645846
0 0 1 0 0 0 0 0.030745011
1 0 0 0 0 0 0 0.109232968
0 1 0 0 0 0 0 0.121137734
0 0 0 1 0 0 0 0.064422562
0 0 1 0 0 0 0 0.030978301
0 0 1 0 0 0 0 0.030990311
0 0 1 0 0 0 0 0.15763914
1 0 0 0 0 0 0 0.05447492
1 0 0 0 0 0 0 0.032326732
0 0 1 0 0 0 0 0.100191375
0 0 1 0 0 0 0 0.015438986
1 0 0 0 0 0 0 0.026323937
0 1 0 0 0 0 0 0.069376926
0 1 0 0 0 0 0 0.030861443
0 0 0 0 1 0 0 0.019696911
;
run;

Want

Var names	Count of one	Sum of Probability
age_1	4	0.222358557
age_2	3	0.221376102
age_3	8	0.415945079
age_4	1	0.064422562
age_5	1	0.019696911
age_6	0	0
age_7	0	0

Astounding · Posted 05-31-2018 02:47 PM

I think @PaigeMiller is moving in the right direction here. However, with 300 variables, I would switch things around a little:

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

That gives you the total probability for each AGE variable. Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

View solution in original post

PaigeMiller · Posted 05-31-2018 02:28 PM

UNTESTED CODE

proc summary data=have;
    var probability;
    weight age1;
    output out=stats1 sum=sum_probability sumwgt=n_age1;
run;
/* repeat for age2 through age 7 */

--
Paige Miller

Astounding · Posted 05-31-2018 02:47 PM

I think @PaigeMiller is moving in the right direction here. However, with 300 variables, I would switch things around a little:

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

That gives you the total probability for each AGE variable. Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

MarkWik · Posted 05-31-2018 02:50 PM

Hi @Astounding and @PaigeMiller, if you have got a couple of mins to spare, would you mind a small explanation for the weight statement plz? like what it does and why use it.

The doc is rather confusing me

PaigeMiller · Posted 05-31-2018 02:57 PM

It computes (in this case) a weighted sum instead of the ordinary sum. Using @Astounding's code, the sum of the zeros and ones are weighted by the probability. So, if age1 is 0, this contributes 0*probability to the sum; if age1 is 1, this contributes 1*probability to the sum.

--
Paige Miller

PGStats · Posted 05-31-2018 06:35 PM

Try this:

data temp;
set have;
age = whichn(1, of age1-age7);
run;

proc sql;
create table want as
select
    age,
    count(*) as n,
    sum(probability) as probability
from temp
group by age;
select * from want;
quit;

PG

ChrisNZ · Posted 06-01-2018 12:03 AM

Not the best method for 300 variables, but to answer your question about SUMIF, you can do something like this is in SAS to achieve SUMIF's logic:

data WANT;
  set HAVE end=LASTOBS;
  AGE_1+AGE1*PROBABILITY;
  if LASTOBS;
  putlog AGE_1=;
run;

AGE_1=0.222358557

The logic is:

AGE_1 + (any test you want goes here and 0 or 1 will be returned) * PROBABILITY;

High-Performance SAS Coding - Third Edition

mahesh146 · Posted 06-01-2018 06:55 AM

Please try below solution.
DATA TEMP;
infile cards;
input age1-age7 probability;
cards;
0 0 1 0 0 0 0 0.037316109
0 0 1 0 0 0 0 0.012645846
0 0 1 0 0 0 0 0.030745011
1 0 0 0 0 0 0 0.109232968
0 1 0 0 0 0 0 0.121137734
0 0 0 1 0 0 0 0.064422562
0 0 1 0 0 0 0 0.030978301
0 0 1 0 0 0 0 0.030990311
0 0 1 0 0 0 0 0.15763914
1 0 0 0 0 0 0 0.05447492
1 0 0 0 0 0 0 0.032326732
0 0 1 0 0 0 0 0.100191375
0 0 1 0 0 0 0 0.015438986
1 0 0 0 0 0 0 0.026323937
0 1 0 0 0 0 0 0.069376926
0 1 0 0 0 0 0 0.030861443
0 0 0 0 1 0 0 0.019696911
;
run;

PROC SORT DATA=TEMP;
BY probability;

PROC TRANSPOSE DATA=TEMP OUT=TEMP_1 NAME= Age_Col;
BY probability;
VAR Age:;
RUN;

PROC SQL;

SELECT
Age_col,
SUM(COL1) as Total_Count,
SUM(
CASE WHEN COL1=1 THEN probability
ELSE 0 END
) as Sum_Prob
FROM
TEMP_1
GROUP BY
Age_col
;
QUIT;

PaigeMiller · Posted 06-01-2018 01:00 PM

I just want to throw out a little bit of philosophy that you can adopt, or ignore, or argue with, about why I choose a PROC instead of a data step to do these calculations, which are really just addition and multiplication. (And I want to thank @Astounding for coming up with a much better solution that I had along similar lines)

My philosophy is this: SAS has already written code to compute sums and averages and weighted sums and weighted averages in several different PROCs. They have debugged it, and thousands/millions/tens of millions of applications later, we know the calculations inside these PROCs are correct. So I believe it is to my benefit to use the work that SAS has already done to compute sums and averages (and other similar statistics) — the coding is simpler, and if I write my own code to do this I can make errors

In the end, that's why we have SAS, isn't it? They do all the work to create these algorithms so we don't have to.

Anyway, there's nothing wrong with doing it via data step or PROC SQL, if that's your choice. It's just not my choice to do that level of coding creating an algorithm to do sums and averages.

As an amusing side story, I once took over some code someone else had written. THere were several complicated data steps of about 15-20 lines of code, and it took me about 30 minutes to figure out what these data steps were doing. These complicated data steps were just adding up zeros and ones. Hence, another reason to use built in SAS procedures to do basic statistics is that the next person who has to make use of your code will understand quickly what is being done.

--
Paige Miller

sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Re: sumif and a countif in sas

Classroom Training Available!