turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- sumif and a countif in sas

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

3 weeks ago

Hi ,

I have a table like this below , from this i have to do two things

1) i have to count all the one (1) that is present for each variable

2) i have to add the all probability scores of those variable where the value is one

i have given here a dummy data , in the original data set which i am using is having around 300 variables like that and for all the variable i have to do the same task

data have;

infile cards dlm='09'x;

input age1-age7 probability;

cards;

0 0 1 0 0 0 0 0.037316109

0 0 1 0 0 0 0 0.012645846

0 0 1 0 0 0 0 0.030745011

1 0 0 0 0 0 0 0.109232968

0 1 0 0 0 0 0 0.121137734

0 0 0 1 0 0 0 0.064422562

0 0 1 0 0 0 0 0.030978301

0 0 1 0 0 0 0 0.030990311

0 0 1 0 0 0 0 0.15763914

1 0 0 0 0 0 0 0.05447492

1 0 0 0 0 0 0 0.032326732

0 0 1 0 0 0 0 0.100191375

0 0 1 0 0 0 0 0.015438986

1 0 0 0 0 0 0 0.026323937

0 1 0 0 0 0 0 0.069376926

0 1 0 0 0 0 0 0.030861443

0 0 0 0 1 0 0 0.019696911

;

run;

Want

Var names | Count of one | Sum of Probability |

age_1 | 4 | 0.222358557 |

age_2 | 3 | 0.221376102 |

age_3 | 8 | 0.415945079 |

age_4 | 1 | 0.064422562 |

age_5 | 1 | 0.019696911 |

age_6 | 0 | 0 |

age_7 | 0 | 0 |

Accepted Solutions

Solution

3 weeks ago

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

I think @PaigeMiller is moving in the right direction here. However, with 300 variables, I would switch things around a little:

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

That gives you the total probability for each AGE variable. Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

UNTESTED CODE

```
proc summary data=have;
var probability;
weight age1;
output out=stats1 sum=sum_probability sumwgt=n_age1;
run;
/* repeat for age2 through age 7 */
```

--

Paige Miller

Paige Miller

Solution

3 weeks ago

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

I think @PaigeMiller is moving in the right direction here. However, with 300 variables, I would switch things around a little:

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

That gives you the total probability for each AGE variable. Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Astounding

3 weeks ago

Hi @Astounding and @PaigeMiller, if you have got a couple of mins to spare, would you mind a small explanation for the weight statement plz? like what it does and why use it.

The doc is rather confusing me

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to MarkWik

3 weeks ago

It computes (in this case) a weighted sum instead of the ordinary sum. Using @Astounding's code, the sum of the zeros and ones are weighted by the probability. So, if age1 is 0, this contributes 0*probability to the sum; if age1 is 1, this contributes 1*probability to the sum.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

Try this:

```
data temp;
set have;
age = whichn(1, of age1-age7);
run;
proc sql;
create table want as
select
age,
count(*) as n,
sum(probability) as probability
from temp
group by age;
select * from want;
quit;
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

Not the best method for 300 variables, but to answer your question about SUMIF, you can do something like this is in SAS to achieve SUMIF's logic:

```
data WANT;
set HAVE end=LASTOBS;
AGE_1+AGE1*PROBABILITY;
if LASTOBS;
putlog AGE_1=;
run;
```

AGE_1=0.222358557

The logic is:

`AGE_1 + (any test you want goes here and 0 or 1 will be returned) * PROBABILITY;`

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to soham_sas

3 weeks ago

Please try below solution.

DATA TEMP;

infile cards;

input age1-age7 probability;

cards;

0 0 1 0 0 0 0 0.037316109

0 0 1 0 0 0 0 0.012645846

0 0 1 0 0 0 0 0.030745011

1 0 0 0 0 0 0 0.109232968

0 1 0 0 0 0 0 0.121137734

0 0 0 1 0 0 0 0.064422562

0 0 1 0 0 0 0 0.030978301

0 0 1 0 0 0 0 0.030990311

0 0 1 0 0 0 0 0.15763914

1 0 0 0 0 0 0 0.05447492

1 0 0 0 0 0 0 0.032326732

0 0 1 0 0 0 0 0.100191375

0 0 1 0 0 0 0 0.015438986

1 0 0 0 0 0 0 0.026323937

0 1 0 0 0 0 0 0.069376926

0 1 0 0 0 0 0 0.030861443

0 0 0 0 1 0 0 0.019696911

;

run;

PROC SORT DATA=TEMP;

BY probability;

PROC TRANSPOSE DATA=TEMP OUT=TEMP_1 NAME= Age_Col;

BY probability;

VAR Age:;

RUN;

PROC SQL;

SELECT

Age_col,

SUM(COL1) as Total_Count,

SUM(

CASE WHEN COL1=1 THEN probability

ELSE 0 END

) as Sum_Prob

FROM

TEMP_1

GROUP BY

Age_col

;

QUIT;

DATA TEMP;

infile cards;

input age1-age7 probability;

cards;

0 0 1 0 0 0 0 0.037316109

0 0 1 0 0 0 0 0.012645846

0 0 1 0 0 0 0 0.030745011

1 0 0 0 0 0 0 0.109232968

0 1 0 0 0 0 0 0.121137734

0 0 0 1 0 0 0 0.064422562

0 0 1 0 0 0 0 0.030978301

0 0 1 0 0 0 0 0.030990311

0 0 1 0 0 0 0 0.15763914

1 0 0 0 0 0 0 0.05447492

1 0 0 0 0 0 0 0.032326732

0 0 1 0 0 0 0 0.100191375

0 0 1 0 0 0 0 0.015438986

1 0 0 0 0 0 0 0.026323937

0 1 0 0 0 0 0 0.069376926

0 1 0 0 0 0 0 0.030861443

0 0 0 0 1 0 0 0.019696911

;

run;

PROC SORT DATA=TEMP;

BY probability;

PROC TRANSPOSE DATA=TEMP OUT=TEMP_1 NAME= Age_Col;

BY probability;

VAR Age:;

RUN;

PROC SQL;

SELECT

Age_col,

SUM(COL1) as Total_Count,

SUM(

CASE WHEN COL1=1 THEN probability

ELSE 0 END

) as Sum_Prob

FROM

TEMP_1

GROUP BY

Age_col

;

QUIT;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mahesh146

3 weeks ago

I just want to throw out a little bit of philosophy that you can adopt, or ignore, or argue with, about why I choose a PROC instead of a data step to do these calculations, which are really just addition and multiplication. (And I want to thank @Astounding for coming up with a much better solution that I had along similar lines)

My philosophy is this: SAS has already written code to compute sums and averages and weighted sums and weighted averages in several different PROCs. They have debugged it, and thousands/millions/tens of millions of applications later, we know the calculations inside these PROCs are correct. So I believe it is to my benefit to use the work that SAS has already done to compute sums and averages (and other similar statistics) — the coding is simpler, and if I write my own code to do this I can make errors

In the end, that's why we have SAS, isn't it? They do all the work to create these algorithms so we don't have to.

Anyway, there's nothing wrong with doing it via data step or PROC SQL, if that's your choice. It's just not my choice to do that level of coding creating an algorithm to do sums and averages.

As an amusing side story, I once took over some code someone else had written. THere were several complicated data steps of about 15-20 lines of code, and it took me about 30 minutes to figure out what these data steps were doing. These complicated data steps were just adding up zeros and ones. Hence, another reason to use built in SAS procedures to do basic statistics is that the next person who has to make use of your code will understand quickly what is being done.

--

Paige Miller

Paige Miller