BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
soham_sas
Quartz | Level 8

Hi , 

 

I have a table like this below , from this i have to do two things 

1) i have to count all the one (1) that is present for each variable 

2) i have to add the all probability scores of those variable where the value is one 

 

i have given here a dummy  data , in the original data set which i am using is having around 300 variables like that and for all the variable i have to do the same task

 


data have;
infile cards dlm='09'x;
input age1-age7 probability;
cards;
0 0 1 0 0 0 0 0.037316109
0 0 1 0 0 0 0 0.012645846
0 0 1 0 0 0 0 0.030745011
1 0 0 0 0 0 0 0.109232968
0 1 0 0 0 0 0 0.121137734
0 0 0 1 0 0 0 0.064422562
0 0 1 0 0 0 0 0.030978301
0 0 1 0 0 0 0 0.030990311
0 0 1 0 0 0 0 0.15763914
1 0 0 0 0 0 0 0.05447492
1 0 0 0 0 0 0 0.032326732
0 0 1 0 0 0 0 0.100191375
0 0 1 0 0 0 0 0.015438986
1 0 0 0 0 0 0 0.026323937
0 1 0 0 0 0 0 0.069376926
0 1 0 0 0 0 0 0.030861443
0 0 0 0 1 0 0 0.019696911
;
run;

 

Want

 

Var namesCount of oneSum of Probability
age_140.222358557
age_230.221376102
age_380.415945079
age_410.064422562
age_510.019696911
age_600
age_700


  

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I think @PaigeMiller is moving in the right direction here.  However, with 300 variables, I would switch things around a little:

 

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

 

That gives you the total probability for each AGE variable.  Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

proc summary data=have;
    var probability;
    weight age1;
    output out=stats1 sum=sum_probability sumwgt=n_age1;
run;
/* repeat for age2 through age 7 */

 

--
Paige Miller
Astounding
PROC Star

I think @PaigeMiller is moving in the right direction here.  However, with 300 variables, I would switch things around a little:

 

proc summary data=have;

var age1-age300;

weight probability;

output out=pr sum=;

run;

 

That gives you the total probability for each AGE variable.  Getting the count of the 1's is simpler ... get rid of the WEIGHT statement.

MarkWik
Quartz | Level 8

Hi @Astounding and @PaigeMiller, if you have got a couple of mins to spare, would you mind a small explanation for the weight statement plz? like what it does and why use it.

 

The doc is rather confusing me

PaigeMiller
Diamond | Level 26

It computes (in this case) a weighted sum instead of the ordinary sum. Using @Astounding's code, the sum of the zeros and ones are weighted by the probability. So, if age1 is 0, this contributes 0*probability to the sum; if age1 is 1, this contributes 1*probability to the sum.

--
Paige Miller
PGStats
Opal | Level 21

Try this:

 

data temp;
set have;
age = whichn(1, of age1-age7);
run;

proc sql;
create table want as
select
    age,
    count(*) as n,
    sum(probability) as probability
from temp
group by age;
select * from want;
quit;
PG
ChrisNZ
Tourmaline | Level 20

Not the best method for 300 variables, but to answer your question about SUMIF, you can do something like this is in SAS to achieve SUMIF's logic:

data WANT;
  set HAVE end=LASTOBS;
  AGE_1+AGE1*PROBABILITY;
  if LASTOBS;
  putlog AGE_1=;
run;

AGE_1=0.222358557

The logic is:

AGE_1 + (any test you want goes here and 0 or 1 will be returned) * PROBABILITY;

 

 

mahesh146
Obsidian | Level 7
Please try below solution.
DATA TEMP;
infile cards;
input age1-age7 probability;
cards;
0 0 1 0 0 0 0 0.037316109
0 0 1 0 0 0 0 0.012645846
0 0 1 0 0 0 0 0.030745011
1 0 0 0 0 0 0 0.109232968
0 1 0 0 0 0 0 0.121137734
0 0 0 1 0 0 0 0.064422562
0 0 1 0 0 0 0 0.030978301
0 0 1 0 0 0 0 0.030990311
0 0 1 0 0 0 0 0.15763914
1 0 0 0 0 0 0 0.05447492
1 0 0 0 0 0 0 0.032326732
0 0 1 0 0 0 0 0.100191375
0 0 1 0 0 0 0 0.015438986
1 0 0 0 0 0 0 0.026323937
0 1 0 0 0 0 0 0.069376926
0 1 0 0 0 0 0 0.030861443
0 0 0 0 1 0 0 0.019696911
;
run;




PROC SORT DATA=TEMP;
BY probability;

PROC TRANSPOSE DATA=TEMP OUT=TEMP_1 NAME= Age_Col;
BY probability;
VAR Age:;
RUN;


PROC SQL;

SELECT
Age_col,
SUM(COL1) as Total_Count,
SUM(
CASE WHEN COL1=1 THEN probability
ELSE 0 END
) as Sum_Prob
FROM
TEMP_1
GROUP BY
Age_col
;
QUIT;




PaigeMiller
Diamond | Level 26

I just want to throw out a little bit of philosophy that you can adopt, or ignore, or argue with, about why I choose a PROC instead of a data step to do these calculations, which are really just addition and multiplication. (And I want to thank @Astounding for coming up with a much better solution that I had along similar lines)

 

My philosophy is this: SAS has already written code to compute sums and averages and weighted sums and weighted averages in several different PROCs. They have debugged it, and thousands/millions/tens of millions of applications later, we know the calculations inside these PROCs are correct. So I believe it is to my benefit to use the work that SAS has already done to compute sums and averages (and other similar statistics) — the coding is simpler, and if I write my own code to do this I can make errors

 

In the end, that's why we have SAS, isn't it? They do all the work to create these algorithms so we don't have to. 

 

Anyway, there's nothing wrong with doing it via data step or PROC SQL, if that's your choice. It's just not my choice to do that level of coding creating an algorithm to do sums and averages.

 

As an amusing side story, I once took over some code someone else had written. THere were several complicated data steps of about 15-20 lines of code, and it took me about 30 minutes to figure out what these data steps were doing. These complicated data steps were just adding up zeros and ones. Hence, another reason to use built in SAS procedures to do basic statistics is that the next person who has to make use of your code will understand quickly what is being done.

--
Paige Miller

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6936 views
  • 5 likes
  • 7 in conversation