Programming the statistical procedures from SAS

SUM FUNCTION

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,040
Accepted Solution

SUM FUNCTION

Hi Team,

I read about this function in Pharma book to group datasets and then break them in the report.

I was wondering the necissity of a SUM

when we say

in1*1

in2*2

in3*3

That serves as putting 1 2 and 3 to all the observation sunder a particular variable.

Why do WE HAVE TO USE A SUM FUNCTION HERE??

THANKS

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

    percent1=put(percent,6.2) ||  '%';

     group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;


Accepted Solutions
Solution
‎10-02-2012 06:02 PM
Super User
Posts: 18,498

Re: SUM FUNCTION

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

View solution in original post


All Replies
PROC Star
Posts: 7,416

Re: SUM FUNCTION

Since the code is setting all 13 files, group will equal a number between 1 and 13, indicating which file the record came from.

Super Contributor
Posts: 1,040

Re: SUM FUNCTION

Arthur,

Thanks for the reply. I was wondering the use of a SUM function when we are not doing any addition...u know what i mean?

i thought sum function is to add up two values.

Here sum function is not adding up any values.....

Thanks

Super User
Posts: 10,816

Re: SUM FUNCTION

Actually SUM has much more functionallity than just adding. If one or more of the variables or expressions as arguments to SUM are missing it will return the total of the non-missing values. SUM will only return a missing value if all of the arguments are missing. A simple addition statement such as x= y + z; returns missing if either y or z are missing.

I have used a similar code construct where some of the values might be missing but I still wanted the indicator value when present.

Super Contributor
Posts: 1,040

Re: SUM FUNCTION

Hi Ballard,

Thanks for all the time for the detailed explanation. Maybe I am not able to ask the question properly.

I understand that in the code below all observation coming from age_1 dataset will all have a group=1

Likewise all observations from gendr_2 dataset will have a group=2

I thought it will be good enough to just write

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

instead of

group=SUM(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

    percent1=put(percent,6.2) ||  '%';

     group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;

Respected Advisor
Posts: 4,742

Re: SUM FUNCTION

Hi,

Since all in-variables are zero except one, the SUM function performs exactly as a simple sum 1*in1+2*in2+3*in3..., thus it gives you a number from 1 to 13 identifying the dataset from which each observation in dataset FINAL comes from. You could achieve the same goal with simply

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1 gender_2 BMI_2 Cap_2 Sr_2

AS_2 Cm_2 Cmb_2 CmbC_2 Cnc_2

TN_2 PD_2 Cp_2 / INDSNAME=group;

percent1=put(percent,6.2) ||'%';

run;

here GROUP would take the values "WORK.AGE_1", "WORK.GENDER_2", ... etc.

PG

PG
Super Contributor
Posts: 1,040

Re: SUM FUNCTION

HI PG,

All in-variables are zero except one????

Could you please explain a lil bit

Thanks

PROC Star
Posts: 7,416

Re: SUM FUNCTION

Karun,

Your 13 variables are set with in= statements thus, by definition, are set to be either 1 or 0.  i.e., The in= variable will be set to 1 for the file being read, and 0 for the other 12 variables.  Since you are then multiplying those values by a number between 1 and 13, all of the ones with 0 will end up with 0, while the other will have a number between 1 and 13.

Super Contributor
Posts: 1,040

Re: SUM FUNCTION

Hi Arthur Excellent job as usual. I understand that part very clearly now. Even without the SUM function the above explanation is still valid right?

Correct me if i am wrong??!!!

If so then why do we need a SUM function!!

Also Arthur as PG said earlier insted of Aliasing each and every datasets with in=option  its even simpler to use the option  INDSNAME=group once and acheive the same task????

Thanks

Solution
‎10-02-2012 06:02 PM
Super User
Posts: 18,498

Re: SUM FUNCTION

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

Super Contributor
Posts: 1,040

Re: SUM FUNCTION


Thanks Reeza,

I learnt many things apart from my question. That is what I realizesd after following on the forum. Thanks so much to each and everyone of you.

Reeza,  I think it works even if the datasets arent numbered as well. (assuming that u meant age_1 gender_2) etc..

In SAS programming for the pharamaceutical Industry by Jack Shostak

Chapter4:Creating Tables and listing he has used it on ordinary datasets

Thanks

PROC Star
Posts: 7,416

Re: SUM FUNCTION

Karun,

I agree with you, over Reeza, in this case.  You are simply arbitrarily assigning the numbers 1 thru 13.  The actual file names are irrelevant, other than they happen to co-incide with the file names in your example.

Super User
Posts: 18,498

Re: SUM FUNCTION

I think that depends on what 'group' variable is for Art, if its just to assign group that method is fine, if for some reason, ie tracking/auditing you need to identify the source data set that methodology will not work.

I didn't follow the original thread for this discussion so not sure.

Super Contributor
Posts: 1,040

Re: SUM FUNCTION

Thanks Reeza for the reply. If not for this question Is there a way to put "2 correct answers" and more than 2 "helpful answers"???

Thanks

PROC Star
Posts: 7,416

Re: SUM FUNCTION

Karun,

Only one answer can be marked correct and I'm not sure if you are allowed 2 or 3 helpful responses.

If needed, feel free to unmark my response as being helpful and assign it to someone else.  I won't feel the least bit offended.

Art

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 20 replies
  • 680 views
  • 8 likes
  • 7 in conversation