turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- SUM FUNCTION

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 04:04 PM

Hi Team,

I read about this function in Pharma book to group datasets and then break them in the report.

I was wondering the necissity of a SUM

when we say

in1*1

in2*2

in3*3

That serves as putting 1 2 and 3 to all the observation sunder a particular variable.

Why do WE HAVE TO USE A SUM FUNCTION HERE??

THANKS

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

percent1=put(percent,6.2) || '%';

group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;

Accepted Solutions

Solution

10-02-2012
06:02 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 06:02 PM

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 04:21 PM

Since the code is setting all 13 files, group will equal a number between 1 and 13, indicating which file the record came from.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 04:28 PM

Arthur,

Thanks for the reply. I was wondering the use of a SUM function when we are not doing any addition...u know what i mean?

i thought sum function is to add up two values.

Here sum function is not adding up any values.....

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 05:31 PM

Actually SUM has much more functionallity than just adding. If one or more of the variables or expressions as arguments to SUM are missing it will return the total of the non-missing values. SUM will only return a missing value if all of the arguments are missing. A simple addition statement such as x= y + z; returns missing if either y or z are missing.

I have used a similar code construct where some of the values might be missing but I still wanted the indicator value when present.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 05:40 PM

Hi Ballard,

Thanks for all the time for the detailed explanation. Maybe I am not able to ask the question properly.

I understand that in the code below all observation coming from age_1 dataset will all have a group=1

Likewise all observations from gendr_2 dataset will have a group=2

I thought it will be good enough to just write

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

**instead of**

group=**SUM**(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

percent1=put(percent,6.2) || '%';

group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 04:48 PM

Hi,

Since all in-variables are zero except one, the SUM function performs exactly as a simple sum 1*in1+2*in2+3*in3..., thus it gives you a number from 1 to 13 identifying the dataset from which each observation in dataset FINAL comes from. You could achieve the same goal with simply

**DATA FINAL(drop=percent rename=(percent1=Percent));**

**SET age_1 gender_2 BMI_2 Cap_2 Sr_2 **

**AS_2 Cm_2 Cmb_2 CmbC_2 Cnc_2 **

**TN_2 PD_2 Cp_2 / INDSNAME=group;**

**percent1=put(percent,6.2) ||'%';**

**run;**

here GROUP would take the values "WORK.AGE_1", "WORK.GENDER_2", ... etc.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 05:25 PM

HI PG,

All in-variables are zero except one????

Could you please explain a lil bit

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 05:43 PM

Karun,

Your 13 variables are set with in= statements thus, by definition, are set to be either 1 or 0. i.e., The in= variable will be set to 1 for the file being read, and 0 for the other 12 variables. Since you are then multiplying those values by a number between 1 and 13, all of the ones with 0 will end up with 0, while the other will have a number between 1 and 13.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 05:51 PM

Hi Arthur Excellent job as usual. I understand that part very clearly now. Even without the **SUM** function the above explanation is still valid right?

Correct me if i am wrong??!!!

If so then why do we need a SUM function!!

Also Arthur as PG said earlier insted of Aliasing each and every datasets with in=option its even simpler to use the option INDSNAME=group once and acheive the same task????

Thanks

Solution

10-02-2012
06:02 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 06:02 PM

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 06:13 PM

Thanks Reeza,

I learnt many things apart from my question. That is what I realizesd after following on the forum. Thanks so much to each and everyone of you.

Reeza,* I think it works even if the datasets arent numbered as well. (assuming that u meant age_1 gender_2) etc..*

*In SAS programming for the pharamaceutical Industry by Jack Shostak *

*Chapter4:Creating Tables and listing he has used it on ordinary datasets *

*Thanks*

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-02-2012 06:35 PM

Karun,

I agree with you, over Reeza, in this case. You are simply arbitrarily assigning the numbers 1 thru 13. The actual file names are irrelevant, other than they happen to co-incide with the file names in your example.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-03-2012 10:29 AM

I think that depends on what 'group' variable is for Art, if its just to assign group that method is fine, if for some reason, ie tracking/auditing you need to identify the source data set that methodology will not work.

I didn't follow the original thread for this discussion so not sure.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-03-2012 10:47 AM

Thanks Reeza for the reply. If not for this question Is there a way to put "2 correct answers" and more than 2 "helpful answers"???

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-03-2012 11:00 AM

Karun,

Only one answer can be marked correct and I'm not sure if you are allowed 2 or 3 helpful responses.

If needed, feel free to unmark my response as being helpful and assign it to someone else. I won't feel the least bit offended.

Art