BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hi Team,

I read about this function in Pharma book to group datasets and then break them in the report.

I was wondering the necissity of a SUM

when we say

in1*1

in2*2

in3*3

That serves as putting 1 2 and 3 to all the observation sunder a particular variable.

Why do WE HAVE TO USE A SUM FUNCTION HERE??

THANKS

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

    percent1=put(percent,6.2) ||  '%';

     group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

View solution in original post

20 REPLIES 20
art297
Opal | Level 21

Since the code is setting all 13 files, group will equal a number between 1 and 13, indicating which file the record came from.

robertrao
Quartz | Level 8

Arthur,

Thanks for the reply. I was wondering the use of a SUM function when we are not doing any addition...u know what i mean?

i thought sum function is to add up two values.

Here sum function is not adding up any values.....

Thanks

ballardw
Super User

Actually SUM has much more functionallity than just adding. If one or more of the variables or expressions as arguments to SUM are missing it will return the total of the non-missing values. SUM will only return a missing value if all of the arguments are missing. A simple addition statement such as x= y + z; returns missing if either y or z are missing.

I have used a similar code construct where some of the values might be missing but I still wanted the indicator value when present.

robertrao
Quartz | Level 8

Hi Ballard,

Thanks for all the time for the detailed explanation. Maybe I am not able to ask the question properly.

I understand that in the code below all observation coming from age_1 dataset will all have a group=1

Likewise all observations from gendr_2 dataset will have a group=2

I thought it will be good enough to just write

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

instead of

group=SUM(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1(IN=in1) gender_2(IN=in2) BMI_2(IN=in3) Cap_2(IN=in4) Sr_2(IN=in5)

AS_2(IN=in6) Cm_2(IN=in7) Cmb_2(IN=in8) CmbC_2(IN=in9) Cnc_2(IN=in10)

TN_2(IN=in11) PD_2(IN=in12)Cp_2(IN=in13);

    percent1=put(percent,6.2) ||  '%';

     group=sum(in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13);

run;

PGStats
Opal | Level 21

Hi,

Since all in-variables are zero except one, the SUM function performs exactly as a simple sum 1*in1+2*in2+3*in3..., thus it gives you a number from 1 to 13 identifying the dataset from which each observation in dataset FINAL comes from. You could achieve the same goal with simply

DATA FINAL(drop=percent rename=(percent1=Percent));

SET age_1 gender_2 BMI_2 Cap_2 Sr_2

AS_2 Cm_2 Cmb_2 CmbC_2 Cnc_2

TN_2 PD_2 Cp_2 / INDSNAME=group;

percent1=put(percent,6.2) ||'%';

run;

here GROUP would take the values "WORK.AGE_1", "WORK.GENDER_2", ... etc.

PG

PG
robertrao
Quartz | Level 8

HI PG,

All in-variables are zero except one????

Could you please explain a lil bit

Thanks

art297
Opal | Level 21

Karun,

Your 13 variables are set with in= statements thus, by definition, are set to be either 1 or 0.  i.e., The in= variable will be set to 1 for the file being read, and 0 for the other 12 variables.  Since you are then multiplying those values by a number between 1 and 13, all of the ones with 0 will end up with 0, while the other will have a number between 1 and 13.

robertrao
Quartz | Level 8

Hi Arthur Excellent job as usual. I understand that part very clearly now. Even without the SUM function the above explanation is still valid right?

Correct me if i am wrong??!!!

If so then why do we need a SUM function!!

Also Arthur as PG said earlier insted of Aliasing each and every datasets with in=option  its even simpler to use the option  INDSNAME=group once and acheive the same task????

Thanks

Reeza
Super User

Because listing them like you have without any function isn't valid code, SAS wouldn't know what to return. You could also use max as well instead of sum to get the same answer, it's a way of telling SAS what you want.

This isn't valid SAS code, doesn't return a number.

group=in1*1, in2*2, in3*3, in4*4, in5*5, in6*6 ,in7*7, in8*8, in9*9, in10*10, in11*11, in12*12,in13*13;

Why do this instead of INDSNAME? Because INDSNAME is a relatively new option, 9.2+ I believe and people have been needing it a lot longer. In fact the IN method only works in your case because your datasets are numbered. If they were named differently without numbers this method wouldn't work at all.

robertrao
Quartz | Level 8


Thanks Reeza,

I learnt many things apart from my question. That is what I realizesd after following on the forum. Thanks so much to each and everyone of you.

Reeza,  I think it works even if the datasets arent numbered as well. (assuming that u meant age_1 gender_2) etc..

In SAS programming for the pharamaceutical Industry by Jack Shostak

Chapter4:Creating Tables and listing he has used it on ordinary datasets

Thanks

art297
Opal | Level 21

Karun,

I agree with you, over Reeza, in this case.  You are simply arbitrarily assigning the numbers 1 thru 13.  The actual file names are irrelevant, other than they happen to co-incide with the file names in your example.

Reeza
Super User

I think that depends on what 'group' variable is for Art, if its just to assign group that method is fine, if for some reason, ie tracking/auditing you need to identify the source data set that methodology will not work.

I didn't follow the original thread for this discussion so not sure.

robertrao
Quartz | Level 8

Thanks Reeza for the reply. If not for this question Is there a way to put "2 correct answers" and more than 2 "helpful answers"???

Thanks

art297
Opal | Level 21

Karun,

Only one answer can be marked correct and I'm not sure if you are allowed 2 or 3 helpful responses.

If needed, feel free to unmark my response as being helpful and assign it to someone else.  I won't feel the least bit offended.

Art

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 2592 views
  • 8 likes
  • 7 in conversation