DATA Step, Macro, Functions and more

Grouping stats in no grouped data set

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Grouping stats in no grouped data set

[ Edited ]

Hello,

I  am struggling with the groups' statistics in not grouped data set, I would be grateful if anyone please tell me an efficient way to count stats like std, mean, max by a group and add it to the original dataset. I found this http://support.sas.com/resources/papers/proceedings11/019-2011.pdf, but they show how to count or sum not mentioned std, max, mean. The other problem is efficiency, I have to create over 500 variables like mentioned one, working with more than 20 large tables(over 10 mln obs in each) so sorting and merging is time-consuming(SQL package process everything in about 15h, nevertheless there is need to rewrite everything into 4GL, SQL Oracle can hadle this problem with overt (partition by)).  

In the table below, you will find desirable results.  

 

Thanks.

NameSexAgeWeightHeightmean_age_by_sexstd_weight_by_age
PhilipM167215013,40
JanetF1562,5112,513,221,81573
MaryF1566,511213,221,81573
RonaldM156713313,41,81573
WilliamM1566,511213,41,81573
CarolF1462,8102,513,222,425902
JudyF1464,39013,222,425902
AlfredM1469112,513,42,425902
HenryM1463,5102,513,42,425902
AliceF1356,58413,223,670907
BarbaraF1365,39813,223,670907
JeffreyM1362,58413,43,670907
JaneF1259,884,513,222,949305
LouiseF1256,37713,222,949305
JamesM1257,38313,42,949305
JohnM125999,513,42,949305
RobertM1264,812813,42,949305
JoyceF1151,350,513,223,1
ThomasM1157,58513,4

3,1


Accepted Solutions
Solution
a week ago
Super User
Posts: 10,500

Re: Grouping stats in no grouped data set

[ Edited ]

Any time I see a request to merge summary values into a data set I want to know how the resulting data set is to be used.

 

Next, with the same question, comes from "I have to create over 500 variables like mentioned one". Is this 500 in a single set or across the 20 (adding 25 variables to a single set makes much more sense than 500 to each of 20 sets).

 

How you use the result can make a difference on approachs. For some things I would tend to make a custom format and attach that to a variable for reporting purposes. For example, your "mean_age_by_sex" variable is going to take exactly 2 values in any given data set.

So an additional variable is really not needed as if the subject is male or female you already know the value.

 

It would not hurt to mention that you started with SASHELP.CLASS for example data so folks can work with the data.

 

with that said:

proc summary data=sashelp.class;
   class sex age;
   var height weight;
   output out=classsummary mean= max= min= std= median= range= /autoname;
run;

Will in a single pass through the data provide the requested statistics for 1) All records 2) by sex 3) by age and 4) by age and sex.

 

The autoname appends the statistic to the name of the variable. There will be a variable in the output set named _type_. It will have values of 0 to 3 in this case and indicates which combinations of group variables are used to make the statistic.

I would be strongly tempted to not merge the statistics onto the data until I am actually using them as then I can indicate which _type_ to use and match on the appropriate class variable(s) or none at all. But that's me.

View solution in original post


All Replies
Solution
a week ago
Super User
Posts: 10,500

Re: Grouping stats in no grouped data set

[ Edited ]

Any time I see a request to merge summary values into a data set I want to know how the resulting data set is to be used.

 

Next, with the same question, comes from "I have to create over 500 variables like mentioned one". Is this 500 in a single set or across the 20 (adding 25 variables to a single set makes much more sense than 500 to each of 20 sets).

 

How you use the result can make a difference on approachs. For some things I would tend to make a custom format and attach that to a variable for reporting purposes. For example, your "mean_age_by_sex" variable is going to take exactly 2 values in any given data set.

So an additional variable is really not needed as if the subject is male or female you already know the value.

 

It would not hurt to mention that you started with SASHELP.CLASS for example data so folks can work with the data.

 

with that said:

proc summary data=sashelp.class;
   class sex age;
   var height weight;
   output out=classsummary mean= max= min= std= median= range= /autoname;
run;

Will in a single pass through the data provide the requested statistics for 1) All records 2) by sex 3) by age and 4) by age and sex.

 

The autoname appends the statistic to the name of the variable. There will be a variable in the output set named _type_. It will have values of 0 to 3 in this case and indicates which combinations of group variables are used to make the statistic.

I would be strongly tempted to not merge the statistics onto the data until I am actually using them as then I can indicate which _type_ to use and match on the appropriate class variable(s) or none at all. But that's me.

Occasional Contributor
Posts: 19

Re: Grouping stats in no grouped data set

[ Edited ]

1. Resulting data set is going to be use as analytical base table customer 360 customer view*.
2. Current SQL package build about 20 ABT tables from diffrent point of view and merge them into one ABT 360 customer view*.

Super User
Posts: 10,500

Re: Grouping stats in no grouped data set

I have no idea what " base table customer 360 degree" means.

What specific types of analysis are contemplated? I am not seeing an advantage to all of the redundant information in the proposed structure especially with such large numbers of records.

 

And is "ABT table" supposed to be something special or is that local to your organization jargon?

 

Would it be possible to post a few rows of one of these final tables?

 

 

Occasional Contributor
Posts: 19

Re: Grouping stats in no grouped data set

[ Edited ]

No that`s no possible(personal details), unfortunately.

 

ABT is commonly used for predictive models, especially in the banking sector or telecommunication industry.

https://en.wikipedia.org/wiki/Analytical_base_table

 

Nevertheless, thank you for your help. I am going to use proc summary.

Super User
Posts: 10,500

Re: Grouping stats in no grouped data set

I am not a banking expert and only a basic modeller but this sounds like one approach, depending on how many individuals you have, would be

1) summarize from the entire data such as the Proc Summary shown.

2) select a sample from the main data, maybe a 20%, or enough to give a reasonable shot at developing a model

3) test the model with the remaining data.

 

If the time crunch is the mergin this may help.

Super User
Posts: 9,681

Re: Grouping stats in no grouped data set

write multiple SQL statement, everyone correspond to a statisical .

 

proc sql;
create table a as
 select *,mean(age) as mean_age_by_sex
  from sashelp.class
   group by sex;
create table b as
 select *,std(weight) as std_weight_by_age
  from a 
   group by age;
quit;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 119 views
  • 0 likes
  • 3 in conversation