DATA Step, Macro, Functions and more

Domain function mean(var3) over (patrition by var1,var2)

Accepted Solution Solved
Reply
Contributor
Posts: 53
Accepted Solution

Domain function mean(var3) over (patrition by var1,var2)

Hi,

Could anyone please tell me if it is possible to use such domain functions in SAS as in sql

for instance:

'mean(var3) over (patrition by var1,var2)'

Thank you.


Accepted Solutions
Solution
‎08-07-2017 02:07 PM
Respected Advisor
Posts: 4,930

Re: Domain function mean(var3) over (patrition by var1,var2)

Looks like you want something like:

 


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;


PG

View solution in original post


All Replies
Super User
Posts: 7,841

Re: Domain function mean(var3) over (patrition by var1,var2)

Do you mean something like

proc sql;
create table want as
select
  var1,
  var2,
  avg(var3)
from have
group by
  var1,
  var2
;
quit;

?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 53

Re: Domain function mean(var3) over (patrition by var1,var2)

Posted in reply to KurtBremser

Not really, rather then grouping them I have to get statistics over multiple variables for eg.

 

case when data_2 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_2),2) 
else 0 end as sred_skzr_6m,
case when data_3 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_3),2) 
else 0 end as sred_skzr_9m,
case when data_4 = 'sth' then round(avg(num) over (partition by wyn.id_cst, wyn.data_4),2) 
else 0 end as sred_skzr_12m,

 

Super User
Posts: 5,435

Re: Domain function mean(var3) over (patrition by var1,var2)

Since most of us are SAS specialists, example syntax from other SW is of little help.
Please exemplify with sample data and desired output, paired with a more verbal description of what you want to accomplish.
Data never sleeps
Contributor
Posts: 53

Re: Domain function mean(var3) over (patrition by var1,var2)

ok, here you can see the example of the data:

Cst_Namedata_1data_2data_3skrz
John3m6m9m100
JohnNA6m9m87
AlfredNA6mNA95
AlfredNA6mNA57
AlfredNA6mNA95
Alice3m6mNA46
Barbara3mNA9m78
Barbara3mNA9m45
WilliamNANA9m58
Frnak3mNA9m88
FrankNANA9m58
RoseNA6m9m79
RoseNA6mNA128
SarahNA6m9m78
Sarah3mNA9m89

 

and expected results:

Cst_Nameavg_data_1_cstavg_data_2_cstavg_data_3_cstmax_data_1_cststd_data_2_cst
John10093,593,51006,5
Alfred082,333333330017,91337179
Alice46460460
Barbara61,5061,5780
William005800
Frnak88073880
Rose0103,579024,5
Sarah897883,5890

 

 

in Oracle SQL you can easily do this with over (partition by ) command.

What I would like to do is compute all statistics like (std, max, mean) over customer id and any other variable.

Super User
Posts: 11,343

Re: Domain function mean(var3) over (patrition by var1,var2)

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I include those instructions as the data you show only has one variable that could be considered numeric by SAS unless your "3m" is supposed to be 3 million or such and is appearing as a FORMATTED value and the NA is appearing because you have a custom format to show missing as NA. Otherwise 3m would be character and mean or other statistical function would yeild a missing value.

 

Or possibly you want Proc summary (or means) as it will create statistics for groups of class variables and autoname them if that option is used. Here's a small example using a SAS supplied dataset you should have access to:

proc summary data=sashelp.class;
   class age sex;
   var height weight;
   output out=classsummary  max= min= mean= std= /autoname;
run;

The output has summaries at all of the combinations of the grouping variables (the ones you want to partition over maybe?) The Var statement lists the numeric variables to summarize, output says to create an output data set and provide the name of the set (out=), the requested statistics (max= min= etc) and the /autoname tells SAS to create names by appending the statistic abbreviation to the variable name.

 

 

You would have to go into a lot detail to explain wher any of the statistics for your data_1, data_2 and data_3 come from given that example data and "results".

Contributor
Posts: 53

Re: Domain function mean(var3) over (patrition by var1,var2)

Hi,

data_1, data_2 etc. are not num variables but characters 3m means 3 months, the thing is when data_1 is NA like in example:

Frnak3mNA9m88
FrankNANA9m58

avg_data_1 is 88,

when both data_2 are NA the avg_data_2=0

when data_3 avg_data_3 = 73.

 

 

Solution
‎08-07-2017 02:07 PM
Respected Advisor
Posts: 4,930

Re: Domain function mean(var3) over (patrition by var1,var2)

Looks like you want something like:

 


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;


PG
Contributor
Posts: 53

Re: Domain function mean(var3) over (patrition by var1,var2)

Many thanks, that is exactly what I wanted to get.
Contributor
Posts: 53

Re: Domain function mean(var3) over (patrition by var1,var2)

I post it here because I  am struggling with the similar problem. Could you please tell me an efficient  way to count stats like std, mean, max by a group and add it to the original dataset. I found this http://support.sas.com/resources/papers/proceedings11/019-2011.pdf but their only show how to count or sum, the other problem is efficiency, I have to create over 500 variables like mentioned one, working with more than 20 large tables(over 10 mln obs in each) so sorting and merging is time-consuming(SQL package process everything in about 15h, but there is need to rewrite everything into 4GL).  

In the table below, you will find desirable results.  

NameSexAgeWeightHeightmean_age_by_sexstd_weight_by_age
PhilipM167215013,40
JanetF1562,5112,513,221,81573
MaryF1566,511213,221,81573
RonaldM156713313,41,81573
WilliamM1566,511213,41,81573
CarolF1462,8102,513,222,425902
JudyF1464,39013,222,425902
AlfredM1469112,513,42,425902
HenryM1463,5102,513,42,425902
AliceF1356,58413,223,670907
BarbaraF1365,39813,223,670907
JeffreyM1362,58413,43,670907
JaneF1259,884,513,222,949305
LouiseF1256,37713,222,949305
JamesM1257,38313,42,949305
JohnM125999,513,42,949305
RobertM1264,812813,42,949305
JoyceF1151,350,513,223,1
ThomasM1157,58513,43,1
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 168 views
  • 1 like
  • 5 in conversation