Solved: Re: Domain function mean(var3) over (patrition by var1,var2)

Matt3 · Posted 08-07-2017 09:09 AM

Hi,

Could anyone please tell me if it is possible to use such domain functions in SAS as in sql

for instance:

'mean(var3) over (patrition by var1,var2)'

Thank you.

PGStats · Posted 08-07-2017 01:56 PM

Looks like you want something like:


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;

PG

View solution in original post

Kurt_Bremser · Posted 08-07-2017 09:14 AM

Do you mean something like

proc sql;
create table want as
select
  var1,
  var2,
  avg(var3)
from have
group by
  var1,
  var2
;
quit;

?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Matt3 · Posted 08-07-2017 09:24 AM

Not really, rather then grouping them I have to get statistics over multiple variables for eg.

case when data_2 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_2),2) 
else 0 end as sred_skzr_6m,
case when data_3 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_3),2) 
else 0 end as sred_skzr_9m,
case when data_4 = 'sth' then round(avg(num) over (partition by wyn.id_cst, wyn.data_4),2) 
else 0 end as sred_skzr_12m,

LinusH · Posted 08-07-2017 10:45 AM

Since most of us are SAS specialists, example syntax from other SW is of little help.
Please exemplify with sample data and desired output, paired with a more verbal description of what you want to accomplish.

Data never sleeps

Matt3 · Posted 08-07-2017 12:47 PM

ok, here you can see the example of the data:

Cst_Name	data_1	data_2	data_3	skrz
John	3m	6m	9m	100
John	NA	6m	9m	87
Alfred	NA	6m	NA	95
Alfred	NA	6m	NA	57
Alfred	NA	6m	NA	95
Alice	3m	6m	NA	46
Barbara	3m	NA	9m	78
Barbara	3m	NA	9m	45
William	NA	NA	9m	58
Frnak	3m	NA	9m	88
Frank	NA	NA	9m	58
Rose	NA	6m	9m	79
Rose	NA	6m	NA	128
Sarah	NA	6m	9m	78
Sarah	3m	NA	9m	89

and expected results:

Cst_Name	avg_data_1_cst	avg_data_2_cst	avg_data_3_cst	max_data_1_cst	std_data_2_cst
John	100	93,5	93,5	100	6,5
Alfred	0	82,33333333	0	0	17,91337179
Alice	46	46	0	46	0
Barbara	61,5	0	61,5	78	0
William	0	0	58	0	0
Frnak	88	0	73	88	0
Rose	0	103,5	79	0	24,5
Sarah	89	78	83,5	89	0

in Oracle SQL you can easily do this with over (partition by ) command.

What I would like to do is compute all statistics like (std, max, mean) over customer id and any other variable.

ballardw · Posted 08-07-2017 01:20 PM

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

I include those instructions as the data you show only has one variable that could be considered numeric by SAS unless your "3m" is supposed to be 3 million or such and is appearing as a FORMATTED value and the NA is appearing because you have a custom format to show missing as NA. Otherwise 3m would be character and mean or other statistical function would yeild a missing value.

Or possibly you want Proc summary (or means) as it will create statistics for groups of class variables and autoname them if that option is used. Here's a small example using a SAS supplied dataset you should have access to:

proc summary data=sashelp.class;
   class age sex;
   var height weight;
   output out=classsummary  max= min= mean= std= /autoname;
run;

The output has summaries at all of the combinations of the grouping variables (the ones you want to partition over maybe?) The Var statement lists the numeric variables to summarize, output says to create an output data set and provide the name of the set (out=), the requested statistics (max= min= etc) and the /autoname tells SAS to create names by appending the statistic abbreviation to the variable name.

You would have to go into a lot detail to explain wher any of the statistics for your data_1, data_2 and data_3 come from given that example data and "results".

Matt3 · Posted 08-07-2017 01:54 PM

Hi,

data_1, data_2 etc. are not num variables but characters 3m means 3 months, the thing is when data_1 is NA like in example:

Frnak	3m	NA	9m	88
Frank	NA	NA	9m	58

avg_data_1 is 88,

when both data_2 are NA the avg_data_2=0

when data_3 avg_data_3 = 73.

PGStats · Posted 08-07-2017 01:56 PM

Looks like you want something like:


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;

PG

Matt3 · Posted 08-07-2017 02:11 PM

Many thanks, that is exactly what I wanted to get.

Matt3 · Posted 08-08-2017 02:11 PM

I post it here because I am struggling with the similar problem. Could you please tell me an efficient way to count stats like std, mean, max by a group and add it to the original dataset. I found this http://support.sas.com/resources/papers/proceedings11/019-2011.pdf but their only show how to count or sum, the other problem is efficiency, I have to create over 500 variables like mentioned one, working with more than 20 large tables(over 10 mln obs in each) so sorting and merging is time-consuming(SQL package process everything in about 15h, but there is need to rewrite everything into 4GL).

In the table below, you will find desirable results.

Name	Sex	Age	Weight	Height	mean_age_by_sex	std_weight_by_age
Philip	M	16	72	150	13,4	0
Janet	F	15	62,5	112,5	13,22	1,81573
Mary	F	15	66,5	112	13,22	1,81573
Ronald	M	15	67	133	13,4	1,81573
William	M	15	66,5	112	13,4	1,81573
Carol	F	14	62,8	102,5	13,22	2,425902
Judy	F	14	64,3	90	13,22	2,425902
Alfred	M	14	69	112,5	13,4	2,425902
Henry	M	14	63,5	102,5	13,4	2,425902
Alice	F	13	56,5	84	13,22	3,670907
Barbara	F	13	65,3	98	13,22	3,670907
Jeffrey	M	13	62,5	84	13,4	3,670907
Jane	F	12	59,8	84,5	13,22	2,949305
Louise	F	12	56,3	77	13,22	2,949305
James	M	12	57,3	83	13,4	2,949305
John	M	12	59	99,5	13,4	2,949305
Robert	M	12	64,8	128	13,4	2,949305
Joyce	F	11	51,3	50,5	13,22	3,1
Thomas	M	11	57,5	85	13,4	3,1

Classroom Training Available!