Hi,
Could anyone please tell me if it is possible to use such domain functions in SAS as in sql
for instance:
'mean(var3) over (patrition by var1,var2)'
Thank you.
Looks like you want something like:
proc sql;
select
Cst_name,
coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;
Do you mean something like
proc sql;
create table want as
select
var1,
var2,
avg(var3)
from have
group by
var1,
var2
;
quit;
?
Not really, rather then grouping them I have to get statistics over multiple variables for eg.
case when data_2 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_2),2) else 0 end as sred_skzr_6m, case when data_3 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_3),2) else 0 end as sred_skzr_9m, case when data_4 = 'sth' then round(avg(num) over (partition by wyn.id_cst, wyn.data_4),2) else 0 end as sred_skzr_12m,
ok, here you can see the example of the data:
Cst_Name | data_1 | data_2 | data_3 | skrz |
John | 3m | 6m | 9m | 100 |
John | NA | 6m | 9m | 87 |
Alfred | NA | 6m | NA | 95 |
Alfred | NA | 6m | NA | 57 |
Alfred | NA | 6m | NA | 95 |
Alice | 3m | 6m | NA | 46 |
Barbara | 3m | NA | 9m | 78 |
Barbara | 3m | NA | 9m | 45 |
William | NA | NA | 9m | 58 |
Frnak | 3m | NA | 9m | 88 |
Frank | NA | NA | 9m | 58 |
Rose | NA | 6m | 9m | 79 |
Rose | NA | 6m | NA | 128 |
Sarah | NA | 6m | 9m | 78 |
Sarah | 3m | NA | 9m | 89 |
and expected results:
Cst_Name | avg_data_1_cst | avg_data_2_cst | avg_data_3_cst | max_data_1_cst | std_data_2_cst |
John | 100 | 93,5 | 93,5 | 100 | 6,5 |
Alfred | 0 | 82,33333333 | 0 | 0 | 17,91337179 |
Alice | 46 | 46 | 0 | 46 | 0 |
Barbara | 61,5 | 0 | 61,5 | 78 | 0 |
William | 0 | 0 | 58 | 0 | 0 |
Frnak | 88 | 0 | 73 | 88 | 0 |
Rose | 0 | 103,5 | 79 | 0 | 24,5 |
Sarah | 89 | 78 | 83,5 | 89 | 0 |
in Oracle SQL you can easily do this with over (partition by ) command.
What I would like to do is compute all statistics like (std, max, mean) over customer id and any other variable.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
I include those instructions as the data you show only has one variable that could be considered numeric by SAS unless your "3m" is supposed to be 3 million or such and is appearing as a FORMATTED value and the NA is appearing because you have a custom format to show missing as NA. Otherwise 3m would be character and mean or other statistical function would yeild a missing value.
Or possibly you want Proc summary (or means) as it will create statistics for groups of class variables and autoname them if that option is used. Here's a small example using a SAS supplied dataset you should have access to:
proc summary data=sashelp.class; class age sex; var height weight; output out=classsummary max= min= mean= std= /autoname; run;
The output has summaries at all of the combinations of the grouping variables (the ones you want to partition over maybe?) The Var statement lists the numeric variables to summarize, output says to create an output data set and provide the name of the set (out=), the requested statistics (max= min= etc) and the /autoname tells SAS to create names by appending the statistic abbreviation to the variable name.
You would have to go into a lot detail to explain wher any of the statistics for your data_1, data_2 and data_3 come from given that example data and "results".
Hi,
data_1, data_2 etc. are not num variables but characters 3m means 3 months, the thing is when data_1 is NA like in example:
Frnak | 3m | NA | 9m | 88 |
Frank | NA | NA | 9m | 58 |
avg_data_1 is 88,
when both data_2 are NA the avg_data_2=0
when data_3 avg_data_3 = 73.
Looks like you want something like:
proc sql;
select
Cst_name,
coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;
I post it here because I am struggling with the similar problem. Could you please tell me an efficient way to count stats like std, mean, max by a group and add it to the original dataset. I found this http://support.sas.com/resources/papers/proceedings11/019-2011.pdf but their only show how to count or sum, the other problem is efficiency, I have to create over 500 variables like mentioned one, working with more than 20 large tables(over 10 mln obs in each) so sorting and merging is time-consuming(SQL package process everything in about 15h, but there is need to rewrite everything into 4GL).
In the table below, you will find desirable results.
Name | Sex | Age | Weight | Height | mean_age_by_sex | std_weight_by_age |
Philip | M | 16 | 72 | 150 | 13,4 | 0 |
Janet | F | 15 | 62,5 | 112,5 | 13,22 | 1,81573 |
Mary | F | 15 | 66,5 | 112 | 13,22 | 1,81573 |
Ronald | M | 15 | 67 | 133 | 13,4 | 1,81573 |
William | M | 15 | 66,5 | 112 | 13,4 | 1,81573 |
Carol | F | 14 | 62,8 | 102,5 | 13,22 | 2,425902 |
Judy | F | 14 | 64,3 | 90 | 13,22 | 2,425902 |
Alfred | M | 14 | 69 | 112,5 | 13,4 | 2,425902 |
Henry | M | 14 | 63,5 | 102,5 | 13,4 | 2,425902 |
Alice | F | 13 | 56,5 | 84 | 13,22 | 3,670907 |
Barbara | F | 13 | 65,3 | 98 | 13,22 | 3,670907 |
Jeffrey | M | 13 | 62,5 | 84 | 13,4 | 3,670907 |
Jane | F | 12 | 59,8 | 84,5 | 13,22 | 2,949305 |
Louise | F | 12 | 56,3 | 77 | 13,22 | 2,949305 |
James | M | 12 | 57,3 | 83 | 13,4 | 2,949305 |
John | M | 12 | 59 | 99,5 | 13,4 | 2,949305 |
Robert | M | 12 | 64,8 | 128 | 13,4 | 2,949305 |
Joyce | F | 11 | 51,3 | 50,5 | 13,22 | 3,1 |
Thomas | M | 11 | 57,5 | 85 | 13,4 | 3,1 |
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.