BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Matt3
Quartz | Level 8

Hi,

Could anyone please tell me if it is possible to use such domain functions in SAS as in sql

for instance:

'mean(var3) over (patrition by var1,var2)'

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Looks like you want something like:

 


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;


PG

View solution in original post

9 REPLIES 9
Matt3
Quartz | Level 8

Not really, rather then grouping them I have to get statistics over multiple variables for eg.

 

case when data_2 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_2),2) 
else 0 end as sred_skzr_6m,
case when data_3 <> 'non' then round(avg(num) over (partition by wyn.id_cst, wyn.data_3),2) 
else 0 end as sred_skzr_9m,
case when data_4 = 'sth' then round(avg(num) over (partition by wyn.id_cst, wyn.data_4),2) 
else 0 end as sred_skzr_12m,

 

LinusH
Tourmaline | Level 20
Since most of us are SAS specialists, example syntax from other SW is of little help.
Please exemplify with sample data and desired output, paired with a more verbal description of what you want to accomplish.
Data never sleeps
Matt3
Quartz | Level 8

ok, here you can see the example of the data:

Cst_Namedata_1data_2data_3skrz
John3m6m9m100
JohnNA6m9m87
AlfredNA6mNA95
AlfredNA6mNA57
AlfredNA6mNA95
Alice3m6mNA46
Barbara3mNA9m78
Barbara3mNA9m45
WilliamNANA9m58
Frnak3mNA9m88
FrankNANA9m58
RoseNA6m9m79
RoseNA6mNA128
SarahNA6m9m78
Sarah3mNA9m89

 

and expected results:

Cst_Nameavg_data_1_cstavg_data_2_cstavg_data_3_cstmax_data_1_cststd_data_2_cst
John10093,593,51006,5
Alfred082,333333330017,91337179
Alice46460460
Barbara61,5061,5780
William005800
Frnak88073880
Rose0103,579024,5
Sarah897883,5890

 

 

in Oracle SQL you can easily do this with over (partition by ) command.

What I would like to do is compute all statistics like (std, max, mean) over customer id and any other variable.

ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

I include those instructions as the data you show only has one variable that could be considered numeric by SAS unless your "3m" is supposed to be 3 million or such and is appearing as a FORMATTED value and the NA is appearing because you have a custom format to show missing as NA. Otherwise 3m would be character and mean or other statistical function would yeild a missing value.

 

Or possibly you want Proc summary (or means) as it will create statistics for groups of class variables and autoname them if that option is used. Here's a small example using a SAS supplied dataset you should have access to:

proc summary data=sashelp.class;
   class age sex;
   var height weight;
   output out=classsummary  max= min= mean= std= /autoname;
run;

The output has summaries at all of the combinations of the grouping variables (the ones you want to partition over maybe?) The Var statement lists the numeric variables to summarize, output says to create an output data set and provide the name of the set (out=), the requested statistics (max= min= etc) and the /autoname tells SAS to create names by appending the statistic abbreviation to the variable name.

 

 

You would have to go into a lot detail to explain wher any of the statistics for your data_1, data_2 and data_3 come from given that example data and "results".

Matt3
Quartz | Level 8

Hi,

data_1, data_2 etc. are not num variables but characters 3m means 3 months, the thing is when data_1 is NA like in example:

Frnak3mNA9m88
FrankNANA9m58

avg_data_1 is 88,

when both data_2 are NA the avg_data_2=0

when data_3 avg_data_3 = 73.

 

 

PGStats
Opal | Level 21

Looks like you want something like:

 


proc sql;
select
    Cst_name,
    coalesce(mean(case when data_1 = "NA" then . else skrz end), 0) as avg_data_1_cst,
    coalesce(mean(case when data_2 = "NA" then . else skrz end), 0) as avg_data_2_cst,
    coalesce(mean(case when data_3 = "NA" then . else skrz end), 0) as avg_data_3_cst,
    coalesce(max(case when data_1 = "NA" then . else skrz end), 0) as max_data_1_cst
from test
group by Cst_name;
quit;


PG
Matt3
Quartz | Level 8
Many thanks, that is exactly what I wanted to get.
Matt3
Quartz | Level 8

I post it here because I  am struggling with the similar problem. Could you please tell me an efficient  way to count stats like std, mean, max by a group and add it to the original dataset. I found this http://support.sas.com/resources/papers/proceedings11/019-2011.pdf but their only show how to count or sum, the other problem is efficiency, I have to create over 500 variables like mentioned one, working with more than 20 large tables(over 10 mln obs in each) so sorting and merging is time-consuming(SQL package process everything in about 15h, but there is need to rewrite everything into 4GL).  

In the table below, you will find desirable results.  

NameSexAgeWeightHeightmean_age_by_sexstd_weight_by_age
PhilipM167215013,40
JanetF1562,5112,513,221,81573
MaryF1566,511213,221,81573
RonaldM156713313,41,81573
WilliamM1566,511213,41,81573
CarolF1462,8102,513,222,425902
JudyF1464,39013,222,425902
AlfredM1469112,513,42,425902
HenryM1463,5102,513,42,425902
AliceF1356,58413,223,670907
BarbaraF1365,39813,223,670907
JeffreyM1362,58413,43,670907
JaneF1259,884,513,222,949305
LouiseF1256,37713,222,949305
JamesM1257,38313,42,949305
JohnM125999,513,42,949305
RobertM1264,812813,42,949305
JoyceF1151,350,513,223,1
ThomasM1157,58513,43,1

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1171 views
  • 1 like
  • 5 in conversation