BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Timmy2383
Lapis Lazuli | Level 10

I'm mainly a SAS admin and developer. I have some code provided by a statistician that runs in an Oracle database.  I'm trying to figure out how to convert to SAS syntax (with PROC SQL).  My problem is that I don't understand the PERCENTILE_DISC function enough to convert to relevant SAS syntax. Anyone able to help?

 

 select PARTY_NUMBER, 
        Min (CREDIT_CT) as CREDIT_CT_Min,
        PERCENTILE_DISC(0.01) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P1, 
        PERCENTILE_DISC(0.05) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P5,
        PERCENTILE_DISC(0.10) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P10,
        PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P25,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_Median,
        PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P75,
        PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P95,
        PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY CREDIT_CT ASC) as CREDIT_CT_P99,     
        Max (CREDIT_CT) as CREDIT_CT_Max,
        Min (DEBIT_CT) as DEBIT_CT_Min,
        PERCENTILE_DISC(0.01) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P1, 
        PERCENTILE_DISC(0.05) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P5,
        PERCENTILE_DISC(0.10) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P10,
        PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P25,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_Median,
        PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P75,
        PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P95,
        PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY DEBIT_CT ASC) as DEBIT_CT_P99,     
        Max (DEBIT_CT) as DEBIT_CT_MAX,
        Min (PCT_OUT) as PCT_OUT_Min,
        PERCENTILE_DISC(0.01) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P1, 
        PERCENTILE_DISC(0.05) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P5,
        PERCENTILE_DISC(0.10) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P10,
        PERCENTILE_DISC(0.25) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P25,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_Median,
        PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P75,
        PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P95,
        PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY PCT_OUT ASC) as PCT_OUT_P99,     
        Max (PCT_OUT) as PCT_OUT_Max        
from my_table
where tran_month <= add_months(to_date('20170901', 'YYYYMMDD'), -1)
and tran_month >= add_months(to_date('20170901', 'YYYYMMDD'), -12)
group by PARTY_NUMBER
order by PARTY_NUMBER;
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Just a quick note, I looked at the percentile_disc function in Oracle and it's different than most standard calculations IMO. 

 

Look at this example: 

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions111.htm

 

The median for dept_id = 30 is 2800, but really should be 2850 in most calculations. So you need to decide if that's ok with you, or if you need to match Oracles calculations. 

 

Slight modification to @ballardw code, NOPRINT to prevent output to the results window and add in the WHERE statement for the filter. Again, you'll have to test this matches the Oracle functions. 

 

proc means data=work.my_table nway noprint;
where date between intnx('month', '01Jan2017'd, -1, 's') and intnx('month', '01Jan2017'd, -12, 's');

   class party_number ;

   var CREDIT_CT DEBIT_CT PCT_OUT;

   output out=work.output min=  max= p1= p5= p10= p25=
             p50= p75= p95= p99=    /autoname
               
   ;
run;

 

View solution in original post

6 REPLIES 6
Reeza
Super User
Use PROC UNIVARIATE instead of SAS SQL, it'll be more efficient. This is just calculating summary statistics by different grouping variables and percentiles from 1 to 99th percentile which PROC UNIVARIATE or MEANS will handle.
Timmy2383
Lapis Lazuli | Level 10

Thanks, @Reeza !

 

I'm definitely not trained in statistics or anything, so using SAS/STAT procedures is new to me.  I looked up the documentation and came up with this as a test:

 

proc univariate data = work.my_table;
by party_number;
var CREDIT_CT DEBIT_CT PCT_OUT;
output out = work.output_table;
run;

Am I on the right track? Can include the MIN/MAX statements from the SQL as well?

ballardw
Super User

@Timmy2383 wrote:

Thanks, @Reeza !

 

I'm definitely not trained in statistics or anything, so using SAS/STAT procedures is new to me.  I looked up the documentation and came up with this as a test:

 

proc univariate data = work.my_table;
by party_number;
var CREDIT_CT DEBIT_CT PCT_OUT;
output out = work.output_table;
run;

Am I on the right track? Can include the MIN/MAX statements from the SQL as well?


I might suggest :

 

proc summary data=work.my_table nway;
   class party_number;
   var CREDIT_CT DEBIT_CT PCT_OUT;
   output out=work.output min=  max= p1= p5= p10= p25=
             p50= p75= p95= p99=    /autoname
               
   ;
run;

The CLASS statement works similar to BY with a big difference for programming: it does not require data to be sorted as BY group processing does. If the main purpose is to create an output data set then SUMMARY does that without all of the tables that Univariate or Means would (though you can suppress them). The / autoname option can save a lot of code naming variables as this appends the statistic to the VAR variables names.

Another advantage with Summary/means and class statements is that without the option NWAY then you get additional summaries for combinations of the class variables.

 

P50 can be called median if your prefer and P25/P75 can be Q1 and Q3.

 

ChrisNZ
Tourmaline | Level 20

The CLASS statement works similar to BY with a big difference for programming: it does not require data to be sorted

 

@ballardw Another difference is that CLASS is usually (counter-intuitively) faster than BY, probably to do with the (suboptimal?) supplementary order-checking logic that SAS uses when a BY statement is present.

Reeza
Super User

Just a quick note, I looked at the percentile_disc function in Oracle and it's different than most standard calculations IMO. 

 

Look at this example: 

https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions111.htm

 

The median for dept_id = 30 is 2800, but really should be 2850 in most calculations. So you need to decide if that's ok with you, or if you need to match Oracles calculations. 

 

Slight modification to @ballardw code, NOPRINT to prevent output to the results window and add in the WHERE statement for the filter. Again, you'll have to test this matches the Oracle functions. 

 

proc means data=work.my_table nway noprint;
where date between intnx('month', '01Jan2017'd, -1, 's') and intnx('month', '01Jan2017'd, -12, 's');

   class party_number ;

   var CREDIT_CT DEBIT_CT PCT_OUT;

   output out=work.output min=  max= p1= p5= p10= p25=
             p50= p75= p95= p99=    /autoname
               
   ;
run;

 

Timmy2383
Lapis Lazuli | Level 10

Thank you all!  

 

This gives me a lot to go on.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2673 views
  • 0 likes
  • 4 in conversation