
Proc sql

Contributor Boa
Posts: 21

I'd like to find Q3 (the third quartile) using PROC SQL.

Specifically, I have to find all customers in the 3rd quartile of the amount they spent, and display the customer name, total spent and customer ID.

So basically I need to INNER JOIN two data sets, a spending data set and a customer data set. There are 100,000 records, so I'm only listing a few as an example. I just want to show the customers in the 3rd quartile.

 

The spending data set has the variables Spending_id, Customer_id, totalprice and numunits:

Spending_id   Customer_id   totalprice   numunits
1212112       100000        19           2
989898        112121        298          10
3i31030       20000         2            22

The customer data set has the variables Customer_id and firstname:

Customer_id   firstname
12311111      Ellen
9908009       John
3376247       Jay

 

 

Thank you,

 

 


Accepted Solution
11-27-2016 02:21 AM
Trusted Advisor
Posts: 1,022

Re: Finding Q3 using proc sql

PROC SQL is not the way to go for this.  If you absolutely insist on PROC sql then you could do it with this broad outline, which can be done with 3 create table statements in a single PROC SQL:

 

   (1) Create a table of total spending by customer, ordered by total spending.

 

   (2) That CREATE TABLE statement (like every CREATE TABLE statement) sets the automatic macro variable SQLOBS to the number of rows in the new table.

 

   (3) Create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs. MONOTONIC() is an undocumented, unsupported function available in PROC SQL that is supposed to return the row number of the source table.

 

   (4) Now do the inner join with the customer data set using a third CREATE TABLE statement.
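The four steps can be sketched in a single PROC SQL step as below. This is a hedged illustration, not the poster's own code: to keep it runnable it uses sashelp.cars, with MAKE standing in for Customer_id, HORSEPOWER for totalprice, and a make-to-ORIGIN lookup standing in for the customer table. The table names CUST_TOTAL, Q3_BAND and Q3_RESULT are hypothetical. With the real data you would group SPENDING by Customer_id and join to CUSTOMER instead.

```sas
proc sql noprint;
  /* Step 1: total per make (stand-in for total spending per customer),
     stored in ascending order of the total */
  create table cust_total as
    select make, sum(horsepower) as total
      from sashelp.cars
      group by make
      order by total;

  /* Step 2: &SQLOBS now holds the number of rows in CUST_TOTAL;
     copy it before the next CREATE TABLE overwrites it */
  %let n = &sqlobs;

  /* Step 3: keep rows whose position falls in the 50%-75% band.
     MONOTONIC() is undocumented and unsupported, and tied totals
     at either boundary can be split arbitrarily. */
  create table q3_band as
    select *
      from cust_total
      where monotonic() between 0.5*&n and 0.75*&n;

  /* Step 4: inner join with a lookup table; the one-row-per-make
     ORIGIN lookup plays the role of the customer table */
  create table q3_result as
    select l.origin, t.total, t.make
      from q3_band as t
      inner join (select distinct make, origin from sashelp.cars) as l
        on t.make = l.make;
quit;
```

The %LET runs between the first and second CREATE TABLE statements because PROC SQL executes each statement as soon as its semicolon is read, which is the same behavior the SQL-only solution later in this thread relies on.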

 



All Replies
PROC Star
Posts: 749

Re: Finding Q3 using proc sql

You have to provide more information. Post some sample data and describe what you want your output to look like.

Contributor Boa
Posts: 21

Re: Finding Q3 using proc sql

Hi sir,

Please take a look at my post again. Thanks.

Contributor Boa
Posts: 21

Re: Finding Q3 using proc sql

Hi sir, I would like to know where I should put my WHERE clause for quartile 3.

Super User
Posts: 19,815

Re: Finding Q3 using proc sql

Not a SIR, but look at PROC MEANS or PROC UNIVARIATE to calculate your quartile.

 

I don't believe SQL can calculate quartiles.

 

I'm fairly certain the documentation covers this. 

 

You may also be interested in the section on combining data and the various methods to do so. 

 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26l...
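A minimal sketch of the PROC MEANS route, assuming (as the thread's own examples do) that per-make horsepower totals from sashelp.cars stand in for per-customer spending. The data set names TOTALHP, CUTOFFS and Q3_MAKES are hypothetical. PROC MEANS computes the median and Q3 of the totals (using its default percentile definition, QNTLDEF=5), and a short SQL step keeps the rows between those two cutoffs, so ties at either boundary stay together.

```sas
/* Total horsepower per make (stand-in for total spending per customer) */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* One-row data set holding the median and third quartile of the totals */
proc means data=totalhp noprint;
  var hpsum;
  output out=cutoffs median=med q3=q3;
run;

/* Keep every make whose total lies between the median and Q3 */
proc sql;
  create table q3_makes as
    select t.make, t.hpsum
      from totalhp as t, cutoffs as c
      where t.hpsum between c.med and c.q3
      order by t.hpsum;
quit;
```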

Respected Advisor
Posts: 4,925

Re: Finding Q3 using proc sql

SAS/SQL does not provide quartiles, except Q2 = median and Q4 = max.

PG

Super User
Posts: 19,815

Re: Finding Q3 using proc sql

If you have ties, then step 3 is where the SQL solution would break: tied totals at the 50% or 75% position can be split between quartiles. PROC RANK is the easiest.
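A minimal PROC RANK sketch, again assuming per-make horsepower totals from sashelp.cars stand in for per-customer spending; TOTALHP, RANKED and Q3 are hypothetical names. GROUPS=4 numbers the quarters 0 to 3, so the band between the median and Q3 is group 2. With the default TIES=MEAN, tied totals receive the same rank and therefore the same group, so a tie is never split across quartiles.

```sas
/* Total horsepower per make */
proc sql;
  create table totalhp as
    select make, sum(horsepower) as hpsum
      from sashelp.cars
      group by make;
quit;

/* GROUPS=4 assigns quartile numbers 0 (lowest) to 3 (highest) */
proc rank data=totalhp out=ranked groups=4;
  var hpsum;
  ranks quartile;
run;

/* Zero-based group 2 is the band between the median and Q3 */
data q3;
  set ranked;
  where quartile = 2;
run;
```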

Trusted Advisor
Posts: 1,022

Re: Finding Q3 using proc sql

First, I agree PROC RANK is easiest. I would also use PROC MEANS, as you suggested, to get total spending by customer before the PROC RANK.

 

As to ties, point taken.

 

But if the OP wants exactly a 25% sample, then ignoring ties is not a problem. People often add a small random number to each value to eliminate ties when ranking; this effectively does the same thing.
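A hedged sketch of the PROC MEANS, jitter and PROC RANK sequence described here, using per-make horsepower totals from sashelp.cars as a stand-in for per-customer spending. The names TOTALHP, JITTERED, HPSUM_J, RANKED and Q3, the seed 2016 and the 1e-6 jitter size are all illustrative choices, not from the thread. Because horsepower totals are whole numbers, a jitter smaller than 1e-6 cannot reorder distinct totals; it only breaks ties at random, so each quarter comes out as close to 25% of the makes as the row count allows.

```sas
/* Total horsepower per make, computed with PROC MEANS */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* Add a tiny random amount to each total to break ties */
data jittered;
  set totalhp;
  if _n_ = 1 then call streaminit(2016);   /* fixed seed: reproducible */
  hpsum_j = hpsum + rand('uniform') * 1e-6;
run;

/* With no ties left, GROUPS=4 gives four near-equal groups */
proc rank data=jittered out=ranked groups=4;
  var hpsum_j;
  ranks quartile;
run;

/* Zero-based group 2 is the median-to-Q3 band */
data q3;
  set ranked;
  where quartile = 2;
run;
```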

 

And if the OP wants to stay in SQL world, it's possible to keep ties at the boundaries entirely inside or outside Q3, but it's a little ugly, since it needs macro functions and the FIRSTOBS= and OBS= data set options. Here's a solution for sashelp.cars, using total horsepower for each car make:

 

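proc sql noprint;
  /* total horsepower per make, sorted ascending; sets &SQLOBS */
  create table totalhp as select make, sum(horsepower) as hpsum
    from sashelp.cars group by make order by hpsum;
  /* row positions at (roughly) the 50th and 75th percentiles */
  %let obs50=%sysevalf(0.5*&sqlobs,CEIL);
  %let obs75=%sysevalf(0.75*&sqlobs,FLOOR);
  /* hpsum values found at those two rows become the band limits */
  create table min_max as select min(hpsum) as medianhpsum, max(hpsum) as q3hpsum
    from (select hpsum from totalhp (firstobs=&obs50 obs=&obs50)
          union
          select hpsum from totalhp (firstobs=&obs75 obs=&obs75)
         );
  /* keep every make whose total falls inside the band, ties included */
  create table step3 as select * from totalhp,min_max where hpsum between medianhpsum and q3hpsum;
quit;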

 

Respected Advisor
Posts: 4,925

Re: Finding Q3 using proc sql

There is another way to find Q3, including ties at the borders, using SQL and without macro operations. Use the fact that the Q3 band runs from the median up to the median of the values at or above the median:

 

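proc sql;
/* total horsepower per make */
create table totalHp as 
select 
    make, 
    sum(horsepower) as hpSum
from sashelp.cars 
group by make 
order by hpSum;
/* inner query keeps totals at or above the overall median;
   outer query keeps those at or below the median of that upper half */
create table Q3 as
select * 
from (
    select * 
    from totalHp
    having hpSum >= median(hpSum) )
having hpSum <= median(hpSum);
quit;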

 

 

PG