BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Boa
Obsidian | Level 7 Boa
Obsidian | Level 7

I'd like to find the Q3 in proc sql

 

Given that, i have to find all consumers in the 3rd quartile of the amount they spent. And display customername, total spent and their customer id.

So basically i need to INNER JOIN 2 datasets which are spending dataset and customer dataset, and there are 100000 of datasets so i just gonna list down as an example. Just want to show the 3rd quartile of the customers.

 

Spending data set, there are spendingid, customerid, totalprice and numunits

Spending_id                  Customer_id                 totalprice              numunits

  1212112                        100000                           19                       2

  989898                           112121                           298                    10

  3i31030                           20000                            2                        22 

  

Customer dataset, there are customerid and firstname

  Customer_id                 firstname

  12311111                        Ellen

  9908009                         JOhn

  3376247                         Jay

 

 

Thank you,

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

PROC SQL is not the way to go for this.  If you absolutely insist on PROC sql then you could do it with this broad outline, which can be done with 3 create table statements in a single PROC SQL:

 

   (1) create a table of total spending by customer ordered by total_spending,

 

   (2) the above create table statement (as all create table statemens) will generate an automatic macro variable SQLOBS= number of rows in the new table,

 

   (3) create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs.  MONOTONIC() is an undocumented/unsupported function available in PROC SQL that is supposed to provide the row  number of the source table.

 

  (4) now do the inner join with the customer data set using a third create table statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

You have to provide more information. Post some sample data and describe what you want your output to look like 🙂

Boa
Obsidian | Level 7 Boa
Obsidian | Level 7

Hi sir, 

please take a look at my post again.  Thanks. 🙂

Boa
Obsidian | Level 7 Boa
Obsidian | Level 7
Hi sir, i would like to know where should i put my where clause statement for Quartile 3?
Reeza
Super User

Not a SIR, but look at proc means or proc univariate to calculate your quartile. 

 

I dont believe SQL can calculate quartikes. 

 

I'm fairly certain the documentation covers this. 

 

You may also be interested in the section on combining data and the various methods to do so. 

 

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26l...

PGStats
Opal | Level 21

SAS/SQL does not provide quartiles, except Q2 = median and Q4 = max.

PG
mkeintz
PROC Star

PROC SQL is not the way to go for this.  If you absolutely insist on PROC sql then you could do it with this broad outline, which can be done with 3 create table statements in a single PROC SQL:

 

   (1) create a table of total spending by customer ordered by total_spending,

 

   (2) the above create table statement (as all create table statemens) will generate an automatic macro variable SQLOBS= number of rows in the new table,

 

   (3) create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs.  MONOTONIC() is an undocumented/unsupported function available in PROC SQL that is supposed to provide the row  number of the source table.

 

  (4) now do the inner join with the customer data set using a third create table statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

If you have ties then step 3 is where the SQL solution would break. Proc RANK is the easiest. 

mkeintz
PROC Star

First, I agree proc rank is easiest.  And I would also use PROC MEANS as you suggested to get total spending by custommer prior to the proc rank.

 

As to ties, point taken.

 

But if OP wants exactly 25% sample size then ignoring ties is not a problem.  Often folks add a random small number to each value to eliminate ties when ranking - this effectively does the same.

 

And if  the OP wants to stay in SQL-world, it's possible to keep ties at the boundary entirely inside or outside Q3, but it's a little ugly, since it needs to invoke macro functions and data set name parameters (firstobs and obs).  Here's a solution for sashelp.cars, using total horsepower for each car make:

 

proc sql noprint;

  create table totalhp as select make, sum(horsepower) as hpsum

    from sashelp.cars group by make order by hpsum;

  %let obs50=%sysevalf(0.5*&sqlobs,CEIL);

  %let obs75=%sysevalf(0.75*&sqlobs,FLOOR);

  create table min_max as select min(hpsum) as medianhpsum, max(hpsum) as q3hpsum

    from (select hpsum from totalhp (firstobs=&obs50 obs=&obs50)

          union

          select hpsum from totalhp (firstobs=&obs75 obs=&obs75)

                   );

  create table step3 as select * from totalhp,min_max where hpsum between medianhpsum and q3hpsum;

quit;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PGStats
Opal | Level 21

There is another way to find Q3, including ties at the borders, using SQL and without macro operations. Use the fact that Q3 is between the median and the median of values above the median:

 

proc sql;
create table totalHp as 
select 
    make, 
    sum(horsepower) as hpSum
from sashelp.cars 
group by make 
order by hpSum;
create table Q3 as
select * 
from (
    select * 
    from totalHp
    having hpSum >= median(hpSum) )
having hpSum <= median(hpSum);
quit;

 

 

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 8269 views
  • 2 likes
  • 5 in conversation