I'd like to find the Q3 in proc sql
Given that, i have to find all consumers in the 3rd quartile of the amount they spent. And display customername, total spent and their customer id.
So basically i need to INNER JOIN 2 datasets which are spending dataset and customer dataset, and there are 100000 of datasets so i just gonna list down as an example. Just want to show the 3rd quartile of the customers.
Spending data set, there are spendingid, customerid, totalprice and numunits
Spending_id Customer_id totalprice numunits
1212112 100000 19 2
989898 112121 298 10
3i31030 20000 2 22
Customer dataset, there are customerid and firstname
Customer_id firstname
12311111 Ellen
9908009 JOhn
3376247 Jay
Thank you,
PROC SQL is not the way to go for this. If you absolutely insist on PROC sql then you could do it with this broad outline, which can be done with 3 create table statements in a single PROC SQL:
(1) create a table of total spending by customer ordered by total_spending,
(2) the above create table statement (as all create table statemens) will generate an automatic macro variable SQLOBS= number of rows in the new table,
(3) create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs. MONOTONIC() is an undocumented/unsupported function available in PROC SQL that is supposed to provide the row number of the source table.
(4) now do the inner join with the customer data set using a third create table statement.
You have to provide more information. Post some sample data and describe what you want your output to look like 🙂
Hi sir,
please take a look at my post again. Thanks. 🙂
Not a SIR, but look at proc means or proc univariate to calculate your quartile.
I dont believe SQL can calculate quartikes.
I'm fairly certain the documentation covers this.
You may also be interested in the section on combining data and the various methods to do so.
SAS/SQL does not provide quartiles, except Q2 = median and Q4 = max.
PROC SQL is not the way to go for this. If you absolutely insist on PROC sql then you could do it with this broad outline, which can be done with 3 create table statements in a single PROC SQL:
(1) create a table of total spending by customer ordered by total_spending,
(2) the above create table statement (as all create table statemens) will generate an automatic macro variable SQLOBS= number of rows in the new table,
(3) create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs. MONOTONIC() is an undocumented/unsupported function available in PROC SQL that is supposed to provide the row number of the source table.
(4) now do the inner join with the customer data set using a third create table statement.
If you have ties then step 3 is where the SQL solution would break. Proc RANK is the easiest.
First, I agree proc rank is easiest. And I would also use PROC MEANS as you suggested to get total spending by custommer prior to the proc rank.
As to ties, point taken.
But if OP wants exactly 25% sample size then ignoring ties is not a problem. Often folks add a random small number to each value to eliminate ties when ranking - this effectively does the same.
And if the OP wants to stay in SQL-world, it's possible to keep ties at the boundary entirely inside or outside Q3, but it's a little ugly, since it needs to invoke macro functions and data set name parameters (firstobs and obs). Here's a solution for sashelp.cars, using total horsepower for each car make:
proc sql noprint;
create table totalhp as select make, sum(horsepower) as hpsum
from sashelp.cars group by make order by hpsum;
%let obs50=%sysevalf(0.5*&sqlobs,CEIL);
%let obs75=%sysevalf(0.75*&sqlobs,FLOOR);
create table min_max as select min(hpsum) as medianhpsum, max(hpsum) as q3hpsum
from (select hpsum from totalhp (firstobs=&obs50 obs=&obs50)
union
select hpsum from totalhp (firstobs=&obs75 obs=&obs75)
);
create table step3 as select * from totalhp,min_max where hpsum between medianhpsum and q3hpsum;
quit;
There is another way to find Q3, including ties at the borders, using SQL and without macro operations. Use the fact that Q3 is between the median and the median of values above the median:
proc sql;
create table totalHp as
select
make,
sum(horsepower) as hpSum
from sashelp.cars
group by make
order by hpSum;
create table Q3 as
select *
from (
select *
from totalHp
having hpSum >= median(hpSum) )
having hpSum <= median(hpSum);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.