I'd like to find Q3 in PROC SQL.
I have to find all customers in the 3rd quartile of the amount they spent, and display each customer's name, total spent and customer ID.
So basically I need to INNER JOIN two data sets, the spending data set and the customer data set. There are about 100,000 records, so I'll just list a few rows as an example. I only want to show the customers in the 3rd quartile.
Spending data set, with spendingid, customerid, totalprice and numunits:
Spending_id  Customer_id  totalprice  numunits
1212112      100000       19          2
989898       112121       298         10
3i31030      20000        2           22
Customer data set, with customerid and firstname:
Customer_id  firstname
12311111     Ellen
9908009      JOhn
3376247      Jay
Thank you,
Accepted Solutions
PROC SQL is not the way to go for this. If you absolutely insist on PROC SQL, you could follow this broad outline, which needs three CREATE TABLE statements in a single PROC SQL step:
(1) Create a table of total spending by customer, ordered by total_spending.
(2) That CREATE TABLE statement (like every CREATE TABLE statement) sets the automatic macro variable SQLOBS to the number of rows in the new table.
(3) Create a second table selecting all the records with MONOTONIC() between 0.5*&sqlobs and 0.75*&sqlobs. MONOTONIC() is an undocumented, unsupported function available in PROC SQL that is supposed to return the row number of the source table.
(4) Now do the inner join with the customer data set using a third CREATE TABLE statement.
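The outline above could be sketched roughly as follows. The toy spending and customer data sets and all table names are made up for illustration, and because MONOTONIC() is undocumented, there is no guarantee that it numbers rows in ORDER BY order in every situation, so treat this as a sketch rather than a dependable method.

```sas
/* Hypothetical toy data, shaped like the data sets in the question */
data spending;
  input spending_id customer_id totalprice numunits;
  datalines;
1 101 19 2
2 101 30 1
3 102 298 10
4 103 2 22
5 104 75 3
6 105 120 4
7 106 45 1
8 107 60 2
9 108 200 5
;
run;

data customer;
  input customer_id firstname $;
  datalines;
101 Ellen
102 John
103 Jay
104 Ann
105 Raj
106 Mia
107 Tom
108 Zoe
;
run;

proc sql;
  /* (1) total spending per customer, sorted ascending */
  create table totals as
    select customer_id, sum(totalprice) as total_spending
    from spending
    group by customer_id
    order by total_spending;

  /* (2) SQLOBS now holds the row count of TOTALS; save it before it changes */
  %let n_customers = &sqlobs;

  /* (3) keep the rows between the 50% and 75% positions (undocumented MONOTONIC) */
  create table q3 as
    select *
    from totals
    where monotonic() between 0.5*&n_customers and 0.75*&n_customers;

  /* (4) attach the customer names */
  create table q3_customers as
    select c.customer_id, c.firstname, q.total_spending
    from q3 as q
         inner join customer as c
         on q.customer_id = c.customer_id
    order by q.total_spending;
quit;
```

Copying SQLOBS into its own macro variable is a small precaution: every later statement in the step overwrites SQLOBS.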
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
You have to provide more information. Post some sample data and describe what you want your output to look like 🙂
Hi sir,
please take a look at my post again. Thanks. 🙂
Not a SIR, but look at PROC MEANS or PROC UNIVARIATE to calculate your quartiles.
I don't believe SQL can calculate quartiles.
I'm fairly certain the documentation covers this.
You may also be interested in the section on combining data and the various methods to do so.
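As a rough sketch of the PROC MEANS route, the example below uses sashelp.cars as a stand-in (total horsepower per make plays the role of total spending per customer). The output data set and variable names are my own choices, and "third quartile" is read here as the values between the median and the 75th percentile.

```sas
/* Total horsepower per make, standing in for total spending per customer */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* Median and upper quartile of the per-make totals */
proc means data=totalhp noprint;
  var hpsum;
  output out=cutoffs(drop=_type_ _freq_) median=median_hp q3=q3_hp;
run;

/* Keep the makes whose total lies between the median and Q3 */
data q3_makes;
  if _n_ = 1 then set cutoffs;   /* cutoff values are retained on every row */
  set totalhp;
  if median_hp <= hpsum <= q3_hp;
run;
```

Joining q3_makes back to a lookup table (the customer data set in the original question) would then be an ordinary merge or SQL join.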
SAS/SQL does not provide quartiles, except Q2 = median and Q4 = max.
If you have ties, then step 3 is where the SQL solution would break. PROC RANK is the easiest.
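A minimal PROC RANK sketch, again using sashelp.cars as a stand-in for the spending data; the table and variable names are assumptions for illustration only.

```sas
/* Total horsepower per make, standing in for total spending per customer */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* GROUPS=4 assigns quartile groups 0 to 3; tied values get the same rank,
   so they always land in the same group */
proc rank data=totalhp out=ranked groups=4;
  var hpsum;
  ranks quartile;
run;

/* quartile = 2 is the third group, between the median and the 75th percentile */
data q3_makes;
  set ranked;
  where quartile = 2;
run;
```

Because ties are never split across groups, the third group may hold slightly more or fewer than 25% of the rows.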
First, I agree PROC RANK is easiest. And I would also use PROC MEANS, as you suggested, to get total spending by customer prior to the PROC RANK.
As to ties, point taken.
But if the OP wants exactly a 25% sample size, then ignoring ties is not a problem. Often folks add a small random number to each value to eliminate ties when ranking; this effectively does the same.
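The tie-breaking trick mentioned above might look something like this. It is only a sketch: the seed, the jitter size of 1e-6 and all names are assumptions, and the jitter must stay smaller than the smallest gap between distinct values (at least 1 here, since the totals are integers).

```sas
/* Total horsepower per make, standing in for total spending per customer */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* Add a tiny random amount so that equal totals become distinct */
data jittered;
  if _n_ = 1 then call streaminit(2024);  /* fixed seed for reproducibility */
  set totalhp;
  hpsum_jit = hpsum + 1e-6 * rand('uniform');
run;

/* Rank on the jittered value; ties are now broken at random */
proc rank data=jittered out=ranked groups=4;
  var hpsum_jit;
  ranks quartile;
run;
```

The original hpsum stays in the output, so the jittered column is only used for ranking.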
And if the OP wants to stay in SQL-world, it's possible to keep ties at the boundary entirely inside or outside Q3, but it's a little ugly, since it needs macro functions and the data set options FIRSTOBS= and OBS=. Here's a solution for sashelp.cars, using total horsepower for each car make:
proc sql noprint;
  /* total horsepower per make, sorted ascending */
  create table totalhp as
    select make, sum(horsepower) as hpsum
    from sashelp.cars
    group by make
    order by hpsum;

  /* row positions of the median and the 75th percentile */
  %let obs50=%sysevalf(0.5*&sqlobs,CEIL);
  %let obs75=%sysevalf(0.75*&sqlobs,FLOOR);

  /* read the values found at those two positions */
  create table min_max as
    select min(hpsum) as medianhpsum, max(hpsum) as q3hpsum
    from (select hpsum from totalhp (firstobs=&obs50 obs=&obs50)
          union
          select hpsum from totalhp (firstobs=&obs75 obs=&obs75)
         );

  /* keep every row within the two values, ties included */
  create table step3 as
    select *
    from totalhp, min_max
    where hpsum between medianhpsum and q3hpsum;
quit;
There is another way to find Q3, including ties at the borders, using SQL and without macro operations. Use the fact that Q3 is between the median and the median of values above the median:
proc sql;
  create table totalHp as
    select
      make,
      sum(horsepower) as hpSum
    from sashelp.cars
    group by make
    order by hpSum;

  create table Q3 as
    select *
    from (
      select *
      from totalHp
      having hpSum >= median(hpSum) )
    having hpSum <= median(hpSum);
quit;