Proc sql


11-26-2016 11:21 AM - edited 11-27-2016 02:21 AM

I'd like to find Q3 (the third quartile) in PROC SQL.

Specifically, I have to find all customers in the 3rd quartile of the amount they spent, and display the customer name, the total spent and the customer ID.

So basically I need to INNER JOIN two data sets, the spending data set and the customer data set. There are 100,000 records, so I'll just list a few as an example. I only want to show the customers in the 3rd quartile.

The spending data set has the variables Spending_id, Customer_id, totalprice and numunits:

| Spending_id | Customer_id | totalprice | numunits |
|---|---|---|---|
| 1212112 | 100000 | 19 | 2 |
| 989898 | 112121 | 298 | 10 |
| 3i31030 | 20000 | 2 | 22 |

The customer data set has the variables Customer_id and firstname:

| Customer_id | firstname |
|---|---|
| 12311111 | Ellen |
| 9908009 | John |
| 3376247 | Jay |

Thank you,

Accepted Solution (marked 11-27-2016 02:21 AM)


11-26-2016 07:45 PM

PROC SQL is not the way to go for this. If you absolutely *insist* on PROC SQL, you could do it with this broad outline, which needs three CREATE TABLE statements in a single PROC SQL step:

(1) Create a table of total spending by customer, ordered by total_spending.

(2) The CREATE TABLE statement above (like every CREATE TABLE statement) sets the automatic macro variable SQLOBS to the number of rows in the new table.

(3) Create a second table selecting all the records whose MONOTONIC() value is between 0.5*&sqlobs and 0.75*&sqlobs. MONOTONIC() is an undocumented, unsupported function available in PROC SQL that is supposed to return the row number in the source table.

(4) Finally, do the inner join with the customer data set using a third CREATE TABLE statement.
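The four steps above could look roughly like the following. This is a minimal sketch, not the poster's own code: the data set names SPENDING, CUSTOMER, TOTALS, Q3_TOTALS and Q3_CUSTOMERS and the macro variable N_CUST are hypothetical. The first DATA step only builds synthetic rows (20 customers) so the block runs on its own; replace it with the real data. Like the outline, it depends on the undocumented MONOTONIC() function.

```sas
/* Synthetic stand-ins for the real data (hypothetical names and values):
   20 customers, each with two purchases of Customer_id*10. */
data spending(keep=Customer_id totalprice)
     customer(keep=Customer_id firstname);
  length Customer_id 8 firstname $12;
  do Customer_id = 1 to 20;
    firstname = cats('Customer', Customer_id);
    output customer;
    do purchase = 1 to 2;
      totalprice = Customer_id * 10;
      output spending;
    end;
  end;
run;

proc sql noprint;
  /* (1) Total spending per customer, ordered from lowest to highest */
  create table totals as
    select Customer_id, sum(totalprice) as total_spending
      from spending
      group by Customer_id
      order by total_spending;

  /* (2) SQLOBS now holds the row count of TOTALS; copy it before the
     next statement overwrites it */
  %let n_cust = &sqlobs;

  /* (3) Keep rows whose position lies between 50% and 75% of the count.
     MONOTONIC() is undocumented and unsupported. */
  create table q3_totals as
    select *
      from totals
      where monotonic() between 0.5*&n_cust and 0.75*&n_cust;

  /* (4) Inner join to attach the customer's name */
  create table q3_customers as
    select c.firstname, t.total_spending, t.Customer_id
      from q3_totals as t
           inner join customer as c
           on t.Customer_id = c.Customer_id
      order by t.total_spending;
quit;
```

Because the rows are selected purely by position, customers with tied totals at the 50% or 75% cut point can be split between groups.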

All Replies


11-26-2016 11:23 AM - edited 11-26-2016 11:23 AM

You have to provide more information. Post some sample data and describe what you want your output to look like.


Posted in reply to draycut

11-26-2016 11:34 AM

Hi sir,

Please take a look at my post again. Thanks.


Posted in reply to draycut

11-26-2016 11:39 AM

Hi sir, I would like to know where I should put my WHERE clause for quartile 3.


11-26-2016 12:16 PM

Not a sir, but look at PROC MEANS or PROC UNIVARIATE to calculate your quartiles.

I don't believe SQL can calculate quartiles.

I'm fairly certain the documentation covers this.

You may also be interested in the section on combining data and the various methods to do so.
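One way to follow this suggestion, sketched on the sashelp.cars example used later in the thread (total horsepower per make standing in for total spending per customer). PROC MEANS computes the per-make totals and then the median and upper quartile of those totals; PROC SQL only does the final filter. The output names TOTALHP, CUTOFFS, P50, Q3_CUT and Q3_MAKES are hypothetical, and PROC MEANS uses its default percentile definition (QNTLDEF=5), which may differ slightly from other quartile definitions.

```sas
/* Sum of horsepower for each make (NWAY keeps only one row per make) */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* Median (P50) and upper quartile (Q3) of the per-make totals */
proc means data=totalhp noprint;
  var hpsum;
  output out=cutoffs(drop=_type_ _freq_) median=p50 q3=q3_cut;
run;

/* Keep the makes whose total lies between the median and Q3 */
proc sql;
  create table q3_makes as
    select t.make, t.hpsum
      from totalhp as t, cutoffs as c
      where t.hpsum between c.p50 and c.q3_cut
      order by t.hpsum;
quit;
```

For the OP's data, the same pattern would use CLASS Customer_id and VAR totalprice, followed by the inner join to the customer data set.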


11-26-2016 04:41 PM

SAS/SQL does not provide quartiles, except Q2 = median and Q4 = max.
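For illustration, the two quartiles PROC SQL can produce directly, computed on total horsepower per make in sashelp.cars. This assumes a PROC SQL release that accepts MEDIAN() with a single column as a summary function (PG's query further down relies on the same behavior); the table name Q2_Q4 is hypothetical.

```sas
proc sql;
  /* Q2 (median) and Q4 (maximum) of the per-make horsepower totals */
  create table q2_q4 as
    select median(hpsum) as q2, max(hpsum) as q4
      from (select make, sum(horsepower) as hpsum
              from sashelp.cars
              group by make);
quit;
```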

PG


Posted in reply to mkeintz

11-26-2016 08:53 PM

If you have ties, step 3 is where the SQL solution breaks: tied values at the 50% or 75% cut point can end up split between groups. PROC RANK is the easiest option.
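A minimal PROC RANK sketch, again on total horsepower per make in sashelp.cars. GROUPS=4 numbers the quartile groups 0 to 3, so group 2 is the third quartile (between the median and Q3). With the default TIES=MEAN, tied totals receive the same rank and therefore land in the same group. The names TOTALHP, RANKED, QUARTILE and Q3_MAKES are hypothetical.

```sas
/* Total horsepower per make */
proc sql;
  create table totalhp as
    select make, sum(horsepower) as hpsum
      from sashelp.cars
      group by make;
quit;

/* Assign quartile groups 0-3; tied totals share a rank and a group */
proc rank data=totalhp out=ranked groups=4;
  var hpsum;
  ranks quartile;
run;

/* Group 2 = third quartile (between the median and Q3) */
data q3_makes;
  set ranked;
  where quartile = 2;
run;
```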


Posted in reply to Reeza

11-26-2016 11:00 PM

First, I agree PROC RANK is easiest. I would also use PROC MEANS, as you suggested, to get total spending by customer before running PROC RANK.

As to ties, point taken.

But if the OP wants a sample of exactly 25%, ignoring ties is not a problem. Folks often add a small random number to each value to eliminate ties when ranking; this has effectively the same result.
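A sketch of the tie-breaking trick, combined with the PROC MEANS step suggested above, on total horsepower per make in sashelp.cars. The seed (12345), the jitter size (1e-6) and the names JITTERED, HPSUM_J, RANKED_J and Q3_MAKES_J are arbitrary, hypothetical choices; the jitter only needs to be smaller than the smallest gap between distinct totals (the horsepower sums are integers, so anything well below 1 works). Ties are then broken at random, so tied makes at a boundary may land in different groups.

```sas
/* Total horsepower per make with PROC MEANS */
proc means data=sashelp.cars noprint nway;
  class make;
  var horsepower;
  output out=totalhp(drop=_type_ _freq_) sum=hpsum;
run;

/* Add a tiny random amount to each total so no two values are tied */
data jittered;
  set totalhp;
  if _n_ = 1 then call streaminit(12345);  /* reproducible jitter */
  hpsum_j = hpsum + rand('uniform') * 1e-6;
run;

/* Without ties, the four groups come out roughly equal in size */
proc rank data=jittered out=ranked_j groups=4;
  var hpsum_j;
  ranks quartile;
run;

/* Group 2 = third quartile */
data q3_makes_j;
  set ranked_j;
  where quartile = 2;
run;
```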

And if the OP wants to stay in SQL-world, it's possible to keep ties at the boundary entirely inside or outside Q3, but it's a little ugly, since it needs macro functions and the data set options FIRSTOBS= and OBS=. Here's a solution for sashelp.cars, using total horsepower for each car make:

```
proc sql noprint;
  create table totalhp as select make, sum(horsepower) as hpsum
    from sashelp.cars group by make order by hpsum;
  %let obs50=%sysevalf(0.5*&sqlobs,CEIL);
  %let obs75=%sysevalf(0.75*&sqlobs,FLOOR);
  create table min_max as select min(hpsum) as medianhpsum, max(hpsum) as q3hpsum
    from (select hpsum from totalhp (firstobs=&obs50 obs=&obs50)
          union
          select hpsum from totalhp (firstobs=&obs75 obs=&obs75)
         );
  create table step3 as select * from totalhp, min_max
    where hpsum between medianhpsum and q3hpsum;
quit;
```


Posted in reply to mkeintz

11-26-2016 11:44 PM

There is another way to find Q3, including ties at the borders, using SQL and without macro operations. Use the fact that the third-quartile values lie between the median and the median of the values at or above the median:

```
proc sql;
create table totalHp as
select
make,
sum(horsepower) as hpSum
from sashelp.cars
group by make
order by hpSum;
create table Q3 as
select *
from (
select *
from totalHp
having hpSum >= median(hpSum) )
having hpSum <= median(hpSum);
quit;
```

PG