- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My current data is below. I would like to create a new column called 'median for category+date', where it is equal to the median by date and category. That is order the dataset by date and category and then compute the median and store it in the column 'median for category+date'.
current:
Date | Category | Value |
1-Jan-19 | C | 17 |
1-Jan-19 | B | 21 |
1-Jan-19 | B | 12 |
1-Jan-19 | B | 12 |
1-Jan-19 | A | 17 |
1-Jan-19 | A | 6 |
1-Feb-19 | D | 19 |
1-Feb-19 | C | 23 |
1-Feb-19 | D | 24 |
1-Feb-19 | D | 21 |
1-Feb-19 | A | 16 |
1-Mar-19 | A | 9 |
1-Mar-19 | D | 24 |
1-Mar-19 | D | 15 |
1-Mar-19 | A | 7 |
1-Mar-19 | D | 21 |
1-Mar-19 | D | 5 |
1-Apr-19 | C | 9 |
1-Apr-19 | A | 15 |
1-Apr-19 | A | 5 |
1-Apr-19 | C | 15 |
1-Apr-19 | A | 19 |
1-May-19 | D | 17 |
1-May-19 | C | 9 |
1-May-19 | A | 7 |
1-May-19 | B | 13 |
1-May-19 | C | 10 |
1-May-19 | A | 14 |
desired:
Date | Category | Value | Median for category+date |
1-Jan-19 | A | 17 | 11.5 |
1-Jan-19 | A | 6 | 11.5 |
1-Jan-19 | B | 21 | 12 |
1-Jan-19 | B | 12 | 12 |
1-Jan-19 | B | 12 | 12 |
1-Jan-19 | C | 17 | 17 |
1-Feb-19 | A | 16 | 16 |
1-Feb-19 | C | 23 | 3 |
1-Feb-19 | D | 19 | 21 |
1-Feb-19 | D | 24 | 21 |
1-Feb-19 | D | 21 | 21 |
1-Mar-19 | A | 9 | 8 |
1-Mar-19 | A | 7 | 8 |
1-Mar-19 | D | 24 | 18 |
1-Mar-19 | D | 15 | 18 |
1-Mar-19 | D | 21 | 18 |
1-Mar-19 | D | 5 | 18 |
1-Apr-19 | A | 15 | 15 |
1-Apr-19 | A | 5 | 15 |
1-Apr-19 | A | 19 | 15 |
1-Apr-19 | C | 9 | 12 |
1-Apr-19 | C | 15 | 12 |
1-May-19 | A | 7 | 10.5 |
1-May-19 | A | 14 | 10.5 |
1-May-19 | B | 13 | 13 |
1-May-19 | C | 9 | 9.5 |
1-May-19 | C | 10 | 9.5 |
1-May-19 | D | 17 | 17 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
proc sql;
select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
from TABLE
group by DATE, CATEGORY
order by DATE, CATEGORY;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please post your current data as a datastep
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The data is csv formatted and was imported using 'proc import'
- Tags:
- ee d
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
file="C:/Users/ggqeo/Documents/My SAS Files/current.csv"
out=current
dbms=csv;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
proc sql;
select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
from TABLE
group by DATE, CATEGORY
order by DATE, CATEGORY;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This works thanks!
Just need the quit; at the end
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the function within sql I could use to also create a Q1 (as 25% percentile) column?
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Look at the quantile function.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I tried this and it didn't work as intended, this was the result.
proc sql; select DATE, CATEGORY, VALUE, PCTL(75, VALUE) AS Q3 from current group by DATE, CATEGORY order by DATE, CATEGORY; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Mm you're right.
It's back to proc means then, unless someone who actually uses statistics is more knowledgeable than me.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't believe Proc SQL treats PCTL function as summary function. At least from this bit in the log:
24 proc sql; 25 create table want as 26 select sex,age, pctl(25,height) as p25 27 from sashelp.class 28 group by sex, age 29 order by sex,age 30 ; NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function. NOTE: Table WORK.WANT created, with 19 rows and 3 columns.
So since not a summary function it is applying the function to each row. And guess what the 25th (or 75th or 10th ...) percentile is of a single value?
Note that the documentation examples look like:
lower_quartile=PCTL(25, 2, 4, 1, 3);
including for DS2 and FedSQL. So not a summary across records function by design.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @sasprogramming,
There is currently no PROC SQL summary function for percentiles other than the median. The MEDIAN function (as a PROC SQL summary function) was introduced in SAS 9.4. Usage Note 12133 describes this and also that other Base SAS functions (such as PCTL) are only applied to the value(s) on the current row.
So, for the first and third quartile PROC MEANS (or PROC SUMMARY) is the procedure of choice, as Chris said.