My current data is below. I would like to create a new column called 'median for category+date', where it is equal to the median by date and category. That is order the dataset by date and category and then compute the median and store it in the column 'median for category+date'.
current:
Date | Category | Value |
1-Jan-19 | C | 17 |
1-Jan-19 | B | 21 |
1-Jan-19 | B | 12 |
1-Jan-19 | B | 12 |
1-Jan-19 | A | 17 |
1-Jan-19 | A | 6 |
1-Feb-19 | D | 19 |
1-Feb-19 | C | 23 |
1-Feb-19 | D | 24 |
1-Feb-19 | D | 21 |
1-Feb-19 | A | 16 |
1-Mar-19 | A | 9 |
1-Mar-19 | D | 24 |
1-Mar-19 | D | 15 |
1-Mar-19 | A | 7 |
1-Mar-19 | D | 21 |
1-Mar-19 | D | 5 |
1-Apr-19 | C | 9 |
1-Apr-19 | A | 15 |
1-Apr-19 | A | 5 |
1-Apr-19 | C | 15 |
1-Apr-19 | A | 19 |
1-May-19 | D | 17 |
1-May-19 | C | 9 |
1-May-19 | A | 7 |
1-May-19 | B | 13 |
1-May-19 | C | 10 |
1-May-19 | A | 14 |
desired:
Date | Category | Value | Median for category+date |
1-Jan-19 | A | 17 | 11.5 |
1-Jan-19 | A | 6 | 11.5 |
1-Jan-19 | B | 21 | 12 |
1-Jan-19 | B | 12 | 12 |
1-Jan-19 | B | 12 | 12 |
1-Jan-19 | C | 17 | 17 |
1-Feb-19 | A | 16 | 16 |
1-Feb-19 | C | 23 | 3 |
1-Feb-19 | D | 19 | 21 |
1-Feb-19 | D | 24 | 21 |
1-Feb-19 | D | 21 | 21 |
1-Mar-19 | A | 9 | 8 |
1-Mar-19 | A | 7 | 8 |
1-Mar-19 | D | 24 | 18 |
1-Mar-19 | D | 15 | 18 |
1-Mar-19 | D | 21 | 18 |
1-Mar-19 | D | 5 | 18 |
1-Apr-19 | A | 15 | 15 |
1-Apr-19 | A | 5 | 15 |
1-Apr-19 | A | 19 | 15 |
1-Apr-19 | C | 9 | 12 |
1-Apr-19 | C | 15 | 12 |
1-May-19 | A | 7 | 10.5 |
1-May-19 | A | 14 | 10.5 |
1-May-19 | B | 13 | 13 |
1-May-19 | C | 9 | 9.5 |
1-May-19 | C | 10 | 9.5 |
1-May-19 | D | 17 | 17 |
Like this?
proc sql;
select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
from TABLE
group by DATE, CATEGORY
order by DATE, CATEGORY;
Please post your current data as a datastep
The data is csv formatted and was imported using 'proc import'
Like this?
proc sql;
select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
from TABLE
group by DATE, CATEGORY
order by DATE, CATEGORY;
This works thanks!
Just need the quit; at the end
What is the function within sql I could use to also create a Q1 (as 25% percentile) column?
Thanks
Look at the quantile function.
I tried this and it didn't work as intended, this was the result.
proc sql; select DATE, CATEGORY, VALUE, PCTL(75, VALUE) AS Q3 from current group by DATE, CATEGORY order by DATE, CATEGORY; quit;
Mm you're right.
It's back to proc means then, unless someone who actually uses statistics is more knowledgeable than me.
I don't believe Proc SQL treats PCTL function as summary function. At least from this bit in the log:
24 proc sql; 25 create table want as 26 select sex,age, pctl(25,height) as p25 27 from sashelp.class 28 group by sex, age 29 order by sex,age 30 ; NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function. NOTE: Table WORK.WANT created, with 19 rows and 3 columns.
So since not a summary function it is applying the function to each row. And guess what the 25th (or 75th or 10th ...) percentile is of a single value?
Note that the documentation examples look like:
lower_quartile=PCTL(25, 2, 4, 1, 3);
including for DS2 and FedSQL. So not a summary across records function by design.
Hi @sasprogramming,
There is currently no PROC SQL summary function for percentiles other than the median. The MEDIAN function (as a PROC SQL summary function) was introduced in SAS 9.4. Usage Note 12133 describes this and also that other Base SAS functions (such as PCTL) are only applied to the value(s) on the current row.
So, for the first and third quartile PROC MEANS (or PROC SUMMARY) is the procedure of choice, as Chris said.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.