Solved: Re: calculate median by category and date

sasprogramming · Posted 07-20-2021 09:01 PM

My current data is below. I would like to create a new column called 'median for category+date', where it is equal to the median by date and category. That is order the dataset by date and category and then compute the median and store it in the column 'median for category+date'.

current:

Date	Category	Value
1-Jan-19	C	17
1-Jan-19	B	21
1-Jan-19	B	12
1-Jan-19	B	12
1-Jan-19	A	17
1-Jan-19	A	6
1-Feb-19	D	19
1-Feb-19	C	23
1-Feb-19	D	24
1-Feb-19	D	21
1-Feb-19	A	16
1-Mar-19	A	9
1-Mar-19	D	24
1-Mar-19	D	15
1-Mar-19	A	7
1-Mar-19	D	21
1-Mar-19	D	5
1-Apr-19	C	9
1-Apr-19	A	15
1-Apr-19	A	5
1-Apr-19	C	15
1-Apr-19	A	19
1-May-19	D	17
1-May-19	C	9
1-May-19	A	7
1-May-19	B	13
1-May-19	C	10
1-May-19	A	14

desired:

Date	Category	Value	Median for category+date
1-Jan-19	A	17	11.5
1-Jan-19	A	6	11.5
1-Jan-19	B	21	12
1-Jan-19	B	12	12
1-Jan-19	B	12	12
1-Jan-19	C	17	17
1-Feb-19	A	16	16
1-Feb-19	C	23	3
1-Feb-19	D	19	21
1-Feb-19	D	24	21
1-Feb-19	D	21	21
1-Mar-19	A	9	8
1-Mar-19	A	7	8
1-Mar-19	D	24	18
1-Mar-19	D	15	18
1-Mar-19	D	21	18
1-Mar-19	D	5	18
1-Apr-19	A	15	15
1-Apr-19	A	5	15
1-Apr-19	A	19	15
1-Apr-19	C	9	12
1-Apr-19	C	15	12
1-May-19	A	7	10.5
1-May-19	A	14	10.5
1-May-19	B	13	13
1-May-19	C	9	9.5
1-May-19	C	10	9.5
1-May-19	D	17	17

ChrisNZ · Posted 07-21-2021 01:07 AM

Like this?

proc sql;
  select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
  from TABLE
  group by DATE, CATEGORY 
  order by DATE, CATEGORY;

High-Performance SAS Coding - Third Edition

View solution in original post

Sajid01 · Posted 07-20-2021 09:11 PM

Please post your current data as a datastep

sasprogramming · Posted 07-20-2021 09:44 PM

The data is csv formatted and was imported using 'proc import'

sasprogramming · Posted 07-20-2021 09:46 PM

proc import
file="C:/Users/ggqeo/Documents/My SAS Files/current.csv"
out=current
dbms=csv;
run;

Sajid01 · Posted 07-20-2021 10:22 PM

Thanks.

ChrisNZ · Posted 07-21-2021 01:07 AM

Like this?

proc sql;
  select DATE, CATEGORY, VALUE, median(VALUE) as MEDIAN
  from TABLE
  group by DATE, CATEGORY 
  order by DATE, CATEGORY;

High-Performance SAS Coding - Third Edition

sasprogramming · Posted 07-21-2021 01:14 AM

This works thanks!

Just need the quit; at the end

sasprogramming · Posted 07-21-2021 01:17 AM

What is the function within sql I could use to also create a Q1 (as 25% percentile) column?

Thanks

ChrisNZ · Posted 07-21-2021 01:40 AM

Look at the quantile function.

High-Performance SAS Coding - Third Edition

sasprogramming · Posted 07-21-2021 02:04 AM

I tried this and it didn't work as intended, this was the result.

proc sql;
  select DATE, CATEGORY, VALUE, PCTL(75, VALUE) AS Q3
  from current
  group by DATE, CATEGORY 
  order by DATE, CATEGORY;
quit;

ChrisNZ · Posted 07-21-2021 02:44 AM

Mm you're right.

It's back to proc means then, unless someone who actually uses statistics is more knowledgeable than me.

High-Performance SAS Coding - Third Edition

ballardw · Posted 07-21-2021 03:44 PM

I don't believe Proc SQL treats PCTL function as summary function. At least from this bit in the log:

24   proc sql;
25     create table want as
26     select sex,age, pctl(25,height) as p25
27     from sashelp.class
28     group by sex, age
29     order by sex,age
30     ;
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional
      HAVING clause of the associated table-expression referenced a summary function.
NOTE: Table WORK.WANT created, with 19 rows and 3 columns.

So since not a summary function it is applying the function to each row. And guess what the 25th (or 75th or 10th ...) percentile is of a single value?

Note that the documentation examples look like:

lower_quartile=PCTL(25, 2, 4, 1, 3);

including for DS2 and FedSQL. So not a summary across records function by design.

FreelanceReinh · Posted 07-21-2021 05:06 AM

Hi @sasprogramming,

There is currently no PROC SQL summary function for percentiles other than the median. The MEDIAN function (as a PROC SQL summary function) was introduced in SAS 9.4. Usage Note 12133 describes this and also that other Base SAS functions (such as PCTL) are only applied to the value(s) on the current row.

So, for the first and third quartile PROC MEANS (or PROC SUMMARY) is the procedure of choice, as Chris said.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away