turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- How to compute the median of a variable but exclud...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-19-2013 06:31 PM

I'm try to compute the difference between each observation's value of a certain variable (A) and the median of the same variable (A) for

its joint industry (X) and performance (Y) decile, where the median excludes the obs itself. I already created deciles for variable Y within each industry X,

but can't figure out how to calculate the median of each decile excluding the obs. Any suggestions are much appreciated!

Accepted Solutions

Solution

09-19-2013
10:22 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to navyblue

09-19-2013 10:22 PM

You can try this :

**proc sort data=have; by x decileY; run;**

** **

**proc sql;****create table c as****select x, decileY, count(a) as n****from have****group by x, decileY;****quit;**

** **

**data want;****merge have c; by x decileY;****medPos = cc + floor((n+1)/2);****if _n_ <= medPos then do;**** medPos = min(cc + n, max(cc + 1, medPos + 1));**** set have(keep=a rename=(a=medA)) point=medPos;**** end;****else do;**** medPos = min(cc + n, max(cc + 1, medPos - 1));**** set have(keep=a rename=(a=medA)) point=medPos;**** end;****medDif = a - medA;****if last.decileY then cc + n;****drop n cc;****run;**

The idea is that for observations lower than the true median, your median is the observation just after the true median and for observations greater than the true median, your median is the observation just before the true median.

PG

PG

All Replies

Solution

09-19-2013
10:22 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to navyblue

09-19-2013 10:22 PM

You can try this :

**proc sort data=have; by x decileY; run;**

** **

**proc sql;****create table c as****select x, decileY, count(a) as n****from have****group by x, decileY;****quit;**

** **

**data want;****merge have c; by x decileY;****medPos = cc + floor((n+1)/2);****if _n_ <= medPos then do;**** medPos = min(cc + n, max(cc + 1, medPos + 1));**** set have(keep=a rename=(a=medA)) point=medPos;**** end;****else do;**** medPos = min(cc + n, max(cc + 1, medPos - 1));**** set have(keep=a rename=(a=medA)) point=medPos;**** end;****medDif = a - medA;****if last.decileY then cc + n;****drop n cc;****run;**

The idea is that for observations lower than the true median, your median is the observation just after the true median and for observations greater than the true median, your median is the observation just before the true median.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

09-20-2013 02:21 PM

Thank you so much for the great help! Your code works perfectly! I'm so impressed by your prompt feedback and glad I find this supportive community. Happy Friday!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

05-06-2015 09:17 PM

hI

Refer back to the program listed by PG stats

**medPos = cc + floor((n+1)/2);**

**if _n_ <= medPos then do;**

** medPos = min(cc + n, max(cc + 1, medPos + 1));**

** set have(keep=a rename=(a=medA)) point=medPos;**

** end;**

**else do;**

** medPos = min(cc + n, max(cc + 1, medPos - 1));**

** set have(keep=a rename=(a=medA)) point=medPos;**

** end;**

**medDif = a - medA;**

**if last.decileY then cc + n;**

**drop n cc;**

**run;**

**What cc stands for, there is no value or definition in the program. Your help is much appreciated. **

**Cheers**

**LEE**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to LLCLOUD

05-07-2015 11:15 AM

In the above datastep,** cc** is a variable created automatically by SAS because of the statement

**if last.decileY then cc + n;**

**cc + n; **is a sum statement which creates the accumulator variable **cc**. It is equivalent to

**retain cc 0;**

**cc = sum(cc, n);**

Thus **cc** is initialized with zero and it automatically retains its value across datastep iterations.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

05-07-2015 08:11 PM

Hi PGStats

I blindly used the code you posted, it is running ok. however, it doesnot produce MedA further Medif for each observation. There are quite large observations without MedA. I am confused. Your thoughts and help is much appreciated.

Cheers

li

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to LLCLOUD

05-08-2015 10:59 AM

It might help if your initial sort also includes variable **a **:

**proc sort data=have; by x decileY a; run;**

If that doesn't solve the problem, please provide a small example.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to navyblue

09-19-2013 11:19 PM

For each company in a decile/industry group, you want CMED, the "complimentary median" (i.e. median of group members except the company in hand) of variable X.

But of course, you can't directly calculate the CMED the same way you could directly calculate the complimentary mean - after all its a non-parametric measure.

But you are helped by the fact that you're going to get no more than 3 distinct values for CMED over each group. Start out with the MED(X), the "ordinary median" (i.e. including the subject company).

Then all companies with X < MED(X) will have the same CMED (i.e. dropping any of these companies from median calculation is the same as dropping the minimum X). Call this value MEDexMIN(X).

The same principle applies to all companies with X> MED(X). Dropping any of them will produce the same CMED as dropping the maximum X. Call it MEDexMAX(X).

And if the company has X=MED(X)? Then the complimentary median is the mid-point between MEDexMIN(X) and MEDexMAX(X).

Note all of these could have the same value if there are a lot of ties at MED(X)..

Here's a suggested program - I'd consider it a good example of why and how to use the double DOW approach. It assumes your data set HAVE is grouped by decile and industry, and that you have (say) no more than 100 companies in a single decile/industry group:

data want (drop=_;

array _x {100}; /* Establishes variables _x1 ... _x100 */

do _I = 1 by 1 until (last.industry);

set have;

by decile industry nosorted;

_x{_I}=x;

_xmax=max(_xmax,x); if _xmax=x then _imax=_I;

_xmin=min(_xmin,x); if _xmin=x then _imin=_I;

end;

_med=median(of _x;

_x{_Imax}=.; _MEDexMAX=median(of _x; _x{_imax}=_xmax;

_x{_imin}=.; _MEDexMIN=median(of _x; _x{_imin}=_xmin;

do until (last.industry);

set have;

by decile industry notsorted;

if x=. then CMED=.;

else select (sign(x-_med));

when (1) cmed = MEDexMAX;

when (-1) cmed = MEDexMIN;

when (0) cmed = (MEDexMAX+MEDexMIN)/2;

end;

output;

end;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

09-20-2013 02:45 PM

Thank you so much for explaining the logic behind the code and providing an alternative solution for my problem. I feel so lucky to get help from you all, this means so much to a beginner. I did try your code but getting error message as follows:

1185 data want (drop=_

1186 array _x (100); /* Establishes variables _x1 ... _x100 */

---

22

ERROR 22-7: Invalid option name 100.

I'll check into this more later when I get a chance. I'm sure your code definitely works perfectly, I just need to find out what I might have typed wrong.

Again, thank you very much!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to navyblue

09-20-2013 04:33 PM

Your DATA statement is missing the semi-colon.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

09-24-2013 06:36 PM

Good eyes, thanks a lot!