turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Median from letters or change letters to numbers

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 08:51 AM

Dear anyone

I have a variable for income in $ called var 1. The incom id divided into three levels: a-c, where a is the lowest and c the highest. I want the median income. Is there a way that SAS can give you the median from a list of letters, like in var 1. If not, does anybody know how to make a new variable where the letters a converted to numbers (as in var 2)

Hope you can help

Sincerely

Anders

ID | var 1 | var 2 |

10 | a | 1 |

11 | b | 2 |

12 | b | 2 |

13 | c | 3 |

14 | a | 1 |

15 | a | 1 |

16 | b | 2 |

17 | c | 3 |

18 | a | 1 |

median | b | 2 |

Accepted Solutions

Solution

10-18-2012
09:15 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 09:15 AM

Median is a statistical function, it requires numeric values.

If you do not mind the absolute numerical values assigned to the characters then the RANK() function will return the ASCII (or EBCDIC) code for the character as a numeral. This has the advantage that you can translate the numeric value back to the character code using the BYTE() function.

Otherwise the normal way of approaching this is to create an informat in Proc Format

Proc Format ;

Invalue $income

'a' = 1

'b' = 2

'c' = 3

...

;

Data test ;

Set Income_data ;

var2 = Input (var1, $income.) ;

Run ;

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 09:15 AM

If you don't have too many of records, try the following:

**data** have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

**data** _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

**run**;

**data** _null_;

array med(&nobs) _temporary_;

set have end=last;

med(_n_)=rank(var1);

if last then do;

m=median(of med(*));

median=byte(m);

put median=;

end;

**run**;

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2012 08:03 AM

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a' = 1

'b' = 2

'c' = 3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

Analysis Variable : var2

N Mean Median

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

9 1.7777778 2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

Solution

10-18-2012
09:15 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 09:15 AM

Median is a statistical function, it requires numeric values.

If you do not mind the absolute numerical values assigned to the characters then the RANK() function will return the ASCII (or EBCDIC) code for the character as a numeral. This has the advantage that you can translate the numeric value back to the character code using the BYTE() function.

Otherwise the normal way of approaching this is to create an informat in Proc Format

Proc Format ;

Invalue $income

'a' = 1

'b' = 2

'c' = 3

...

;

Data test ;

Set Income_data ;

var2 = Input (var1, $income.) ;

Run ;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2012 08:02 AM

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a' = 1

'b' = 2

'c' = 3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

Analysis Variable : var2

N Mean Median

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

9 1.7777778 2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 10:27 AM

Unfortunately MEDIAN summary function does not exist for SAS SQL, which is rubish, so:

proc sql noprint;

create table m as

select median(rank(var1)) as m from have;

quit;

Won't produce the expected results, which is aggregate data and compute the statistic value.

Instead it will operate at row-level, which is completely wrong, since by standard, a single parameter statistic function is considered to be a summary function of the whole select. That is what differentiates statistic function from row-level/column-level operation.

That being said, we are left with just to ways to calculate the aggregate median through the entire dataset, being with datastep (hai.kuo example) or using the statistical proceadure MEANS (or SUMMARY):

**proc** **sql** noprint;

create view _have as select rank(var1) as var1 from have;

**quit**;

**proc** **means** data=_have median missing;

var var1;

**run**;

SQL is still needed to transform the alpha char into a numeric value, you may use the rank function to do the translation or code the value to numeric through a format (RichardinOz example). Here the value is printed in the results window, but you can point that to a SAS dataset with the OUTPUT option. And that's pretty much it.

More on the MEANS procedure here:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146729.htm

Cheers from Portugal!

Daniel Santos @ www.cgd.pt

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2012 08:03 AM

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a' = 1

'b' = 2

'c' = 3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

Analysis Variable : var2

N Mean Median

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

9 1.7777778 2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 11:11 AM

Like Daniel had pointed out, Median() is not considered as aggregating function ACROSS rows by proc sql. That being said, if you really want to use median() in Proc sql, you need to put all values of interest into ONE single row, and btw, it does not support variable list (it does in datastep), so Macro variable is needed.

**data** have;

input ID var1$;

r_var1=rank(var1);

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

**proc** **transpose** data=have out=want;

var r_var1;

**run**;

**proc** **sql**;

select name into :names separated by ',' from dictionary.columns where libname='WORK' AND MEMNAME='WANT'

AND UPCASE(SUBSTR(NAME,**1**,**3**))='COL';

select byte(median(&names)) as median from want;

**QUIT**;

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2012 01:38 PM

One caution in all this.

Anders asked about transforming a letter code to a number and then getting a median on it. Elegant and useful solutions have followed. However that may not answer the underlying question of "what is the median income". It will get at 'what is the median income RANK.' Since the median is the 50th percentile, you can easily get that without data transformation using PROC FREQ and looking at the cumulative percentage.

Statistically, he may get a more informative estimate by spreading the data across each rank according to some known distribution of incomes, with an estimate for each observation, and then taking the median of that distribution across this sample. That could be more accurate for a simple median. It will not be helpful if you want to later use the income as a predictor or outcome in some model; that would still require the coarser grouping approach.

Doc Muhlbaier

Duke

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2012 08:03 AM

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a' = 1

'b' = 2

'c' = 3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

Analysis Variable : var2

N Mean Median

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

9 1.7777778 2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders