Median from letters or change letters to numbers

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

Median from letters or change letters to numbers

Dear anyone

I have a variable for income in $ called var 1. The incom id divided into three levels: a-c, where a is the lowest and c the highest. I want the median income. Is there a way that SAS can give you the median from a list of letters, like in var 1. If not, does anybody know how to make a new variable where the letters a converted to numbers (as in var 2)

Hope you can help

Sincerely

Anders

IDvar 1var 2
10a1
11b2
12b2
13c3
14a1
15a1
16b2
17c3
18a1
medianb2

Accepted Solutions
Solution
‎10-18-2012 09:15 AM
Super Contributor
Posts: 644

Re: Median from letters or change letters to numbers

Median is a statistical function, it requires numeric values.

If you do not mind the absolute numerical values assigned to the characters then the RANK() function will return the ASCII (or EBCDIC) code for the character as a numeral.  This has the advantage that you can translate the numeric value back to the character code using the BYTE() function.

Otherwise the normal way of approaching this is to create an informat in Proc Format

Proc Format ;

Invalue $income

'a'     =     1

'b'     =     2

'c'     =     3

...

;

Data test ;

Set Income_data ;

var2 = Input (var1, $income.) ;

Run ;

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Median from letters or change letters to numbers

If you don't have too many of records, try the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

data _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

run;

data _null_;

array med(&nobs) _temporary_;

set have end=last;

  med(_n_)=rank(var1);

if last then do;

   m=median(of med(*));

   median=byte(m);

   put median=;

end;

run;

Haikuo

Contributor
Posts: 28

Re: Median from letters or change letters to numbers

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a'     =     1

'b'     =     2

'c'     =     3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

                                  Analysis Variable : var2

                              N            Mean          Median

                              ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                              9       1.7777778       2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

Solution
‎10-18-2012 09:15 AM
Super Contributor
Posts: 644

Re: Median from letters or change letters to numbers

Median is a statistical function, it requires numeric values.

If you do not mind the absolute numerical values assigned to the characters then the RANK() function will return the ASCII (or EBCDIC) code for the character as a numeral.  This has the advantage that you can translate the numeric value back to the character code using the BYTE() function.

Otherwise the normal way of approaching this is to create an informat in Proc Format

Proc Format ;

Invalue $income

'a'     =     1

'b'     =     2

'c'     =     3

...

;

Data test ;

Set Income_data ;

var2 = Input (var1, $income.) ;

Run ;

Contributor
Posts: 28

Re: Median from letters or change letters to numbers

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a'     =     1

'b'     =     2

'c'     =     3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

                                  Analysis Variable : var2

                              N            Mean          Median

                              ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                              9       1.7777778       2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

Super Contributor
Posts: 474

Re: Median from letters or change letters to numbers

Unfortunately MEDIAN summary function does not exist for SAS SQL, which is rubish, so:

proc sql noprint;

create table m as

select median(rank(var1)) as m from have;

quit;


Won't produce the expected results, which is aggregate data and compute the statistic value.

Instead it will operate at row-level, which is completely wrong, since by standard, a single parameter statistic function is considered to be a summary function of the whole select. That is what differentiates statistic function from row-level/column-level operation.

That being said, we are left with just to ways to calculate the aggregate median through the entire dataset, being with datastep (hai.kuo example) or using the statistical proceadure MEANS (or SUMMARY):

proc sql noprint;

create view _have as select rank(var1) as var1 from have;

quit;

proc means data=_have median missing;

var var1;

run;


SQL is still needed to transform the alpha char into a numeric value, you may use the rank function to do the translation or code the value to numeric through a format (RichardinOz example). Here the value is printed in the results window, but you can point that to a SAS dataset with the OUTPUT option. And that's pretty much it.

More on the MEANS procedure here:

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146729.htm


Cheers from Portugal!

Daniel Santos @ www.cgd.pt

Contributor
Posts: 28

Re: Median from letters or change letters to numbers

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a'     =     1

'b'     =     2

'c'     =     3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

                                  Analysis Variable : var2

                              N            Mean          Median

                              ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                              9       1.7777778       2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

Respected Advisor
Posts: 3,124

Re: Median from letters or change letters to numbers

Like Daniel had pointed out, Median() is not considered as aggregating function ACROSS rows by proc sql. That being said, if you really want to use median() in Proc sql, you need to put all  values of interest into ONE single row, and btw, it does not support variable list (it does in datastep), so Macro variable is needed.

data have;

input ID var1$;

r_var1=rank(var1);

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

proc transpose data=have out=want;

var r_var1;

run;

proc sql;

   select name into :names separated by ',' from dictionary.columns where libname='WORK' AND MEMNAME='WANT'

   AND UPCASE(SUBSTR(NAME,1,3))='COL';

  select byte(median(&names)) as median from want;

QUIT;

Haikuo

Trusted Advisor
Posts: 2,113

Re: Median from letters or change letters to numbers

One caution in all this.

Anders asked about transforming a letter code to a number and then getting a median on it.  Elegant and useful solutions have followed.  However that may not answer the underlying question of "what is the median income".  It will get at 'what is the median income RANK.'  Since the median is the 50th percentile, you can easily get that without data transformation using PROC FREQ and looking at the cumulative percentage.

Statistically, he may get a more informative estimate by spreading the data across each rank according to some known distribution of incomes, with an estimate for each observation, and then taking the median of that distribution across this sample.  That could be more accurate for a simple median.  It will not be helpful if you want to later use the income as a predictor or outcome in some model; that would still require the coarser grouping approach.

Doc Muhlbaier

Duke

Contributor
Posts: 28

Re: Median from letters or change letters to numbers

Thanks a 10^6 :-)

All the answers were helpful and inspiring. Thank you all for making this such an awsome community.

To be honest, I did not get all the details in this otherwise excellent discussion. I ended up doing the following:

data have;

input ID var1$;

cards;

10 a

11 b

12 b

13 c

14 a

15 a

16 b

17 c

18 a

;

run;

Proc Format ;

invalue categorial

'a'     =     1

'b'     =     2

'c'     =     3

;

Data want ;

Set have;

var2 = Input (var1, categorial.) ;

Run ;

proc means N mean median data=want;

var var2; run;

And I got

The MEANS Procedure

                                  Analysis Variable : var2

                              N            Mean          Median

                              ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

                              9       1.7777778       2.0000000

I hav enow aaplied this to my dataset (>9000 respondents) and it works fine.

Sincerely

Anders

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1429 views
  • 6 likes
  • 5 in conversation