Calculate median for character variables

Reply
Occasional Contributor
Posts: 16

Calculate median for character variables

I have the following data set:

Date       Company      State

Jan05     Coca-Cola      TX     

Jan05     Coca-Cola      TX 

Jan05     Coca-Cola      DE     

Jan05     Apple             DE 

Jan05     Apple             DE 

Jan05     Apple             DE

Jan05     Microsoft        DE

Feb05     McDonald      MD 

Feb05     McDonald      MD 

Feb05     McDonald      MD

Feb05     McDonald      MD 

Feb05     McDonald      MD 

Feb05     Microsoft       MD   

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE 

Jan06     Apple            DE 

Jan06     Apple            DE

Jan06     Apple            DE    

Feb06    McDonald      MD 

Feb06    McDonald      MD 

Feb06    McDonald      MD

Feb06    McDonald      MD 

Feb06    McDonald      MD 

Feb06    Lenova          MD 

Feb06    Lenova          MD

Jan07    Apple             DE

Jan07    Apple             DE

Jan07    Apple             DE 

Jan07    Microsoft        DE     

Jan07    Lenovo           DE  

Jan07    Apple             DE

Jan07    Apple             DE

Feb07    TJmax           TX

Feb07    TJMax           TX

Feb07    TJMax           TX

Feb07    TJMax           MD

Feb07    TMax             MD

Feb07    TJMax           MD

Feb07    TJMax           MD

What I am trying to do is calculate a 12 month median growth rate per Company from Jan05 to Jan06, then from Jan06 to Jan07, February05 to February06, then from February06 to February07 for each State and so on for the other dates BUT only if the company is available on both dates. I obviously cannot calculate a median growth rate for a company that doesn't exist on both dates. For example, I can calculate the median growth rate of McDonald from Feb05 to Feb06, and Lenovo from Jan06 to Jan07 for the state of MD. The median number is basically the number of times each Company is mentioned (sort of like a count variable).

What is the best way to do this in SAS?

Super Contributor
Posts: 307

Re: Calculate median for character variables

Can you provide an example of what you want the output to look like?

Occasional Contributor
Posts: 16

Re: Calculate median for character variables

Yes. The output would look like this:

                                                 Date             State          Media_growth

                                                 Feb05          MD             (median of McDon in Feb05/median of McDon in Feb04)-1(100)           

                                                Feb06         MD             (median of Lenovo in Feb06/median of Lenovo in Feb05)-1(100)           

Valued Guide
Posts: 854

Re: Calculate median for character variables

Didn't you say by company by state?  If you run my code is this what you expected?  If it's by company by state I would want to see the company name in the output, no?

Occasional Contributor
Posts: 16

Re: Calculate median for character variables

Yes. I would want to see the state in the output. But not necessarily. I could for example create a seperate dataset for each state and then find out the 12 month growth median only for those companies that are present in the 12 month apart months/year. So for example, median for McDonald from Feb05 to Feb06 and then Lenovo for Feb06 to Feb07, but none for McDonald because it's not present on both Feb06 and Feb07. And I want to do this for each state.

Valued Guide
Posts: 854

Re: Calculate median for character variables

I think this will do it but with a sample dataset this small I'm not sure what it'll look like.  I agree with Fugue that output would have been welcome.  This is my interpretation of what you were asking for.

data have;

infile cards;

informat date monyy5. company $9.;

format date monyy5.;

input Date       Company$      State$;

cards;

Jan05     Coca-Cola      TX   

Jan05     Coca-Cola      TX

Jan05     Coca-Cola      DE   

Jan05     Apple             DE

Jan05     Apple             DE

Jan05     Apple             DE

Jan05     Microsoft        DE

Feb05     McDonald      MD

Feb05     McDonald      MD

Feb05     McDonald      MD

Feb05     McDonald      MD

Feb05     McDonald      MD

Feb05     Microsoft       MD 

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE

Jan06     Apple            DE  

Feb06    McDonald      MD

Feb06    McDonald      MD

Feb06    McDonald      MD

Feb06    McDonald      MD

Feb06    McDonald      MD

Feb06    Lenova          MD

Feb06    Lenova          MD

Jan07    Apple             DE

Jan07    Apple             DE

Jan07    Apple             DE

Jan07    Microsoft        DE   

Jan07    Lenovo           DE

Jan07    Apple             DE

Jan07    Apple             DE

Feb07    TJmax           TX

Feb07    TJMax           TX

Feb07    TJMax           TX

Feb07    TJMax           MD

Feb07    TMax             MD

Feb07    TJMax           MD

Feb07    TJMax           MD

;

proc sql;

create table prep1 as

select distinct *,count(company) as count,catx('_',company,state) as comp_state

from have

group by date,company,state

order by comp_state,date;

data prep2;

set prep1;

by comp_state date;

l_count = lag(count);

if not first.comp_state then diff = count-l_count;

run;

proc sql;

create table want as

select date,company,state,avg(diff) as Median_Growth

from prep2

group by comp_state

order by date,state;

Ask a Question
Discussion stats
  • 5 replies
  • 284 views
  • 0 likes
  • 3 in conversation