## Compare and keep character observations that are the same

Occasional Contributor
Posts: 16

# Compare and keep character observations that are the same

I have the following data set:

`Date ID Company  Jan05 1 Coca-Cola  Jan05 2 Coca-Cola  Jan05 3 Coca-Cola  Jan05 4 Apple  Jan05 5 Apple  Jan05 6 AppleJan05 7 Microsoft  Feb05 1 McDonald  Feb05 2 McDonald  Feb05 3 McDonaldFeb05 4 McDonald  Feb05 5 McDonald  Feb05 6 Microsoft  . . .Jan06 1 Apple  Jan06 2 Apple  Jan06 3 Apple  Jan06 4 Apple  Jan06 5 Apple  Jan06 6 AppleJan06 7 Apple  Feb06 1 McDonald  Feb06 2 McDonald  Feb06 3 McDonaldFeb06 4 McDonald  Feb06 5 McDonald  Feb06 6 Lenova  Feb06 7 Lenova  . .Jan07 1 Apple  Jan07 2 Apple  Jan07 3 Apple  Jan07 4 Microsoft  Jan07 5 Lenovo  Jan07 6 AppleJan07 7 Apple  Feb07 1 TJmax  Feb07 2 TJMax  Feb07 3 TJMaxFeb07 4 TJMax  Feb07 5 TJMax  Feb07 6 TJMax  Feb07 7 TJMax  ....until July15`

What I want to do are the following: 1: Compare January 05 with January 06, then January 06 with January 07...February 05 with February 06, February 06 with February 07....so on for each month get compute a median for ID when the same companies are present for both dates. 2: I don't want a new dataset each time I compute a median for ID. I merely want to make sure that both companies are present for lets say in Jan05 and Jan06, then compute a median for ID.

Whats the best way to do this in SAS?

My end result will look like this:

`Date Median_ID  Jan05 2  Jan06 4Jan06 4  Jan07 3Feb05 3  Feb06 3Feb06 0Feb07 0`

As you can see from the result: In Jan05 and 06, the only company that matches is Apple. In Jan06 and Jan07, the only company that matches again is Apple. So we take the median of ID for the time the companies match.

Super User
Posts: 8,127

## Re: Compare and keep character observations that are the same

Median of what variable?  A median of an ID variable does not make any sense.

It almost looks like you are grouping the records based on a change in DATE or COMPANY.

data want ;

set have;

by date company NOTSORTED ;

if first.company then groupno+1;

run;

proc means median ;

by groupno date company ;

var id;

run;

Occasional Contributor
Posts: 16

## Re: Compare and keep character observations that are the same

Hi Tom,

Sorry for the confusion. Actually, I would like to take the median of the the company if the company exists in both data set. So what I think about doing is first of all do a proc means to do a count for the companies because they are in character format:

PROC MEANS DATA=count noprint;

var date;

output out = countdata N=Count

This gives me a count for each company which then I can take and divide by 2 to get the median. So if McDonald is mentioned 5 times, the count will give me 5 for Mcdonald  and then I can divide it by 2 to get a median. And I can do this for all of the companies.

What's tricky for me is to calculate the growth rate of the median between only those companies that are in the dataset in Jan 05 and 06, then Jan06 and 07...in a way that if there is a company in Jan06 that and one in Jan05 then I calculate the median growth rate, but if the company that is in Jan06 and not Jan07 to ignore that company. So for example if Mcdonald is both in dataset Jan06 and Jan05, then calculate a median growth. But if McDonald is not present in Jan07 but is in Jan06,then ignore it. I can't figure out how to do that for all different months and years across different States. SO basically calculate a 12 month median growth rate for only those companies that are present in the 12 month apart months across different states. Sorry for writing too much but I thought I should explain it and clarify it as much as I can.

Posts: 5,542

## Re: Compare and keep character observations that are the same

You could use the median recently introduced in SAS SQL :

proc sql;

create table medHave as

select date, median(id) as medId, Company

from have

group by date, Company;

create table medPairs as

select

a.date as date1, b.date as date2,

a.medId as medId1, b.medId as medId2,

a.company

from

medHave as a inner join

medHave as b on a.company=b.company and

intnx("YEAR", a.date, 1, "S") = b.date;

quit;

data want;

set medpairs;

date=date1; medId=medId1; output;

date=date2; medId=medId2; output;

format date monyy7.;

keep company date medId;

run;

PG

PG
Discussion stats
• 3 replies
• 205 views
• 0 likes
• 3 in conversation