Compare and keep character observations that are the same

Reply
Occasional Contributor
Posts: 16

Compare and keep character observations that are the same

I have the following data set:


Date ID Company 
Jan05 1 Coca-Cola 
Jan05 2 Coca-Cola 
Jan05 3 Coca-Cola 
Jan05 4 Apple 
Jan05 5 Apple 
Jan05 6 Apple
Jan05 7 Microsoft 
Feb05 1 McDonald 
Feb05 2 McDonald 
Feb05 3 McDonald
Feb05 4 McDonald 
Feb05 5 McDonald 
Feb05 6 Microsoft 
.
.
.
Jan06 1 Apple 
Jan06 2 Apple 
Jan06 3 Apple 
Jan06 4 Apple 
Jan06 5 Apple 
Jan06 6 Apple
Jan06 7 Apple 
Feb06 1 McDonald 
Feb06 2 McDonald 
Feb06 3 McDonald
Feb06 4 McDonald 
Feb06 5 McDonald 
Feb06 6 Lenova 
Feb06 7 Lenova 
.
.
Jan07 1 Apple 
Jan07 2 Apple 
Jan07 3 Apple 
Jan07 4 Microsoft 
Jan07 5 Lenovo 
Jan07 6 Apple
Jan07 7 Apple 
Feb07 1 TJmax 
Feb07 2 TJMax 
Feb07 3 TJMax
Feb07 4 TJMax 
Feb07 5 TJMax 
Feb07 6 TJMax 
Feb07 7 TJMax 
.
.
.
.
until July15



What I want to do are the following: 1: Compare January 05 with January 06, then January 06 with January 07...February 05 with February 06, February 06 with February 07....so on for each month get compute a median for ID when the same companies are present for both dates. 2: I don't want a new dataset each time I compute a median for ID. I merely want to make sure that both companies are present for lets say in Jan05 and Jan06, then compute a median for ID.

Whats the best way to do this in SAS?

My end result will look like this:

Date Median_ID 
Jan05 2 
Jan06 4

Jan06 4 
Jan07 3

Feb05 3 
Feb06 3

Feb06 0
Feb07 0

As you can see from the result: In Jan05 and 06, the only company that matches is Apple. In Jan06 and Jan07, the only company that matches again is Apple. So we take the median of ID for the time the companies match.






Super User
Super User
Posts: 7,074

Re: Compare and keep character observations that are the same

Posted in reply to mustafghan

Median of what variable?  A median of an ID variable does not make any sense.

It almost looks like you are grouping the records based on a change in DATE or COMPANY.

data want ;

  set have;

  by date company NOTSORTED ;

  if first.company then groupno+1;

run;

proc means median ;

  by groupno date company ;

  var id;

run;

Occasional Contributor
Posts: 16

Re: Compare and keep character observations that are the same

Hi Tom,

Sorry for the confusion. Actually, I would like to take the median of the the company if the company exists in both data set. So what I think about doing is first of all do a proc means to do a count for the companies because they are in character format:

PROC MEANS DATA=count noprint;

        class advertiser;

        var date;

  output out = countdata N=Count

This gives me a count for each company which then I can take and divide by 2 to get the median. So if McDonald is mentioned 5 times, the count will give me 5 for Mcdonald  and then I can divide it by 2 to get a median. And I can do this for all of the companies.

What's tricky for me is to calculate the growth rate of the median between only those companies that are in the dataset in Jan 05 and 06, then Jan06 and 07...in a way that if there is a company in Jan06 that and one in Jan05 then I calculate the median growth rate, but if the company that is in Jan06 and not Jan07 to ignore that company. So for example if Mcdonald is both in dataset Jan06 and Jan05, then calculate a median growth. But if McDonald is not present in Jan07 but is in Jan06,then ignore it. I can't figure out how to do that for all different months and years across different States. SO basically calculate a 12 month median growth rate for only those companies that are present in the 12 month apart months across different states. Sorry for writing too much but I thought I should explain it and clarify it as much as I can.

Respected Advisor
Posts: 4,930

Re: Compare and keep character observations that are the same

Posted in reply to mustafghan

You could use the median recently introduced in SAS SQL :

proc sql;

create table medHave as

select date, median(id) as medId, Company

from have

group by date, Company;

create table medPairs as

select

    a.date as date1, b.date as date2,

    a.medId as medId1, b.medId as medId2,

    a.company

from

    medHave as a inner join

    medHave as b on a.company=b.company and    

        intnx("YEAR", a.date, 1, "S") = b.date;

quit;

data want;

set medpairs;

date=date1; medId=medId1; output;

date=date2; medId=medId2; output;

format date monyy7.;

keep company date medId;

run;

PG

PG
Ask a Question
Discussion stats
  • 3 replies
  • 190 views
  • 0 likes
  • 3 in conversation