BookmarkSubscribeRSS Feed
mustafghan
Calcite | Level 5

I have the following data set:


Date ID Company 
Jan05 1 Coca-Cola 
Jan05 2 Coca-Cola 
Jan05 3 Coca-Cola 
Jan05 4 Apple 
Jan05 5 Apple 
Jan05 6 Apple
Jan05 7 Microsoft 
Feb05 1 McDonald 
Feb05 2 McDonald 
Feb05 3 McDonald
Feb05 4 McDonald 
Feb05 5 McDonald 
Feb05 6 Microsoft 
.
.
.
Jan06 1 Apple 
Jan06 2 Apple 
Jan06 3 Apple 
Jan06 4 Apple 
Jan06 5 Apple 
Jan06 6 Apple
Jan06 7 Apple 
Feb06 1 McDonald 
Feb06 2 McDonald 
Feb06 3 McDonald
Feb06 4 McDonald 
Feb06 5 McDonald 
Feb06 6 Lenova 
Feb06 7 Lenova 
.
.
Jan07 1 Apple 
Jan07 2 Apple 
Jan07 3 Apple 
Jan07 4 Microsoft 
Jan07 5 Lenovo 
Jan07 6 Apple
Jan07 7 Apple 
Feb07 1 TJmax 
Feb07 2 TJMax 
Feb07 3 TJMax
Feb07 4 TJMax 
Feb07 5 TJMax 
Feb07 6 TJMax 
Feb07 7 TJMax 
.
.
.
.
until July15



What I want to do are the following: 1: Compare January 05 with January 06, then January 06 with January 07...February 05 with February 06, February 06 with February 07....so on for each month get compute a median for ID when the same companies are present for both dates. 2: I don't want a new dataset each time I compute a median for ID. I merely want to make sure that both companies are present for lets say in Jan05 and Jan06, then compute a median for ID.

Whats the best way to do this in SAS?

My end result will look like this:

Date Median_ID 
Jan05 2 
Jan06 4

Jan06 4 
Jan07 3

Feb05 3 
Feb06 3

Feb06 0
Feb07 0

As you can see from the result: In Jan05 and 06, the only company that matches is Apple. In Jan06 and Jan07, the only company that matches again is Apple. So we take the median of ID for the time the companies match.






3 REPLIES 3
Tom
Super User Tom
Super User

Median of what variable?  A median of an ID variable does not make any sense.

It almost looks like you are grouping the records based on a change in DATE or COMPANY.

data want ;

  set have;

  by date company NOTSORTED ;

  if first.company then groupno+1;

run;

proc means median ;

  by groupno date company ;

  var id;

run;

mustafghan
Calcite | Level 5

Hi Tom,

Sorry for the confusion. Actually, I would like to take the median of the the company if the company exists in both data set. So what I think about doing is first of all do a proc means to do a count for the companies because they are in character format:

PROC MEANS DATA=count noprint;

        class advertiser;

        var date;

  output out = countdata N=Count

This gives me a count for each company which then I can take and divide by 2 to get the median. So if McDonald is mentioned 5 times, the count will give me 5 for Mcdonald  and then I can divide it by 2 to get a median. And I can do this for all of the companies.

What's tricky for me is to calculate the growth rate of the median between only those companies that are in the dataset in Jan 05 and 06, then Jan06 and 07...in a way that if there is a company in Jan06 that and one in Jan05 then I calculate the median growth rate, but if the company that is in Jan06 and not Jan07 to ignore that company. So for example if Mcdonald is both in dataset Jan06 and Jan05, then calculate a median growth. But if McDonald is not present in Jan07 but is in Jan06,then ignore it. I can't figure out how to do that for all different months and years across different States. SO basically calculate a 12 month median growth rate for only those companies that are present in the 12 month apart months across different states. Sorry for writing too much but I thought I should explain it and clarify it as much as I can.

PGStats
Opal | Level 21

You could use the median recently introduced in SAS SQL :

proc sql;

create table medHave as

select date, median(id) as medId, Company

from have

group by date, Company;

create table medPairs as

select

    a.date as date1, b.date as date2,

    a.medId as medId1, b.medId as medId2,

    a.company

from

    medHave as a inner join

    medHave as b on a.company=b.company and    

        intnx("YEAR", a.date, 1, "S") = b.date;

quit;

data want;

set medpairs;

date=date1; medId=medId1; output;

date=date2; medId=medId2; output;

format date monyy7.;

keep company date medId;

run;

PG

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 979 views
  • 0 likes
  • 3 in conversation