Re: How to make peer groups based on mutually same analyst code

Lipty · Posted 09-11-2019 04:54 PM

Thank you very much for taking time to read this post. I have the following data set:

FirmID	year	AnalystCode
PAMP	2001	1047
AMZN	2001	1047
NFLX	2001	1047
RIM	2001	1047
NZKA	2001	1047
AMZN	2001	1048
NFLX	2001	1048
RIM	2001	1048
PAMP	2001	1648
AMZN	2001	1648
NFLX	2001	1648
RIM	2001	1648
NZKA	2001	1648
GTW	2001	967
BBRY	2001	967
NOK	2001	967
AEOK	2001	967
BBRY	2001	816
NOK	2001	816
AEOK	2001	816
NOK	2001	1322
AEOK	2001	1322
GTW	2002	967
BBRY	2002	967
NOK	2002	967
AEOK	2002	967
BBRY	2002	816
NOK	2002	816
AEOK	2002	816
NOK	2002	1322
AEOK	2002	1322

Every year, once the firms have at least 3 same AnalystCode, they will be classified as one group. So my desired table should be like the following one:

FirmID	year	Group
AMZN	2001	1
NFLX	2001	1
RIM	2001	1
NOK	2001	2
AEOK	2001	2
NOK	2002	1
AEOK	2002	1

Can anyone help? Thank you very much. : )

Jagadishkatam · Posted 09-11-2019 06:04 PM

please try

proc sort data=have;
by firmid year;
run;

proc sql;
create table want as select count(year) as cnt,firmid,year from have group by firmid,year having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;

data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;

Thanks,
Jag

PGStats · Posted 09-11-2019 06:10 PM

Can a firm belong to many groups for a given year?

If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, how are they grouped?

PG

Lipty · Posted 09-12-2019 10:47 AM

Thank you very much for your time, @PGStats, @Ksharp, @PeterClemmensen , @Jagadishkatam .

@PGStats I checked my original data which includes more than 200,000 observations. The case mentioned by @PGStats does exist. If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, B and C cannot be made in one group. Only when they have common and mutually same analyst code, and from the same industry, they can be grouped together.

@PGStats, @Ksharp, @PeterClemmensen , @Jagadishkatam Sorry for missing one more piece of information. There is one more variable of "industry". Once the firms have at least 3 same AnalystCode and are from the same industry, they will fall in one group. This can eliminate most of the cases mentioned by @PGStats .

Could you please help me with the additional piece of information. Thanks again for your help.

PeterClemmensen · Posted 09-12-2019 10:51 AM

@Lipty can you update the sample data and the desired result and take this into account?

The DATA to DATA Step Macro
Blog: SASnrd

Lipty · Posted 09-12-2019 12:38 PM

Sure, will do it after a meeting. : )

Thanks

PGStats · Posted 09-11-2019 11:30 PM

Try this:

data work.have;
input FirmID $ year AnalystCode;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;

proc sql;
create table links as
select
	a.firmId as from, b.firmId as to, a.year
from
	have as a inner join have as b on a.year=b.year and a.analystCode=b.analystCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year
having count(*) >=3;
quit;

%macro optnet(year);
proc optnet data_links=links graph_direction=undirected out_nodes=work.nodes_&year.;
where year=&year.;
concomp;
run;
%mend;

%optnet(2001);
%optnet(2002);

data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId concomp=group;
run;

proc print data=want noobs; run;

PG

Ksharp · Posted 09-12-2019 08:12 AM

I love this kind of question . Put me to the limit .

data work.have;
input FirmID $ year AnalystCode;
flag=1;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;
proc sort data=have out=temp;
by year AnalystCode;
run;
proc transpose data=temp out=temp1(drop=_name_);
by year AnalystCode;
id FirmID;
var flag;
run;
proc stdize data=temp1 out=temp2 missing=0 reponly;
run;
proc sql noprint;
select distinct cats('_',FirmID) into : Firm separated by ' ' from have;
select distinct cats('_',FirmID,'=cats(_',FirmId,',',FirmID,')')
  into : concat separated by ';'
 from have;
quit;
data temp3;
length &Firm $ 200;
 do until(last.year);
  set temp2;
  by year;
  &concat ;
 end;
 keep year _: ;
run;
data temp4;
 set temp3;
 array x{*} $ _: ;
 do i=1 to dim(x)-1;
  if countc(x{i},'1')>2 then do;
   do j=i+1 to dim(x);
       if x{i}=x{j}  then do;
	     value=x{i};
         vname=vname(x{i});output;
	     vname=vname(x{j});output;
	   end;
	 end;
   end;
 end;
keep year value vname;
run;
proc sort data=temp4 out=temp5 nodupkey;
by year value vname ;
run;
data want;
 set temp5;
 by year value;
 if first.year then group=0;
 group+first.value;
run;

PeterClemmensen · Posted 09-12-2019 08:36 AM

data want;
    declare hash HoH(ordered:'Y');
    HoH.definekey ('year', 'FirmID');
    HoH.definedata('h', 'FirmID', 'year');
    HoH.definedone();
    declare hiter HoHiter("HoH");

    declare hash g();
    g.definekey('year');
    g.definedata('Group');
    g.definedone();

    declare hash h;

    do until (lr);
      set have end=lr;

      if g.check() ne 0 then do;
          Group+1;g.add();
      end;

      if HoH.find() ne 0 then do;
         h=_new_ hash();
         h.definekey('FirmID', 'year', 'AnalystCode');
         h.definedone();
         HoH.add();
      end;

      h.replace();
    end;

    do while(HoHiter.next() = 0);
        if h.num_items ge 3 then do;
            rc=g.find();
            output;
        end;
    end;

    keep FirmID year Group;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Lipty · Posted 09-12-2019 03:24 PM

Hi all,

Sorry for missing one piece of information. I add one more variable "IndustryCode" to the sample data. @PeterClemmensen @Ksharp @PGStats @Jagadishkatam

FirmID	year	AnalystCode	IndustryCode
PAMP	2001	1047	36
AMZN	2001	1047	36
NFLX	2001	1047	37
RIM	2001	1047	36
NZKA	2001	1047	36
AMZN	2001	1048	36
NFLX	2001	1048	37
RIM	2001	1048	36
PAMP	2001	1648	36
AMZN	2001	1648	36
NFLX	2001	1648	37
RIM	2001	1648	36
NZKA	2001	1648	36
GTW	2001	967	23
BBRY	2001	967	23
NOK	2001	967	23
AEOK	2001	967	23
BBRY	2001	816	23
NOK	2001	816	23
AEOK	2001	816	23
NOK	2001	1322	23
AEOK	2001	1322	23
GTW	2002	967	23
BBRY	2002	967	23
NOK	2002	967	23
AEOK	2002	967	23
BBRY	2002	816	23
NOK	2002	816	23
AEOK	2002	816	23
NOK	2002	1322	23
AEOK	2002	1322	23

Once the firms have at least 3 same AnalystCode and are from the same industry (IndustryCode), they will fall in one group. I desired table is as follows:

FirmID	year	Group	IndstutryCode
AMZN	2001	1	36
RIM	2001	1	36
NOK	2001	2	23
AEOK	2001	2	23
NOK	2002	1	23
AEOK	2002	1	23

Thank you very much for your time.

Jagadishkatam · Posted 09-12-2019 03:54 PM

@Lipty , please try the below code , it is generating the expected output

data have;
input FirmID$ year AnalystCode IndustryCode;
cards;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM	 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM	 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM	 2001 1648 36
NZKA 2001 1648 36
GTW	 2001 967  23
BBRY 2001 967  23
NOK	 2001 967  23
AEOK 2001 967  23
BBRY 2001 816  23
NOK	 2001 816  23
AEOK 2001 816  23
NOK	 2001 1322 23
AEOK 2001 1322 23
GTW	 2002 967  23
BBRY 2002 967  23
NOK	 2002 967  23
AEOK 2002 967  23
BBRY 2002 816  23
NOK	 2002 816  23
AEOK 2002 816  23
NOK	 2002 1322 23
AEOK 2002 1322 23
;

proc sort data=have;
by firmid year;
run;

proc sql;
create table want as select count(year) as cnt,firmid,year,IndustryCode from have group by firmid,year,IndustryCode having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;

data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;

Thanks,
Jag

Lipty · Posted 09-13-2019 05:45 PM

Thanks @Jagadishkatam .

I will try it. : )

PGStats · Posted 09-12-2019 06:54 PM

Consider the A-B-C case described before with links A-B and A-C but no B-C link. My previous code would put all three firms A-B-C in the same group. But if firms without a link cannot belong to the same group then there must be two groups, A-B and A-C. This new code does that. In graph theory terms, the previous code found connected components, the new one finds cliques.

data work.have;
input FirmID $ year AnalystCode IndustryCode;
datalines;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM 2001 1648 36
NZKA 2001 1648 36
GTW 2001 967 23
BBRY 2001 967 23
NOK 2001 967 23
AEOK 2001 967 23
BBRY 2001 816 23
NOK 2001 816 23
AEOK 2001 816 23
NOK 2001 1322 23
AEOK 2001 1322 23
GTW 2002 967 23
BBRY 2002 967 23
NOK 2002 967 23
AEOK 2002 967 23
BBRY 2002 816 23
NOK 2002 816 23
AEOK 2002 816 23
NOK 2002 1322 23
AEOK 2002 1322 23
;

proc sql;
create table links as
select
	a.firmId as from, b.firmId as to, a.year, a.IndustryCode
from
	have as a inner join have as b on a.year=b.year and 
	a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;

%macro optnet(year);
proc optnet data_links=links graph_direction=undirected;
where year=&year.;
clique out=work.nodes_&year.;
run;
%mend;

%optnet(2001);
%optnet(2002);

data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId clique=group;
run;

proc print data=want noobs; run;

PG

PGStats · Posted 09-13-2019 12:16 PM

Here is a more efficient solution than my previous try:

proc sql;
create table links as
select
	catx("_",a.year,a.IndustryCode,a.firmId) as from, 
	catx("_",a.year,a.IndustryCode,b.firmId) as to
from
	have as a inner join 
	have as b on a.year=b.year and 
		a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;

proc optnet data_links=links graph_direction=undirected;
clique out=work.nodes;
run;

data want;
set nodes;
length firmId $4;
year = input(scan(node, 1, "_"),best.);
IndustryCode = input(scan(node, 2, "_"), best.);
firmId = scan(node, 3, "_");
rename clique=group;
drop node;
run;

proc print data=want noobs; run;

group 	firmId 	year 	IndustryCode
1 	AEOK 	2001 	23
1 	NOK 	2001 	23
2 	AEOK 	2002 	23
2 	NOK 	2002 	23
3 	AMZN 	2001 	36
3 	RIM 	2001 	36

PG

Lipty · Posted 09-13-2019 05:42 PM

Thank you very much, @PGStats

I will try it. The desired table is a little different from what I have listed. 🙂

FirmID	year	Group	IndstutryCode
AMZN	2001	1	36
RIM	2001	1	36
NOK	2001	2	23
AEOK	2001	2	23
NOK	2002	1	23
AEOK	2002	1	23

Registration is open

SAS Training: Just a Click Away