BookmarkSubscribeRSS Feed
Lipty
Obsidian | Level 7

Thank you very much for taking time to read this post. I have the following data set:

FirmIDyearAnalystCode
PAMP20011047
AMZN20011047
NFLX20011047
RIM20011047
NZKA20011047
AMZN20011048
NFLX20011048
RIM20011048
PAMP20011648
AMZN20011648
NFLX20011648
RIM20011648
NZKA20011648
GTW2001967
BBRY2001967
NOK2001967
AEOK2001967
BBRY2001816
NOK2001816
AEOK2001816
NOK20011322
AEOK20011322
GTW2002967
BBRY2002967
NOK2002967
AEOK2002967
BBRY2002816
NOK2002816
AEOK2002816
NOK20021322
AEOK20021322

 

Every year, once the firms have at least 3 same AnalystCode, they will be classified as one group. So my desired table should be like the following one:

FirmIDyear Group 
AMZN20011
NFLX20011
RIM20011
NOK20012
AEOK20012
NOK20021
AEOK20021

 

Can anyone help? Thank you very much. : )

22 REPLIES 22
Jagadishkatam
Amethyst | Level 16

please try

 

proc sort data=have;
by firmid year;
run;

proc sql;
create table want as select count(year) as cnt,firmid,year from have group by firmid,year having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;

data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;
Thanks,
Jag
PGStats
Opal | Level 21

Can a firm belong to many groups for a given year?

If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, how are they grouped?

PG
Lipty
Obsidian | Level 7

 

Thank you very much for your time, @PGStats@Ksharp@PeterClemmensen , @Jagadishkatam . 

 

@PGStats I checked my original data which includes more than 200,000 observations. The case mentioned by @PGStats does exist. If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, B and C cannot be made in one group. Only when they have common and mutually same analyst code, and from the same industry, they can be grouped together. 

 

@PGStats@Ksharp@PeterClemmensen , @Jagadishkatam  Sorry for missing one more piece of information. There is one more variable of "industry". Once the firms have at least 3 same AnalystCode and are from the same industry, they will fall in one group. This can eliminate most of the cases mentioned by @PGStats . 

 

Could you please help me with the additional piece of information. Thanks again for your help.

PeterClemmensen
Tourmaline | Level 20

@Lipty can you update the sample data and the desired result and take this into account?

Lipty
Obsidian | Level 7

Sure, will do it after a meeting. : )

 

Thanks

PGStats
Opal | Level 21

Try this:

 

data work.have;
input FirmID $ year AnalystCode;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;

proc sql;
create table links as
select
	a.firmId as from, b.firmId as to, a.year
from
	have as a inner join have as b on a.year=b.year and a.analystCode=b.analystCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year
having count(*) >=3;
quit;

%macro optnet(year);
proc optnet data_links=links graph_direction=undirected out_nodes=work.nodes_&year.;
where year=&year.;
concomp;
run;
%mend;

%optnet(2001);
%optnet(2002);

data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId concomp=group;
run;

proc print data=want noobs; run;
PG
Ksharp
Super User

I love this kind of question . Put me to the limit .

 

data work.have;
input FirmID $ year AnalystCode;
flag=1;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;
proc sort data=have out=temp;
by year AnalystCode;
run;
proc transpose data=temp out=temp1(drop=_name_);
by year AnalystCode;
id FirmID;
var flag;
run;
proc stdize data=temp1 out=temp2 missing=0 reponly;
run;
proc sql noprint;
select distinct cats('_',FirmID) into : Firm separated by ' ' from have;
select distinct cats('_',FirmID,'=cats(_',FirmId,',',FirmID,')')
  into : concat separated by ';'
 from have;
quit;
data temp3;
length &Firm $ 200;
 do until(last.year);
  set temp2;
  by year;
  &concat ;
 end;
 keep year _: ;
run;
data temp4;
 set temp3;
 array x{*} $ _: ;
 do i=1 to dim(x)-1;
  if countc(x{i},'1')>2 then do;
   do j=i+1 to dim(x);
       if x{i}=x{j}  then do;
	     value=x{i};
         vname=vname(x{i});output;
	     vname=vname(x{j});output;
	   end;
	 end;
   end;
 end;
keep year value vname;
run;
proc sort data=temp4 out=temp5 nodupkey;
by year value vname ;
run;
data want;
 set temp5;
 by year value;
 if first.year then group=0;
 group+first.value;
run;
PeterClemmensen
Tourmaline | Level 20
data want;
    declare hash HoH(ordered:'Y');
    HoH.definekey ('year', 'FirmID');
    HoH.definedata('h', 'FirmID', 'year');
    HoH.definedone();
    declare hiter HoHiter("HoH");

    declare hash g();
    g.definekey('year');
    g.definedata('Group');
    g.definedone();

    declare hash h;

    do until (lr);
      set have end=lr;

      if g.check() ne 0 then do;
          Group+1;g.add();
      end;

      if HoH.find() ne 0 then do;
         h=_new_ hash();
         h.definekey('FirmID', 'year', 'AnalystCode');
         h.definedone();
         HoH.add();
      end;

      h.replace();
    end;

    do while(HoHiter.next() = 0);
        if h.num_items ge 3 then do;
            rc=g.find();
            output;
        end;
    end;

    keep FirmID year Group;
run;
Lipty
Obsidian | Level 7

Hi all,

 

Sorry for missing one piece of information. I add one more variable "IndustryCode" to the sample data. @PeterClemmensen @Ksharp @PGStats @Jagadishkatam 

 

FirmIDyearAnalystCodeIndustryCode
PAMP2001104736
AMZN2001104736
NFLX2001104737
RIM2001104736
NZKA2001104736
AMZN2001104836
NFLX2001104837
RIM2001104836
PAMP2001164836
AMZN2001164836
NFLX2001164837
RIM2001164836
NZKA2001164836
GTW200196723
BBRY200196723
NOK200196723
AEOK200196723
BBRY200181623
NOK200181623
AEOK200181623
NOK2001132223
AEOK2001132223
GTW200296723
BBRY200296723
NOK200296723
AEOK200296723
BBRY200281623
NOK200281623
AEOK200281623
NOK2002132223
AEOK2002132223

 

Once the firms have at least 3 same AnalystCode and are from the same industry (IndustryCode), they will fall in one group. I desired table is as follows:

 

FirmIDyear Group IndstutryCode
AMZN2001136
RIM2001136
NOK2001223
AEOK2001223
NOK2002123
AEOK2002123

 

Thank you very much for your time. 

Jagadishkatam
Amethyst | Level 16

@Lipty , please try the below code , it is generating the expected output

 

data have;
input FirmID$ year AnalystCode IndustryCode;
cards;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM	 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM	 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM	 2001 1648 36
NZKA 2001 1648 36
GTW	 2001 967  23
BBRY 2001 967  23
NOK	 2001 967  23
AEOK 2001 967  23
BBRY 2001 816  23
NOK	 2001 816  23
AEOK 2001 816  23
NOK	 2001 1322 23
AEOK 2001 1322 23
GTW	 2002 967  23
BBRY 2002 967  23
NOK	 2002 967  23
AEOK 2002 967  23
BBRY 2002 816  23
NOK	 2002 816  23
AEOK 2002 816  23
NOK	 2002 1322 23
AEOK 2002 1322 23
;

proc sort data=have;
by firmid year;
run;

proc sql;
create table want as select count(year) as cnt,firmid,year,IndustryCode from have group by firmid,year,IndustryCode having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;

data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;
Thanks,
Jag
Lipty
Obsidian | Level 7

Thanks @Jagadishkatam .

 

I will try it. : )

PGStats
Opal | Level 21

Consider the A-B-C case described before with links A-B and A-C but no B-C link. My previous code would put all three firms A-B-C in the same group. But if firms without a link cannot belong to the same group then there must be two groups, A-B and A-C. This new code does that. In graph theory terms, the previous code found connected components, the new one finds cliques.

 

data work.have;
input FirmID $ year AnalystCode IndustryCode;
datalines;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM 2001 1648 36
NZKA 2001 1648 36
GTW 2001 967 23
BBRY 2001 967 23
NOK 2001 967 23
AEOK 2001 967 23
BBRY 2001 816 23
NOK 2001 816 23
AEOK 2001 816 23
NOK 2001 1322 23
AEOK 2001 1322 23
GTW 2002 967 23
BBRY 2002 967 23
NOK 2002 967 23
AEOK 2002 967 23
BBRY 2002 816 23
NOK 2002 816 23
AEOK 2002 816 23
NOK 2002 1322 23
AEOK 2002 1322 23
;

proc sql;
create table links as
select
	a.firmId as from, b.firmId as to, a.year, a.IndustryCode
from
	have as a inner join have as b on a.year=b.year and 
	a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;

%macro optnet(year);
proc optnet data_links=links graph_direction=undirected;
where year=&year.;
clique out=work.nodes_&year.;
run;
%mend;

%optnet(2001);
%optnet(2002);

data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId clique=group;
run;

proc print data=want noobs; run;
PG
PGStats
Opal | Level 21

Here is a more efficient solution than my previous try:

 

proc sql;
create table links as
select
	catx("_",a.year,a.IndustryCode,a.firmId) as from, 
	catx("_",a.year,a.IndustryCode,b.firmId) as to
from
	have as a inner join 
	have as b on a.year=b.year and 
		a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;

proc optnet data_links=links graph_direction=undirected;
clique out=work.nodes;
run;

data want;
set nodes;
length firmId $4;
year = input(scan(node, 1, "_"),best.);
IndustryCode = input(scan(node, 2, "_"), best.);
firmId = scan(node, 3, "_");
rename clique=group;
drop node;
run;

proc print data=want noobs; run;
group 	firmId 	year 	IndustryCode
1 	AEOK 	2001 	23
1 	NOK 	2001 	23
2 	AEOK 	2002 	23
2 	NOK 	2002 	23
3 	AMZN 	2001 	36
3 	RIM 	2001 	36
PG
Lipty
Obsidian | Level 7

Thank you very much, @PGStats 

I will try it. The desired table is a little different from what I have listed. 🙂

 

FirmIDyear Group IndstutryCode
AMZN2001136
RIM2001136
NOK2001223
AEOK2001223
NOK2002123
AEOK2002123

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1415 views
  • 5 likes
  • 5 in conversation