Thank you very much for taking time to read this post. I have the following data set:
FirmID | year | AnalystCode |
PAMP | 2001 | 1047 |
AMZN | 2001 | 1047 |
NFLX | 2001 | 1047 |
RIM | 2001 | 1047 |
NZKA | 2001 | 1047 |
AMZN | 2001 | 1048 |
NFLX | 2001 | 1048 |
RIM | 2001 | 1048 |
PAMP | 2001 | 1648 |
AMZN | 2001 | 1648 |
NFLX | 2001 | 1648 |
RIM | 2001 | 1648 |
NZKA | 2001 | 1648 |
GTW | 2001 | 967 |
BBRY | 2001 | 967 |
NOK | 2001 | 967 |
AEOK | 2001 | 967 |
BBRY | 2001 | 816 |
NOK | 2001 | 816 |
AEOK | 2001 | 816 |
NOK | 2001 | 1322 |
AEOK | 2001 | 1322 |
GTW | 2002 | 967 |
BBRY | 2002 | 967 |
NOK | 2002 | 967 |
AEOK | 2002 | 967 |
BBRY | 2002 | 816 |
NOK | 2002 | 816 |
AEOK | 2002 | 816 |
NOK | 2002 | 1322 |
AEOK | 2002 | 1322 |
Every year, once the firms have at least 3 same AnalystCode, they will be classified as one group. So my desired table should be like the following one:
FirmID | year | Group |
AMZN | 2001 | 1 |
NFLX | 2001 | 1 |
RIM | 2001 | 1 |
NOK | 2001 | 2 |
AEOK | 2001 | 2 |
NOK | 2002 | 1 |
AEOK | 2002 | 1 |
Can anyone help? Thank you very much. : )
please try
proc sort data=have;
by firmid year;
run;
proc sql;
create table want as select count(year) as cnt,firmid,year from have group by firmid,year having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;
data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;
Can a firm belong to many groups for a given year?
If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, how are they grouped?
Thank you very much for your time, @PGStats, @Ksharp, @PeterClemmensen , @Jagadishkatam .
@PGStats I checked my original data which includes more than 200,000 observations. The case mentioned by @PGStats does exist. If firm A shares 3 analysts with firm B and 3 other analysts with firm C, but firm B and C have no analysts in common, B and C cannot be made in one group. Only when they have common and mutually same analyst code, and from the same industry, they can be grouped together.
@PGStats, @Ksharp, @PeterClemmensen , @Jagadishkatam Sorry for missing one more piece of information. There is one more variable of "industry". Once the firms have at least 3 same AnalystCode and are from the same industry, they will fall in one group. This can eliminate most of the cases mentioned by @PGStats .
Could you please help me with the additional piece of information. Thanks again for your help.
@Lipty can you update the sample data and the desired result and take this into account?
Sure, will do it after a meeting. : )
Thanks
Try this:
data work.have;
input FirmID $ year AnalystCode;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;
proc sql;
create table links as
select
a.firmId as from, b.firmId as to, a.year
from
have as a inner join have as b on a.year=b.year and a.analystCode=b.analystCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year
having count(*) >=3;
quit;
%macro optnet(year);
proc optnet data_links=links graph_direction=undirected out_nodes=work.nodes_&year.;
where year=&year.;
concomp;
run;
%mend;
%optnet(2001);
%optnet(2002);
data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId concomp=group;
run;
proc print data=want noobs; run;
I love this kind of question . Put me to the limit .
data work.have;
input FirmID $ year AnalystCode;
flag=1;
datalines;
PAMP 2001 1047
AMZN 2001 1047
NFLX 2001 1047
RIM 2001 1047
NZKA 2001 1047
AMZN 2001 1048
NFLX 2001 1048
RIM 2001 1048
PAMP 2001 1648
AMZN 2001 1648
NFLX 2001 1648
RIM 2001 1648
NZKA 2001 1648
GTW 2001 967
BBRY 2001 967
NOK 2001 967
AEOK 2001 967
BBRY 2001 816
NOK 2001 816
AEOK 2001 816
NOK 2001 1322
AEOK 2001 1322
GTW 2002 967
BBRY 2002 967
NOK 2002 967
AEOK 2002 967
BBRY 2002 816
NOK 2002 816
AEOK 2002 816
NOK 2002 1322
AEOK 2002 1322
;
proc sort data=have out=temp;
by year AnalystCode;
run;
proc transpose data=temp out=temp1(drop=_name_);
by year AnalystCode;
id FirmID;
var flag;
run;
proc stdize data=temp1 out=temp2 missing=0 reponly;
run;
proc sql noprint;
select distinct cats('_',FirmID) into : Firm separated by ' ' from have;
select distinct cats('_',FirmID,'=cats(_',FirmId,',',FirmID,')')
into : concat separated by ';'
from have;
quit;
data temp3;
length &Firm $ 200;
do until(last.year);
set temp2;
by year;
&concat ;
end;
keep year _: ;
run;
data temp4;
set temp3;
array x{*} $ _: ;
do i=1 to dim(x)-1;
if countc(x{i},'1')>2 then do;
do j=i+1 to dim(x);
if x{i}=x{j} then do;
value=x{i};
vname=vname(x{i});output;
vname=vname(x{j});output;
end;
end;
end;
end;
keep year value vname;
run;
proc sort data=temp4 out=temp5 nodupkey;
by year value vname ;
run;
data want;
set temp5;
by year value;
if first.year then group=0;
group+first.value;
run;
data want;
declare hash HoH(ordered:'Y');
HoH.definekey ('year', 'FirmID');
HoH.definedata('h', 'FirmID', 'year');
HoH.definedone();
declare hiter HoHiter("HoH");
declare hash g();
g.definekey('year');
g.definedata('Group');
g.definedone();
declare hash h;
do until (lr);
set have end=lr;
if g.check() ne 0 then do;
Group+1;g.add();
end;
if HoH.find() ne 0 then do;
h=_new_ hash();
h.definekey('FirmID', 'year', 'AnalystCode');
h.definedone();
HoH.add();
end;
h.replace();
end;
do while(HoHiter.next() = 0);
if h.num_items ge 3 then do;
rc=g.find();
output;
end;
end;
keep FirmID year Group;
run;
Hi all,
Sorry for missing one piece of information. I add one more variable "IndustryCode" to the sample data. @PeterClemmensen @Ksharp @PGStats @Jagadishkatam
FirmID | year | AnalystCode | IndustryCode |
PAMP | 2001 | 1047 | 36 |
AMZN | 2001 | 1047 | 36 |
NFLX | 2001 | 1047 | 37 |
RIM | 2001 | 1047 | 36 |
NZKA | 2001 | 1047 | 36 |
AMZN | 2001 | 1048 | 36 |
NFLX | 2001 | 1048 | 37 |
RIM | 2001 | 1048 | 36 |
PAMP | 2001 | 1648 | 36 |
AMZN | 2001 | 1648 | 36 |
NFLX | 2001 | 1648 | 37 |
RIM | 2001 | 1648 | 36 |
NZKA | 2001 | 1648 | 36 |
GTW | 2001 | 967 | 23 |
BBRY | 2001 | 967 | 23 |
NOK | 2001 | 967 | 23 |
AEOK | 2001 | 967 | 23 |
BBRY | 2001 | 816 | 23 |
NOK | 2001 | 816 | 23 |
AEOK | 2001 | 816 | 23 |
NOK | 2001 | 1322 | 23 |
AEOK | 2001 | 1322 | 23 |
GTW | 2002 | 967 | 23 |
BBRY | 2002 | 967 | 23 |
NOK | 2002 | 967 | 23 |
AEOK | 2002 | 967 | 23 |
BBRY | 2002 | 816 | 23 |
NOK | 2002 | 816 | 23 |
AEOK | 2002 | 816 | 23 |
NOK | 2002 | 1322 | 23 |
AEOK | 2002 | 1322 | 23 |
Once the firms have at least 3 same AnalystCode and are from the same industry (IndustryCode), they will fall in one group. I desired table is as follows:
FirmID | year | Group | IndstutryCode |
AMZN | 2001 | 1 | 36 |
RIM | 2001 | 1 | 36 |
NOK | 2001 | 2 | 23 |
AEOK | 2001 | 2 | 23 |
NOK | 2002 | 1 | 23 |
AEOK | 2002 | 1 | 23 |
Thank you very much for your time.
@Lipty , please try the below code , it is generating the expected output
data have;
input FirmID$ year AnalystCode IndustryCode;
cards;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM 2001 1648 36
NZKA 2001 1648 36
GTW 2001 967 23
BBRY 2001 967 23
NOK 2001 967 23
AEOK 2001 967 23
BBRY 2001 816 23
NOK 2001 816 23
AEOK 2001 816 23
NOK 2001 1322 23
AEOK 2001 1322 23
GTW 2002 967 23
BBRY 2002 967 23
NOK 2002 967 23
AEOK 2002 967 23
BBRY 2002 816 23
NOK 2002 816 23
AEOK 2002 816 23
NOK 2002 1322 23
AEOK 2002 1322 23
;
proc sort data=have;
by firmid year;
run;
proc sql;
create table want as select count(year) as cnt,firmid,year,IndustryCode from have group by firmid,year,IndustryCode having cnt=3;
create table want2 as select count(distinct year) as cnt,firmid from have group by firmid ;
quit;
data all;
merge want(in=a drop=cnt) want2(in=b);
by firmid ;
if a and b;
run;
Consider the A-B-C case described before with links A-B and A-C but no B-C link. My previous code would put all three firms A-B-C in the same group. But if firms without a link cannot belong to the same group then there must be two groups, A-B and A-C. This new code does that. In graph theory terms, the previous code found connected components, the new one finds cliques.
data work.have;
input FirmID $ year AnalystCode IndustryCode;
datalines;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM 2001 1648 36
NZKA 2001 1648 36
GTW 2001 967 23
BBRY 2001 967 23
NOK 2001 967 23
AEOK 2001 967 23
BBRY 2001 816 23
NOK 2001 816 23
AEOK 2001 816 23
NOK 2001 1322 23
AEOK 2001 1322 23
GTW 2002 967 23
BBRY 2002 967 23
NOK 2002 967 23
AEOK 2002 967 23
BBRY 2002 816 23
NOK 2002 816 23
AEOK 2002 816 23
NOK 2002 1322 23
AEOK 2002 1322 23
;
proc sql;
create table links as
select
a.firmId as from, b.firmId as to, a.year, a.IndustryCode
from
have as a inner join have as b on a.year=b.year and
a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;
%macro optnet(year);
proc optnet data_links=links graph_direction=undirected;
where year=&year.;
clique out=work.nodes_&year.;
run;
%mend;
%optnet(2001);
%optnet(2002);
data want;
set nodes_: indsname=dsn;
year = input(scan(dsn,2,"_"),best.);
rename node=firmId clique=group;
run;
proc print data=want noobs; run;
Here is a more efficient solution than my previous try:
proc sql;
create table links as
select
catx("_",a.year,a.IndustryCode,a.firmId) as from,
catx("_",a.year,a.IndustryCode,b.firmId) as to
from
have as a inner join
have as b on a.year=b.year and
a.analystCode=b.analystCode and a.IndustryCode=b.IndustryCode and a.firmId<b.firmId
group by a.firmId, b.firmId, a.year, a.IndustryCode
having count(*) >=3;
quit;
proc optnet data_links=links graph_direction=undirected;
clique out=work.nodes;
run;
data want;
set nodes;
length firmId $4;
year = input(scan(node, 1, "_"),best.);
IndustryCode = input(scan(node, 2, "_"), best.);
firmId = scan(node, 3, "_");
rename clique=group;
drop node;
run;
proc print data=want noobs; run;
group firmId year IndustryCode 1 AEOK 2001 23 1 NOK 2001 23 2 AEOK 2002 23 2 NOK 2002 23 3 AMZN 2001 36 3 RIM 2001 36
Thank you very much, @PGStats
I will try it. The desired table is a little different from what I have listed. 🙂
FirmID | year | Group | IndstutryCode |
AMZN | 2001 | 1 | 36 |
RIM | 2001 | 1 | 36 |
NOK | 2001 | 2 | 23 |
AEOK | 2001 | 2 | 23 |
NOK | 2002 | 1 | 23 |
AEOK | 2002 | 1 | 23 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.