The missing reordering step is left as an exercise !
data want;
declare hash HoH(ordered:'Y');
HoH.definekey ('year', 'FirmID', 'IndustryCode');
HoH.definedata('h', 'FirmID', 'year', 'IndustryCode');
HoH.definedone();
declare hiter HoHiter("HoH");
declare hash g();
g.definekey('year');
g.definedata('Group');
g.definedone();
declare hash h;
do until (lr);
set have end=lr;
if g.check() ne 0 then do;
Group+1;g.add();
end;
if HoH.find() ne 0 then do;
h=_new_ hash();
h.definekey('FirmID', 'year', 'AnalystCode', 'IndustryCode');
h.definedone();
HoH.add();
end;
h.replace();
end;
do while(HoHiter.next() = 0);
if h.num_items ge 3 then do;
rc=g.find();
output;
end;
end;
keep FirmID year Group IndustryCode;
run;
Result:
FirmID year IndustryCode Group AEOK 2001 23 1 AMZN 2001 36 1 NFLX 2001 37 1 NOK 2001 23 1 RIM 2001 36 1 AEOK 2002 23 2 NOK 2002 23 2
Thank you very much, @PeterClemmensen But my desired table is as follows:
FirmID | year | Group | IndstutryCode |
AMZN | 2001 | 1 | 36 |
RIM | 2001 | 1 | 36 |
NOK | 2001 | 2 | 23 |
AEOK | 2001 | 2 | 23 |
NOK | 2002 | 1 | 23 |
AEOK | 2002 | 1 | 23 |
37 should not be included and NOK and AEOK should be group 2. : )
Please explain why this obs
NFLX 2001 37 1
should not be in the desired output?
They should belong to the same industry and followed by the same analyst. : )
I might have missed something here, but what exactly separates NFLX and AMZN so that AMZN is included and NFLX is excluded?
data have;
input FirmID$ year AnalystCode IndustryCode;
flag=1;
cards;
PAMP 2001 1047 36
AMZN 2001 1047 36
NFLX 2001 1047 37
RIM 2001 1047 36
NZKA 2001 1047 36
AMZN 2001 1048 36
NFLX 2001 1048 37
RIM 2001 1048 36
PAMP 2001 1648 36
AMZN 2001 1648 36
NFLX 2001 1648 37
RIM 2001 1648 36
NZKA 2001 1648 36
GTW 2001 967 23
BBRY 2001 967 23
NOK 2001 967 23
AEOK 2001 967 23
BBRY 2001 816 23
NOK 2001 816 23
AEOK 2001 816 23
NOK 2001 1322 23
AEOK 2001 1322 23
GTW 2002 967 23
BBRY 2002 967 23
NOK 2002 967 23
AEOK 2002 967 23
BBRY 2002 816 23
NOK 2002 816 23
AEOK 2002 816 23
NOK 2002 1322 23
AEOK 2002 1322 23
;
proc sort data=have out=temp;
by year IndustryCode AnalystCode;
run;
proc transpose data=temp out=temp1(drop=_name_);
by year IndustryCode AnalystCode;
id FirmID;
var flag;
run;
proc stdize data=temp1 out=temp2 missing=0 reponly;
run;
proc sql noprint;
select distinct cats('_',FirmID) into : Firm separated by ' ' from have;
select distinct cats('_',FirmID,'=cats(_',FirmId,',',FirmID,')')
into : concat separated by ';'
from have;
quit;
data temp3;
length &Firm $ 200;
do until(last.IndustryCode);
set temp2 ;
by year IndustryCode;
&concat ;
end;
keep year IndustryCode _: ;
run;
data temp4;
set temp3;
array x{*} $ _: ;
do i=1 to dim(x)-1;
if countc(x{i},'1')>2 then do;
do j=i+1 to dim(x);
if x{i}=x{j} then do;
value=x{i};
vname=vname(x{i});output;
vname=vname(x{j});output;
end;
end;
end;
end;
keep year IndustryCode value vname;
run;
proc sort data=temp4 out=temp5 nodupkey;
by year IndustryCode value vname ;
run;
data want;
set temp5;
by year IndustryCode;
if first.year then group=0;
group+first.IndustryCode;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.