Dear all,
I have the following data set and would like to count the unique number of industries of a firm.
Original data set:
gvkey | year | sic | sic1 | sic2 |
4839 | 2018 | 3711 | 3711 | 3714 |
4839 | 2018 | 3711 | 6141 | 6411 |
4839 | 2018 | 3711 | 5233 | 5622 |
4839 | 2017 | 3711 | 3711 | 3714 |
4839 | 2017 | 3711 | 3711 | 7379 |
This is the final data I would like to get
gvkey | year | numberofsic |
4839 | 2018 | 6 |
4839 | 2017 | 3 |
Thank you very much~~~~
Hello @Lipty
Yes as @Reeza suggests narrow and long works best for SAS, but FWIW
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
array j(9999) _temporary_;
do over t;
if t not in j then do;
i=sum(i,1);
j(i)=t;
end;
end;
end;
numberofsic=n(of j(*));
call missing(of j(*));
drop i sic:;
run;
Hello @Lipty
Yes as @Reeza suggests narrow and long works best for SAS, but FWIW
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
array j(9999) _temporary_;
do over t;
if t not in j then do;
i=sum(i,1);
j(i)=t;
end;
end;
end;
numberofsic=n(of j(*));
call missing(of j(*));
drop i sic:;
run;
Thank you very much @Reeza But my original data has 168344 observations. When I transpose it, the data is wired. : )
Thank you very much, @novinosrin . I will try your method now. Many thanks. : )
But my original data has 168344 observations. When I transpose it, the data is wired. : )
No idea why that's relevant. Long data doesn't matter to SAS and is more efficient overall.
Thank you very much. : ) It works~~~~ ^_^
You are welcome!
Sorry I am feeling bored and dull. So for some entertainment, here is a solution that uses Hash method
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("_n_") ;
h.definedone () ;
end;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
do over t;
if not missing(t) then _n_=t;
_rc=h.add();
end;
end;
numberofsic=h.num_items;
h.clear();
drop _: sic:;
run;
Thanks~~~~
I tried both and the results are consistent. : )
I am glad. Some fun that made me feel so stupid in retrospect. Did we realize that I was silly(1st solution) in using
numberofsic=n(of j(*));
where we could have simply used
numberofsic=i;
LOL. That proves I was sleeping. It's very trivial but attention to detail is very important in achieving immaculate code. 🙂
LOL. You are so funny. Thank you very much. I might bother you next time. ^_^
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.