Dear all,
I have the following data set and would like to count the unique number of industries of a firm.
Original data set:
gvkey | year | sic | sic1 | sic2 |
4839 | 2018 | 3711 | 3711 | 3714 |
4839 | 2018 | 3711 | 6141 | 6411 |
4839 | 2018 | 3711 | 5233 | 5622 |
4839 | 2017 | 3711 | 3711 | 3714 |
4839 | 2017 | 3711 | 3711 | 7379 |
This is the final data I would like to get
gvkey | year | numberofsic |
4839 | 2018 | 6 |
4839 | 2017 | 3 |
Thank you very much~~~~
Hello @Lipty
Yes as @Reeza suggests narrow and long works best for SAS, but FWIW
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
array j(9999) _temporary_;
do over t;
if t not in j then do;
i=sum(i,1);
j(i)=t;
end;
end;
end;
numberofsic=n(of j(*));
call missing(of j(*));
drop i sic:;
run;
Hello @Lipty
Yes as @Reeza suggests narrow and long works best for SAS, but FWIW
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
array j(9999) _temporary_;
do over t;
if t not in j then do;
i=sum(i,1);
j(i)=t;
end;
end;
end;
numberofsic=n(of j(*));
call missing(of j(*));
drop i sic:;
run;
Thank you very much @Reeza But my original data has 168344 observations. When I transpose it, the data is wired. : )
Thank you very much, @novinosrin . I will try your method now. Many thanks. : )
But my original data has 168344 observations. When I transpose it, the data is wired. : )
No idea why that's relevant. Long data doesn't matter to SAS and is more efficient overall.
Thank you very much. : ) It works~~~~ ^_^
You are welcome!
Sorry I am feeling bored and dull. So for some entertainment, here is a solution that uses Hash method
data have;
input gvkey year sic sic1 sic2;
cards;
4839 2018 3711 3711 3714
4839 2018 3711 6141 6411
4839 2018 3711 5233 5622
4839 2017 3711 3711 3714
4839 2017 3711 3711 7379
;
data want ;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("_n_") ;
h.definedone () ;
end;
do until(last.year);
set have;
by gvkey descending year;
array t sic:;
do over t;
if not missing(t) then _n_=t;
_rc=h.add();
end;
end;
numberofsic=h.num_items;
h.clear();
drop _: sic:;
run;
Thanks~~~~
I tried both and the results are consistent. : )
I am glad. Some fun that made me feel so stupid in retrospect. Did we realize that I was silly(1st solution) in using
numberofsic=n(of j(*));
where we could have simply used
numberofsic=i;
LOL. That proves I was sleeping. It's very trivial but attention to detail is very important in achieving immaculate code. 🙂
LOL. You are so funny. Thank you very much. I might bother you next time. ^_^
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.