Dear all,
I have a dataset with the following variables:
gvkey = firm identification code
fyear = financial year (since year 1950)
sic3 = industry classification code based on the first 3-digit
zipcode = US zipcode
I wish to calculate the following variable:
Cluster age, measured as the number of years since the first firm in a given industry (sic3) was founded in the cluster, where
Cluster, is measured as 50 miles radius from the firm (gvkey) (note: 50 miles radius can be calculated by using the zipcode of the firm), for a given financial year (fyear) in a particular industry (sic3).
In order to calculate cluster age, I must first to know the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey), which is also in the same industry as the firm (gvkey).
Mathematical formula to calculate cluster age = fyear - firstyear + 1, where firstyear = the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey).
I have no idea on how to calculate this variable.
Hope to get any suggested SAS code which can help me to calculate the cluster age efficiently.
Thank you in advance.
Regards,
mspak
You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)
%let pi180=0.0174532925199433;
proc sort in=tmp.testdata out=work.testdata;
by gvkey fyear;
run;
proc sort in=tmp.testdata out=work.testdata_srt nodupkey;
by gvkey;
run;
proc sql;
create table work.testdata_geo as
select
t1.gvkey
,t1.fyear
,t1.sic3
,t1.zipcode
,t2.y as latitude_y
,t2.x as longitude_x
from
work.testdata_srt t1
left outer join sashelp.zipcode t2
on input(t1.zipcode,5.)=t2.zip;
alter table work.testdata_geo add constraint pk_tg primary key (gvkey);
create table work.clusters as
select
t1.sic3
,t1.zipcode
,min(t2.fyear) as Cluster_Start
from
work.testdata_geo t1, work.testdata_geo t2
where
t1.sic3=t2.sic3
and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +
cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50
group by t1.sic3, t1.zipcode;
create table want as
select
t1.*
,t2.cluster_start
,t1.fyear - t2.cluster_start + 1 as Cluster_Age
from
tmp.testdata t1
left outer join work.clusters t2
on t1.sic3=t2.sic3
and t1.zipcode=t2.zipcode;
quit;
You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)
%let pi180=0.0174532925199433;
proc sort in=tmp.testdata out=work.testdata;
by gvkey fyear;
run;
proc sort in=tmp.testdata out=work.testdata_srt nodupkey;
by gvkey;
run;
proc sql;
create table work.testdata_geo as
select
t1.gvkey
,t1.fyear
,t1.sic3
,t1.zipcode
,t2.y as latitude_y
,t2.x as longitude_x
from
work.testdata_srt t1
left outer join sashelp.zipcode t2
on input(t1.zipcode,5.)=t2.zip;
alter table work.testdata_geo add constraint pk_tg primary key (gvkey);
create table work.clusters as
select
t1.sic3
,t1.zipcode
,min(t2.fyear) as Cluster_Start
from
work.testdata_geo t1, work.testdata_geo t2
where
t1.sic3=t2.sic3
and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +
cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50
group by t1.sic3, t1.zipcode;
create table want as
select
t1.*
,t2.cluster_start
,t1.fyear - t2.cluster_start + 1 as Cluster_Age
from
tmp.testdata t1
left outer join work.clusters t2
on t1.sic3=t2.sic3
and t1.zipcode=t2.zipcode;
quit;
Thanks DBailey for the code.
I always used zipcitydistance function in SAS to calculate the geographical distance.
Can the following
(7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +
cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50
replaced with
.<zipcitydistance<=50?
Thank you.
Regards,
mspak
well...knowing that sas had a function to do that would have saved me countless hours of scanning my old math books several years ago when I needed the same calculation.
yes it works fine.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.