BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Dear all,

I have a dataset with the following variables:

gvkey = firm identification code

fyear = financial year (since year 1950)

sic3 = industry classification code based on the first 3-digit

zipcode = US zipcode

I wish to calculate the following variable:

Cluster age, measured as the number of years since the first firm in a given industry (sic3) was founded in the cluster, where

Cluster, is measured as 50 miles radius from the firm (gvkey) (note: 50 miles radius can be calculated by using the zipcode of the firm), for a given financial year (fyear) in a particular industry (sic3).

In order to calculate cluster age, I must first to know the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey), which is also in the same industry as the firm (gvkey).

Mathematical formula to calculate cluster age = fyear - firstyear + 1, where firstyear = the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey).

I have no idea on how to calculate this variable.

Hope to get any suggested SAS code which can help me to calculate the cluster age efficiently.

Thank you in advance.

Regards,

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
DBailey
Lapis Lazuli | Level 10

You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)

%let pi180=0.0174532925199433;

proc sort in=tmp.testdata out=work.testdata;

by gvkey fyear;

run;

proc sort in=tmp.testdata out=work.testdata_srt nodupkey;

by gvkey;

run;

proc sql;

create table work.testdata_geo as

select

    t1.gvkey

    ,t1.fyear

    ,t1.sic3

    ,t1.zipcode

    ,t2.y as latitude_y

    ,t2.x as longitude_x

from

    work.testdata_srt t1

    left outer join sashelp.zipcode t2

        on input(t1.zipcode,5.)=t2.zip;

alter table work.testdata_geo add constraint pk_tg primary key (gvkey);

create table work.clusters as

select

    t1.sic3

    ,t1.zipcode

    ,min(t2.fyear) as Cluster_Start

from

    work.testdata_geo t1, work.testdata_geo t2

where

    t1.sic3=t2.sic3

    and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

group by t1.sic3, t1.zipcode;

create table want as

select

    t1.*

    ,t2.cluster_start

    ,t1.fyear - t2.cluster_start + 1 as Cluster_Age

from

    tmp.testdata t1

    left outer join work.clusters t2

        on t1.sic3=t2.sic3

            and t1.zipcode=t2.zipcode;

quit;

View solution in original post

3 REPLIES 3
DBailey
Lapis Lazuli | Level 10

You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)

%let pi180=0.0174532925199433;

proc sort in=tmp.testdata out=work.testdata;

by gvkey fyear;

run;

proc sort in=tmp.testdata out=work.testdata_srt nodupkey;

by gvkey;

run;

proc sql;

create table work.testdata_geo as

select

    t1.gvkey

    ,t1.fyear

    ,t1.sic3

    ,t1.zipcode

    ,t2.y as latitude_y

    ,t2.x as longitude_x

from

    work.testdata_srt t1

    left outer join sashelp.zipcode t2

        on input(t1.zipcode,5.)=t2.zip;

alter table work.testdata_geo add constraint pk_tg primary key (gvkey);

create table work.clusters as

select

    t1.sic3

    ,t1.zipcode

    ,min(t2.fyear) as Cluster_Start

from

    work.testdata_geo t1, work.testdata_geo t2

where

    t1.sic3=t2.sic3

    and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

group by t1.sic3, t1.zipcode;

create table want as

select

    t1.*

    ,t2.cluster_start

    ,t1.fyear - t2.cluster_start + 1 as Cluster_Age

from

    tmp.testdata t1

    left outer join work.clusters t2

        on t1.sic3=t2.sic3

            and t1.zipcode=t2.zipcode;

quit;

mspak
Quartz | Level 8

Thanks DBailey for the code.

I always used zipcitydistance function in SAS to calculate the geographical distance.

Can the following

(7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

replaced with

.<zipcitydistance<=50?

Thank you.

Regards,

mspak

DBailey
Lapis Lazuli | Level 10

well...knowing that sas had a function to do that would have saved me countless hours of scanning my old math books several years ago when I needed the same calculation.

yes it works fine.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 906 views
  • 0 likes
  • 2 in conversation