Help using Base SAS procedures

Number of years since the first observation was found in a geographical area

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

Number of years since the first observation was found in a geographical area

Dear all,

I have a dataset with the following variables:

gvkey = firm identification code

fyear = financial year (since year 1950)

sic3 = industry classification code based on the first 3-digit

zipcode = US zipcode

I wish to calculate the following variable:

Cluster age, measured as the number of years since the first firm in a given industry (sic3) was founded in the cluster, where

Cluster, is measured as 50 miles radius from the firm (gvkey) (note: 50 miles radius can be calculated by using the zipcode of the firm), for a given financial year (fyear) in a particular industry (sic3).

In order to calculate cluster age, I must first to know the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey), which is also in the same industry as the firm (gvkey).

Mathematical formula to calculate cluster age = fyear - firstyear + 1, where firstyear = the year that the first firm (in the same industry - SIC3) was found within 50 miles radius from the firm (gvkey).

I have no idea on how to calculate this variable.

Hope to get any suggested SAS code which can help me to calculate the cluster age efficiently.

Thank you in advance.

Regards,

mspak

Attachment

Accepted Solutions
Solution
‎02-07-2013 08:20 AM
Super Contributor
Posts: 578

Re: Number of years since the first observation was found in a geographical area

You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)

%let pi180=0.0174532925199433;

proc sort in=tmp.testdata out=work.testdata;

by gvkey fyear;

run;

proc sort in=tmp.testdata out=work.testdata_srt nodupkey;

by gvkey;

run;

proc sql;

create table work.testdata_geo as

select

    t1.gvkey

    ,t1.fyear

    ,t1.sic3

    ,t1.zipcode

    ,t2.y as latitude_y

    ,t2.x as longitude_x

from

    work.testdata_srt t1

    left outer join sashelp.zipcode t2

        on input(t1.zipcode,5.)=t2.zip;

alter table work.testdata_geo add constraint pk_tg primary key (gvkey);

create table work.clusters as

select

    t1.sic3

    ,t1.zipcode

    ,min(t2.fyear) as Cluster_Start

from

    work.testdata_geo t1, work.testdata_geo t2

where

    t1.sic3=t2.sic3

    and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

group by t1.sic3, t1.zipcode;

create table want as

select

    t1.*

    ,t2.cluster_start

    ,t1.fyear - t2.cluster_start + 1 as Cluster_Age

from

    tmp.testdata t1

    left outer join work.clusters t2

        on t1.sic3=t2.sic3

            and t1.zipcode=t2.zipcode;

quit;

View solution in original post


All Replies
Solution
‎02-07-2013 08:20 AM
Super Contributor
Posts: 578

Re: Number of years since the first observation was found in a geographical area

You might have a better source for lat/long than sashelp...but this might get you started: (yeah for spherical geometry)

%let pi180=0.0174532925199433;

proc sort in=tmp.testdata out=work.testdata;

by gvkey fyear;

run;

proc sort in=tmp.testdata out=work.testdata_srt nodupkey;

by gvkey;

run;

proc sql;

create table work.testdata_geo as

select

    t1.gvkey

    ,t1.fyear

    ,t1.sic3

    ,t1.zipcode

    ,t2.y as latitude_y

    ,t2.x as longitude_x

from

    work.testdata_srt t1

    left outer join sashelp.zipcode t2

        on input(t1.zipcode,5.)=t2.zip;

alter table work.testdata_geo add constraint pk_tg primary key (gvkey);

create table work.clusters as

select

    t1.sic3

    ,t1.zipcode

    ,min(t2.fyear) as Cluster_Start

from

    work.testdata_geo t1, work.testdata_geo t2

where

    t1.sic3=t2.sic3

    and (7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

group by t1.sic3, t1.zipcode;

create table want as

select

    t1.*

    ,t2.cluster_start

    ,t1.fyear - t2.cluster_start + 1 as Cluster_Age

from

    tmp.testdata t1

    left outer join work.clusters t2

        on t1.sic3=t2.sic3

            and t1.zipcode=t2.zipcode;

quit;

Regular Contributor
Posts: 162

Re: Number of years since the first observation was found in a geographical area

Thanks DBailey for the code.

I always used zipcitydistance function in SAS to calculate the geographical distance.

Can the following

(7921.6623*arsin(sqrt((sin((&pi180*t2.latitude_y - &pi180*t1.latitude_y)/2))**2 +

        cos(&pi180*t1.latitude_y)*cos(&pi180*t2.latitude_y)*(sin((&pi180*t2.longitude_x - &pi180*t1.longitude_x)/2))**2))) < 50

replaced with

.<zipcitydistance<=50?

Thank you.

Regards,

mspak

Super Contributor
Posts: 578

Re: Number of years since the first observation was found in a geographical area

well...knowing that sas had a function to do that would have saved me countless hours of scanning my old math books several years ago when I needed the same calculation.

yes it works fine.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 298 views
  • 0 likes
  • 2 in conversation