Solved: Re: Number of months in which the company remain in same zip code area...

mspak · Posted 02-19-2012 09:33 AM

Good days to all,

I attached a file containing the following variables:

crsp_fundn = Identification code for a company

chgdt = Beginning date for contact information

chgenddt = Ending date for contact information

City =City for contact information

State = State for contact information

zip = Zipcodes for contact information.

My main task now is to calculate 'number of months in which the company keep an unique zipcode for a given year' starting for year 2003 - 2010.

For example, company with crsp_fundn=105, keep the same zipcode for 12 mths for each year 2003 - 2010.

My expected output:

All my input:

crsp_fundn = Identification code for a company

chgdt = Beginning date for contact information

chgenddt = Ending date for contact information

City =City for contact information

State = State for contact information

zip = Zipcodes for contact information.

Plus:

number of months which company keep a given zipcode for a specific year (for years 2003 - 2010; one value ie. months for each year)

Thank you.

mspak

art297 · Posted 03-07-2012 10:14 AM

Yes they can provide the same results if applied correctly. However, in your case, I think a simple datastep would be both easier and quicker. e.g.:

libname art "c:\art";

data want (drop=in:);

set art.fmutualzip;

array years in:;

do over years;

if years ge 180 then do;

year=input(substr(vname(years),3),4.);

output;

end;

run;

View solution in original post

art297 · Posted 02-19-2012 11:49 AM

While SAS has functions to count the number of month boundaries crossed between two dates, I think that would end up giving you rather biased results. I think you could get more accurate results by obtaining the number of days at particular zip codes.

The following is only offered as a possible method. I did not check to ensure that it is doing everything correctly .. you would have to do that on your own:

libname thedata "c:\";

data temp (rename=

(years1=in2003

years2=in2004

years3=in2005

years4=in2006

years5=in2007

years6=in2008

years7=in2009

years8=in2010))

;

set thedata.mutualcom (

where=(year(chgdt) le 2010 and

year(chgdt) gt 2002)

drop=city state);

array years(2003:2010);

if year(chgenddt) ge 2011 then

chgenddt='31DEC2010'd;

if year(chgdt) lt 2003 then

chgdt='01JAN2003'd;

do year=year(chgdt) to year(chgenddt);

if year ne year(chgenddt) then

end=mdy(12,31,year);

else end=chgenddt;

if year gt year(chgdt) then

start=mdy(1,1,year);

else start=chgdt;

years(year)=end-start+1;

end;

run;

proc summary data=temp nway;

var in:;

class crsp_fundn zip;

output out=want (drop=_:)

sum=;

run;

mspak · Posted 02-20-2012 01:24 AM

Thank you very much

Ksharp · Posted 02-20-2012 02:11 AM

You'd better post some output to clarify what you want.

Ksharp

mspak · Posted 03-07-2012 12:55 AM

Hi all,

I ran the program suggested and yield the output with file name "fmutualzip". I get the number of days in which the mutual fund companies available for a given year. I would like to set a criteria that if a particular mutual fund available for 180 or more days, then the firms is considered as "available for the year".

My purpose is to calculate the distance between the US firms (as attached file "comp") and mutual fund companies available for years 2003 to 2010. How should I restructure the data as per in "fmutualzip" into a format that enables distance calculation?

Thank you for any help.

Regards,

MEI SEN

Ksharp · Posted 03-07-2012 02:00 AM

Sorry. I don't understand what you mean.

What is Distance you refered to ?

Ksharp

mspak · Posted 03-07-2012 02:07 AM

geographic distance..I have zipcodes for both mutual fund companies and US listed companies in files. I used zipcodes to calculate distance between two locations.

As all firms in US, the zipcitydistance function is suitable for distance calculation. But i don't know how to restructure the data in "fmutualzip" into the following columns:

crsp_fundn (mutual fund identification code) zip year

I wish only have one "year" column. The mutual fund which available for 180 days or more considered as "mutual companies "available" for a given year.

Thanks.

mspak

art297 · Posted 03-07-2012 09:11 AM

Do the zipcode for mutual funds change over time? If not, I would just get rid of all of the duplicate records in file comp and eliminate the year variable from that file. If they do change over time, I think you will have to restructure one of your files so that they only have one year variable each, or both have wide records that include fields for multiple years.

mspak · Posted 03-07-2012 09:23 AM

Hi art297,

the zipcodes for mutual funds might change as they might move their headquarter to another area. The original files that I posted have all starting date and ending date for the same zipcode for a given mutual funds. I couldn't include the mutual funds that have been in a given area for a very short period of time (perhaps only 1 day). Therefore, I set a criteria that if the mutual fund is available for at least 180 days for a given year, then I should calculate the distance between the mutual fund companies and the US listed firms.

At the final stage, I have to calculate how many mutual funds companies are located within 60 miles radius from the US firms' headquarter.

Thank you.

Regards,

mspak

art297 · Posted 03-07-2012 09:29 AM

Then, if it were me, I would transpose the fmutual file so that it (1) each year was on a separate record (with the same variable name as in your comp file) and (2) only keep those records that meet your criteria.

Then, the problem becomes a simple one-to-one merge that can either be done in a datastep or with proc sql.

mspak · Posted 03-07-2012 09:53 AM

Thanks,

I am new user of SAS and still exploring countless of the SAS functions. As per information that I read, I can use either proc transpose or proc summary using IDGROUP option or data step to transpose data. Let me to explore more. Will these tranposition methods give rise to the same result?

Regards,

mspak

art297 · Posted 03-07-2012 10:14 AM

Yes they can provide the same results if applied correctly. However, in your case, I think a simple datastep would be both easier and quicker. e.g.:

libname art "c:\art";

data want (drop=in:);

set art.fmutualzip;

array years in:;

do over years;

if years ge 180 then do;

year=input(substr(vname(years),3),4.);

output;

end;

run;

mspak · Posted 03-07-2012 10:22 AM

Hi art297,

Thank you very much. I will also explore more for SAS transposition methods for my future use.

Regards,

mspak

Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Number of months in which the company remain in same zip code area for a given year

Number of months in which the company remain in same zip code area for a given year

Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Re: Number of months in which the company remain in same zip code area for a given year

Number of months in which the company remain in same zip code area for a given year

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away