Good days to all,
I attached a file containing the following variables:
crsp_fundn = Identification code for a company
chgdt = Beginning date for contact information
chgenddt = Ending date for contact information
City =City for contact information
State = State for contact information
zip = Zipcodes for contact information.
My main task now is to calculate 'number of months in which the company keep an unique zipcode for a given year' starting for year 2003 - 2010.
For example, company with crsp_fundn=105, keep the same zipcode for 12 mths for each year 2003 - 2010.
My expected output:
All my input:
crsp_fundn = Identification code for a company
chgdt = Beginning date for contact information
chgenddt = Ending date for contact information
City =City for contact information
State = State for contact information
zip = Zipcodes for contact information.
Plus:
number of months which company keep a given zipcode for a specific year (for years 2003 - 2010; one value ie. months for each year)
Thank you.
mspak
Yes they can provide the same results if applied correctly. However, in your case, I think a simple datastep would be both easier and quicker. e.g.:
libname art "c:\art";
data want (drop=in:);
set art.fmutualzip;
array years in:;
do over years;
if years ge 180 then do;
year=input(substr(vname(years),3),4.);
output;
end;
end;
run;
While SAS has functions to count the number of month boundaries crossed between two dates, I think that would end up giving you rather biased results. I think you could get more accurate results by obtaining the number of days at particular zip codes.
The following is only offered as a possible method. I did not check to ensure that it is doing everything correctly .. you would have to do that on your own:
libname thedata "c:\";
data temp (rename=
(years1=in2003
years2=in2004
years3=in2005
years4=in2006
years5=in2007
years6=in2008
years7=in2009
years8=in2010))
;
set thedata.mutualcom (
where=(year(chgdt) le 2010 and
year(chgdt) gt 2002)
drop=city state);
array years(2003:2010);
if year(chgenddt) ge 2011 then
chgenddt='31DEC2010'd;
if year(chgdt) lt 2003 then
chgdt='01JAN2003'd;
do year=year(chgdt) to year(chgenddt);
if year ne year(chgenddt) then
end=mdy(12,31,year);
else end=chgenddt;
if year gt year(chgdt) then
start=mdy(1,1,year);
else start=chgdt;
years(year)=end-start+1;
end;
run;
proc summary data=temp nway;
var in:;
class crsp_fundn zip;
output out=want (drop=_:)
sum=;
run;
Thank you very much
You'd better post some output to clarify what you want.
Ksharp
Hi all,
I ran the program suggested and yield the output with file name "fmutualzip". I get the number of days in which the mutual fund companies available for a given year. I would like to set a criteria that if a particular mutual fund available for 180 or more days, then the firms is considered as "available for the year".
My purpose is to calculate the distance between the US firms (as attached file "comp") and mutual fund companies available for years 2003 to 2010. How should I restructure the data as per in "fmutualzip" into a format that enables distance calculation?
Thank you for any help.
Regards,
MEI SEN
Sorry. I don't understand what you mean.
What is Distance you refered to ?
Ksharp
geographic distance..I have zipcodes for both mutual fund companies and US listed companies in files. I used zipcodes to calculate distance between two locations.
As all firms in US, the zipcitydistance function is suitable for distance calculation. But i don't know how to restructure the data in "fmutualzip" into the following columns:
crsp_fundn (mutual fund identification code) zip year
I wish only have one "year" column. The mutual fund which available for 180 days or more considered as "mutual companies "available" for a given year.
Thanks.
mspak
Do the zipcode for mutual funds change over time? If not, I would just get rid of all of the duplicate records in file comp and eliminate the year variable from that file. If they do change over time, I think you will have to restructure one of your files so that they only have one year variable each, or both have wide records that include fields for multiple years.
Hi art297,
the zipcodes for mutual funds might change as they might move their headquarter to another area. The original files that I posted have all starting date and ending date for the same zipcode for a given mutual funds. I couldn't include the mutual funds that have been in a given area for a very short period of time (perhaps only 1 day). Therefore, I set a criteria that if the mutual fund is available for at least 180 days for a given year, then I should calculate the distance between the mutual fund companies and the US listed firms.
At the final stage, I have to calculate how many mutual funds companies are located within 60 miles radius from the US firms' headquarter.
Thank you.
Regards,
mspak
Then, if it were me, I would transpose the fmutual file so that it (1) each year was on a separate record (with the same variable name as in your comp file) and (2) only keep those records that meet your criteria.
Then, the problem becomes a simple one-to-one merge that can either be done in a datastep or with proc sql.
Thanks,
I am new user of SAS and still exploring countless of the SAS functions. As per information that I read, I can use either proc transpose or proc summary using IDGROUP option or data step to transpose data. Let me to explore more. Will these tranposition methods give rise to the same result?
Regards,
mspak
Yes they can provide the same results if applied correctly. However, in your case, I think a simple datastep would be both easier and quicker. e.g.:
libname art "c:\art";
data want (drop=in:);
set art.fmutualzip;
array years in:;
do over years;
if years ge 180 then do;
year=input(substr(vname(years),3),4.);
output;
end;
end;
run;
Hi art297,
Thank you very much. I will also explore more for SAS transposition methods for my future use.
Regards,
mspak
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.