BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mspak
Quartz | Level 8

Good days to all,

I attached a file containing the following variables:

crsp_fundn = Identification code for a company

chgdt = Beginning date for contact information

chgenddt = Ending date for contact information

City =City for contact information

State = State for contact information

zip = Zipcodes for contact information.

My main task now is to calculate 'number of months in which the company keep an unique zipcode for a given year' starting for year 2003 - 2010.

For example, company with crsp_fundn=105, keep the same zipcode for 12 mths for each year 2003 - 2010.

My expected output:

All my input:

crsp_fundn = Identification code for a company

chgdt = Beginning date for contact information

chgenddt = Ending date for contact information

City =City for contact information

State = State for contact information

zip = Zipcodes for contact information.

Plus:

number of months which company keep a given zipcode for a specific year (for years 2003 - 2010; one value ie. months for each year)

Thank you.

mspak

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Yes they can provide the same results if applied correctly.  However, in your case, I think a simple datastep would be both easier and quicker.  e.g.:

libname art "c:\art";

data want (drop=in:);

  set art.fmutualzip;

  array years in:;

  do over years;

    if years ge 180 then do;

      year=input(substr(vname(years),3),4.);

      output;

    end;

  end;

run;

View solution in original post

12 REPLIES 12
art297
Opal | Level 21

While SAS has functions to count the number of month boundaries crossed between two dates, I think that would end up giving you rather biased results.  I think you could get more accurate results by obtaining the number of days at particular zip codes.

The following is only offered as a possible method.  I did not check to ensure that it is doing everything correctly .. you would have to do that on your own:

libname thedata "c:\";

data temp (rename=

    (years1=in2003

    years2=in2004

    years3=in2005

    years4=in2006

    years5=in2007

    years6=in2008

    years7=in2009

    years8=in2010))

;

  set thedata.mutualcom (

    where=(year(chgdt) le 2010 and

      year(chgdt) gt 2002)

    drop=city state);

  array years(2003:2010);

  if year(chgenddt) ge 2011 then

    chgenddt='31DEC2010'd;

  if year(chgdt) lt 2003 then

    chgdt='01JAN2003'd;

  do year=year(chgdt) to year(chgenddt);

    if year ne year(chgenddt) then

      end=mdy(12,31,year);

    else end=chgenddt;

    if year gt year(chgdt) then

      start=mdy(1,1,year);

    else start=chgdt;

    years(year)=end-start+1;

  end;

run;

proc summary data=temp nway;

  var in:;

  class crsp_fundn zip;

  output out=want (drop=_:)

             sum=;

run;

Ksharp
Super User

You'd better post some output to clarify what you want.

Ksharp

mspak
Quartz | Level 8

Hi all,

I ran the program suggested  and yield the output with file name "fmutualzip". I get the number of days in which the mutual fund companies available for a given year. I would like to set a criteria that if a particular mutual fund available for 180 or more days, then the firms is considered as "available for the year".

My purpose is to calculate the distance between the US firms (as attached file "comp") and mutual fund companies available for years 2003 to 2010.  How should I restructure the data as per in "fmutualzip" into a format that enables distance calculation?

Thank you for any help.

Regards,

MEI SEN

Ksharp
Super User

Sorry. I don't understand what you mean.

What is Distance you refered to ?

Ksharp

mspak
Quartz | Level 8

geographic distance..I have zipcodes for both mutual fund companies and US listed companies in files. I used zipcodes to calculate distance between two locations.

As all firms in US, the zipcitydistance function is suitable for distance calculation. But i don't know how to restructure the data in "fmutualzip"  into the following columns:

crsp_fundn (mutual fund identification code)   zip  year

I wish only have one "year" column. The mutual fund which available for 180 days or more considered as "mutual companies "available" for a given year.

Thanks.

mspak

art297
Opal | Level 21

Do the zipcode for mutual funds change over time?  If not, I would just get rid of all of the duplicate records in file comp and eliminate the year variable from that file.  If they do change over time, I think you will have to restructure one of your files so that they only have one year variable each, or both have wide records that include fields for multiple years.

mspak
Quartz | Level 8

Hi art297,

the zipcodes for mutual funds might change as they might move their headquarter to another area. The original files that I posted have all starting date and ending date for the same zipcode for a given mutual funds. I couldn't include the mutual funds that have been in a given area for a very short period of time (perhaps only 1 day). Therefore, I set a criteria that if the mutual fund is available for at least 180 days for a given year, then I should calculate the distance between the mutual fund companies and the US listed firms.

At the final stage, I have to calculate how many mutual funds companies are located within 60 miles radius from the US firms' headquarter.

Thank you.

Regards,

mspak

art297
Opal | Level 21

Then, if it were me, I would transpose the fmutual file so that it (1) each year was on a separate record (with the same variable name as in your comp file) and (2) only keep those records that meet your criteria.

Then, the problem becomes a simple one-to-one merge that can either be done in a datastep or with proc sql.

mspak
Quartz | Level 8

Thanks,

I am new user of SAS and still exploring countless of the SAS functions. As per information that I read, I can use  either proc transpose or proc summary using IDGROUP option or data step to transpose data. Let me to explore more. Will these tranposition methods give rise to the same result?

Smiley Happy

Regards,

mspak

art297
Opal | Level 21

Yes they can provide the same results if applied correctly.  However, in your case, I think a simple datastep would be both easier and quicker.  e.g.:

libname art "c:\art";

data want (drop=in:);

  set art.fmutualzip;

  array years in:;

  do over years;

    if years ge 180 then do;

      year=input(substr(vname(years),3),4.);

      output;

    end;

  end;

run;

mspak
Quartz | Level 8

Hi art297,

Thank you very much. I will also explore more for SAS transposition methods for my future use.

Regards,

mspak

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 857 views
  • 3 likes
  • 3 in conversation