Help using Base SAS procedures

PROC SQL PROCESSING

Reply
Regular Contributor
Posts: 162

PROC SQL PROCESSING

Good days to all,

I recently used SQL to generate possible pairs of geograhic zipcodes between one location to another. I have about 13,000 firms per year for 8 years. Therefore, number of firm-years is estimated to be 104,000. Then, I have to calculate the distance between every firm-year to all of the mutual fund companies (which is about 17,000 firms for each year). By the end, I have to calculate mutual fund density, which is defined as log (1+mutual fund companies with distance within 60 miles radius from the firm).    

The submitted program was run for 7 hours but I failed to generate output due to insufficient spaces. My notebook has free space about 365 GB but yet insufficient for storing the output. I am looking forward any response which can solve the program.

My concerns:

1. How to submit the program in batches?

2. How can I estimate the processing time. I couldn't find any indicator on the estimated time and storage needed for running programs.

3. Any recommendations and suggestions...etc.

The following is my program (with libname "geog"):

proc sql;

create table geog.fundcom as

select a.fyear, tic, crsp_fundn, zipcitydistance(zipcode, zip)as mutualdist

from geog.comp a , geog.Mutualcom b

where a.fyear=b.year

group by a.fyear,tic;

having not missing(calculated mutualdist);

quit;

proc sql;

create table geog.mutualnum as

select fyear, tic, sum(0<=mutualdist<=60) as mutualnum

from geog.fundcom

group by tic, fyear;

quit; run;

data geog.mutualdensity; set geog.mutualnum;

mutual_density = log (1+mutualnum);

run;

Thank you.

Mspak

PROC Star
Posts: 7,356

PROC SQL PROCESSING

Your code shouldn't have worked as posted  The first proc sql had a semicolon after the group by clause.  Plus, why create a big file only to eliminate the unnecessary records?

Doe the following accomplish the initial step you wanted?:

proc sql;

  create table geog.fundcom as

    select a.fyear,

           tic,

           crsp_fundn,

           zipcitydistance(zipcode, zip)as mutualdist,

           count(*) as mutualnum

      from geog.comp a , geog.Mutualcom b

        where a.fyear=b.year

          group by a.fyear,tic

            having calculated mutualdist between 0 and 60

  ;

quit;

Regular Contributor
Posts: 162

PROC SQL PROCESSING

Thanks art297 for your kind helps especially to me who is a new SAS user.

I ran the program and it proves successful. There are many obervations with different pairs of possible origin and destination. In fact I wish to have mutualnum for each tic-fyear. Therefore, the crsp_fundn is not important to be included. I have modified it to generate my desired output format.

I came across a data step approach which can significantly reduce the processing time, namely DATA Step Hash Object. I am not sure whether this method can be used for this example and can this method further reduces the data storage and time?

Have a nice day Smiley Happy

Regards,

mspak

Respected Advisor
Posts: 3,124

Re: PROC SQL PROCESSING

Hi Mspak,

My solution to your another question could be applied to this senario without many changes:

data want (drop=_Smiley Happy;

  if _n_=1 then do;

     set geog.mutualcom (obs=1 rename=(year=fyear zip=_zip));

dcl hash h(dataset:'geog.mutualcom(rename=(year=fyear zip=_zip))', multidata:'yes', hashexp:16);

h.definekey('fyear');

h.definedata('_zip');

h.definedone();

end;

set geog.comp;

   _ct=0;

   _rc=find();

   do while (_rc=0);

      if .< zipcitydistance(zipcode, _zip)<=60 then _ct+1;

_rc=find_next();

end;

    mutual_density=log(1+_ct);

run;

of couse not tested, so my code is subjected to typo and other errors.

Regards,

Haikuo

Edit: Before running the code, I would clean up the tables to make sure all of the zip codes are legit per zipcitydistance().

Regular Contributor
Posts: 162

PROC SQL PROCESSING

Thanks HaiKuo for helps,

I will test run the program too. Learning SAS Programming with helps from all the experts here is fun Smiley Happy. Hope that I can master this skill one day!!!

Goodnite (12 Marc 2012, 11.57pm)

Regards,

mspak

Regular Contributor
Posts: 162

Re: PROC SQL PROCESSING

Hi art297 and Hai.kuo,

I tested program suggested by Hai.kuo, some error messages on log:

16   data geog.want (drop=_Smiley Happy;

17

18     if _n_=1 then do;

19

20        set geog.mutualcom (obs=1 rename=(year=fyear zip=_zip));

21

22   dcl hash h(dataset:'geog.mutualcom(rename=(year=fyear zip=_zip))', multidata:'yes',

22 ! hashexp:16);

23

24   h.definekey('fyear');

25

26   h.definedata('_zip');

27

28   h.definedone();

29

30   end;

31

32

33

34   set geog.comp;

35

36      _ct=0;

37

38      _rc=find();

            ----

            71

ERROR 71-185: The FIND function call does not have enough arguments.

39

40      do while (_rc=0);

41

42         if .< zipcitydistance(zipcode, _zip)<=60 then _ct+1;

43

44   _rc=find_next();

         ---------

         68

ERROR 68-185: The function FIND_NEXT is unknown, or cannot be accessed.

45

46   end;

47

48       mutual_density=log(1+_ct);

49

50   run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set GEOG.WANT may be incomplete.  When this step was stopped there were 0

         observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           1.06 seconds

      cpu time            0.03 seconds

Thanks for help. I carefully study the data generated by submitting the program suggested by art297, the output is strange as firms in the same year have the same nutualnum and mutual_density. I came across the same problem too. The count function for firms within 60-mile radius is tricky, I re-calculated mutualnum using sum function instead as per follows:

proc sql;

create table mutualnum as

select fyear, tic, sum(0<=mutualdist<=60) as mutualnum1

from geog.fundcom

group by tic, fyear;

quit; run;

proc sql;

create table geog.mutualden as

select unique fyear, tic, mutualnum1, log(1+ mutualnum1) as mutual_density

from mutualnum

having not missing (calculated mutual_density);

quit;

quit; run;

The correct result can be achieved by using sum function.

I think I must carefully study why and in what circumstances that count function can be tricky?

Thank you.

mspak

Respected Advisor
Posts: 3,124

Re: PROC SQL PROCESSING

oops. There were some obvious errors, I did not reference the hash name. Try this:

data want (drop=_Smiley Happy;

  if _n_=1 then do;

     set geog.mutualcom (obs=1 rename=(year=fyear zip=_zip));

dcl hash h(dataset:'geog.mutualcom(rename=(year=fyear zip=_zip))', multidata:'yes', hashexp:16);

h.definekey('fyear');

h.definedata('_zip');

h.definedone();

end;

set geog.comp;

   _ct=0;

   _rc=h.find();

   do while (_rc=0);

      if .< zipcitydistance(zipcode, _zip)<=60 then _ct+1;

_rc=h.find_next();

end;

    mutual_density=log(1+_ct);

run;

Regular Contributor
Posts: 162

Re: PROC SQL PROCESSING

Hi Hai.Kuo,

Thanks and I will try again. I just download a few articles relating to thsi topics so that I can make use of this efficient function in future.

Have a nice day!!!

Time now at my place: 7.19pm

Regards,

MEI SEN

Ask a Question
Discussion stats
  • 7 replies
  • 354 views
  • 6 likes
  • 3 in conversation