BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

The code below works fine to to link the nearest geographic points given the condition race, gender and age match. However, it keeps crashing at larger counties such as 059. Is there anyway not to create a physical data &dist? Interim dist data seems to take up lot of space since it's involved with cartesian product.

I appreciate your help.

 


%macro models(dist,county,data);
proc sql;
  create table &dist as
    select t1.*,
    t1.latitude as x1_registry,t1.longitude as y1_registry,
    t2.latitude as x2_synthetic,t2.longitude as y2_synthetic,
    t2.sp_hh_id, t2.sp_id_p,
    t2.agecat as s_agecat, t2.racecat as s_racecat,
    t2.hh_income, t2.sex as s_sex, t1.agecat as c_agecat,
    t1.racecat as c_racecat, t1.sex as c_sex,
    geodist(x1_registry, y1_registry, x2_synthetic, y2_synthetic) as distance
      from reg_pancreas4(where=(county=&county)) t1 join p.synpop3(where=(county=&county)) t2
       on t1.agecat=t2.agecat and t1.sex=t2.sex and t1.racecat=t2.racecat
       order by display_id, calculated distance;
quit;

/*hh id*/
data p.&data(compress=yes);
  array mag(11000) $;
  set &dist;
  retain i needone mag:;
  by display_id;
  if _n_ eq 1 then i=0;
  if first.display_id then needone=1;
  if needone then do;
    if sp_hh_id not in mag then do;
      i+1;
      needone=0;
      mag(i)=sp_hh_id;
      output;
    end;
  end;
drop mag: needone i;
run; 

%mend models;
%models(dist059,'059',county059);
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

You should modify your macro first.

 

1. In the first SQL query, create the table with a fixed name. You don't need it past that step so delete it at the end of the step using PROC DATASETS. 

2. Remove the parameters for dist and data, you don't really need them. Add one for SEX.

3. At the end of your macro (before %MEND) add the PROC APPEND to append the data sets

4. After the Append, delete any data set you don't need after the loop finishes. Makes sure that the master data set is empty. If you run it a few times you'll want to delete it first and then recreate it for the full run or you may end up with duplicates. 

 

To call the macro for each combination, see the code below. It's verbose on purpose, it could be done in a single data step so you can follow each step. 

 

1. Create list of all gender/sex

2. Make a string that looks like your  macro call

3. Execute macro using CALL EXECUTE and passing the string from #2 to the function. 

 

As for deleting data sets - I'll leave that to you. untested:

 

*assumes this table has all the county/sex combinations needed;
proc sql;
create table report_list as
select distinct county, sex
from p.synpop3
order by 1, 2;
quit;

*create string variable;
data call_macro;
set report_list;
models(county,data)
str = catt('%models(',
               county, 
              ', '
              sex,
             ');');
run;

*execute macro;
data _null_;
set call_macro (obs=2);
call execute(str);
run;

View solution in original post

8 REPLIES 8
Cruise
Ammonite | Level 13

I'm trying to stratify further by sex: (where=(county=&county and sex='1')). I'll post how it went

PeterClemmensen
Tourmaline | Level 20

Why not simply create a view instead?

Cruise
Ammonite | Level 13

@PeterClemmensen

Any hints for how? I will use the resulting datasets for modelling eventually 

Reeza
Super User

How many points do you have? Are those latitude and longitude specific to anything?

 

I did this as part of my MSc thesis (over a decade ago) and I used a double set with the distance matrix already created to quickly find the nearest neighbor. It's possible this or a hash solution would work as well. Both are beyond my current SAS skills but perhaps someone else can help with that approach if you think it may work. This will only work if the distance matrix/table is a fixed one - that you can consistently reference. 

 

If it's zip codes, maybe you can download the distance tables here and just use as a lookup rather than calculation?

http://www.nber.org/data/zip-code-distance-database.html

 

And/or consider QGIS. It's designed for this type of analysis. 

 

And a last thought...can you filter the data sets you're creating distances for such that you only go X KM outside state and then use GINSIDE or something to specifically limit the points you're using. The odds of a NY zip for example being the closest to Hawaii is 0. Or a distance limit of some sort to drop irrelevant results.

Cruise
Ammonite | Level 13

Hi Reeza, Nassau county (NY) has 1,339,532 people and I'm running by gender which halves the size. reg_panreas data has 3500 people needed to be linked to the people in Nassau county by nearest geodist. both datasets have lat and long of individual residential homes. This method is more intuitive to me since I have no time to learn supposedly great methods you mentioned here. It just finished running without crashing as I write this with gender stratified. I guess I just have to keep cleaning the SAS temp folder and concatenate all stratified datasets later.

Reeza
Super User

@Cruise wrote:

Hi Reeza, Nassau county (NY) has 1,339,532 people and I'm running by gender which halves the size. reg_panreas data has 3500 people needed to be linked to the people in Nassau county by nearest geodist. both datasets have lat and long of individual residential homes. This method is more intuitive to me since I have no time to learn supposedly great methods you mentioned here. It just finished running without crashing as I write this with gender stratified. I guess I just have to keep cleaning the SAS temp folder and concatenate all stratified datasets later.


Fair enough. Use CALL EXECUTE to call the macro for all your age/gender stratification levels and add a step to drop temporary data sets that you don't need. If you can append the results as you go through it's probably a good idea, then less smaller data sets floating around.

 

Cruise
Ammonite | Level 13
Reeza, please show me in the code, would you?
Reeza
Super User

You should modify your macro first.

 

1. In the first SQL query, create the table with a fixed name. You don't need it past that step so delete it at the end of the step using PROC DATASETS. 

2. Remove the parameters for dist and data, you don't really need them. Add one for SEX.

3. At the end of your macro (before %MEND) add the PROC APPEND to append the data sets

4. After the Append, delete any data set you don't need after the loop finishes. Makes sure that the master data set is empty. If you run it a few times you'll want to delete it first and then recreate it for the full run or you may end up with duplicates. 

 

To call the macro for each combination, see the code below. It's verbose on purpose, it could be done in a single data step so you can follow each step. 

 

1. Create list of all gender/sex

2. Make a string that looks like your  macro call

3. Execute macro using CALL EXECUTE and passing the string from #2 to the function. 

 

As for deleting data sets - I'll leave that to you. untested:

 

*assumes this table has all the county/sex combinations needed;
proc sql;
create table report_list as
select distinct county, sex
from p.synpop3
order by 1, 2;
quit;

*create string variable;
data call_macro;
set report_list;
models(county,data)
str = catt('%models(',
               county, 
              ', '
              sex,
             ');');
run;

*execute macro;
data _null_;
set call_macro (obs=2);
call execute(str);
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1367 views
  • 0 likes
  • 3 in conversation