Solved: Complex merging

Demographer · Posted 11-06-2019 04:02 AM

I have a dataset pop_dm that contains individuals with some characteristics. I have another dataset, lfp, which contains the parameters from a logit regression for labor force participation. I want to use those parameters to calculate, in pop_dm, each individual's probability of being in the labor force. Is there a simple way to merge the two datasets? Each parameter has to be matched on a different set of keys: the variable intercept from lfp, for instance, should match individuals by sex only, while agegr_edu_p (the parameters for the interaction of agegr and edu) should match by sex, agegr and edu. I therefore cannot use a single merge statement, because the BY variables differ from one parameter to the next.

My first idea was to do it in many data steps, but that looks overcomplicated.

Patrick · Posted 11-06-2019 04:57 PM

You could use either SQL joins or data step hash lookups. Below is an SQL approach (code not tested).

proc sql;
  create table want as
    select 
      p.*,
      t1.intercept,
      t2.agegr_edu_p
    from 
      pop_dm p

      left join
      (
        select sex, intercept
        from lfp
        where not missing(intercept)
      ) t1
      on p.sex=t1.sex

      left join
      (
        select sex, agegr, edu, agegr_edu_p
        from lfp
        where not missing(agegr_edu_p)
      )t2
      on p.sex=t2.sex and p.agegr=t2.agegr and p.edu=t2.edu
    ;
quit;

And here is a hash lookup (not tested).

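data want;
  if _n_=1 then
    do;
      /* Define the lookup variables in the PDV without reading any rows */
      if 0 then set lfp(keep=intercept agegr_edu_p);

      /* Intercept, keyed by sex only */
      dcl hash h1(dataset:'lfp(keep=sex intercept where=(not missing(intercept)))');
      h1.defineKey('sex');
      h1.defineData('intercept');
      h1.defineDone();

      /* Interaction parameter, keyed by sex, agegr and edu */
      dcl hash h2(dataset:'lfp(keep=sex agegr edu agegr_edu_p where=(not missing(agegr_edu_p)))');
      h2.defineKey('sex', 'agegr', 'edu');
      h2.defineData('agegr_edu_p');
      h2.defineDone();
    end;

  set pop_dm;
  /* Variables read via SET are retained, so reset them when no match is found */
  if h1.find() ne 0 then call missing(intercept);
  if h2.find() ne 0 then call missing(agegr_edu_p);
run;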
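
Once the parameters sit next to each individual, the probability asked about in the question follows from the inverse logit. The step below is only a sketch: it uses just the two parameters merged above, whereas the real model presumably has more terms, and it assumes that a missing agegr_edu_p means a reference category contributing zero. The names want_p, xb and p_lfp are made up for illustration.

```sas
data want_p;
  set want;
  /* Linear predictor from the two merged terms only (assumption);
     SUM ignores missing arguments, so a missing term counts as zero */
  xb = sum(intercept, agegr_edu_p);
  /* Inverse logit: probability of being in the labor force */
  p_lfp = 1 / (1 + exp(-xb));
run;
```

If intercept itself is missing, the individual found no match on sex, and the resulting probability should not be trusted.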

Kurt_Bremser · Posted 11-06-2019 04:17 AM

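Create a format that maps each value of sex to its intercept:

data cntlin;
  set lfp;
  /* Keep only the intercept rows: sex and intercept present, all other keys missing */
  where nmiss(sex,intercept) = 0 and nmiss(edu,agegr,young_kid,region) = 4;
  fmtname = 'sex';
  type = 'n';
  keep fmtname type sex intercept;
  rename sex=start intercept=label;
run;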

proc format cntlin=cntlin;
run;

Similarly, extract the rows for the more complex parameter agegr_edu_p, join them to pop_dm on sex, agegr and edu, and apply the format in that same step to pick up the intercept.
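
A rough, untested sketch of that second step could look like the following. The dataset name agegr_edu is made up for illustration, and the code assumes the sex. format from the step above already exists and that sex is numeric.

```sas
/* Hypothetical helper dataset: only the agegr_edu_p rows of lfp */
data agegr_edu;
  set lfp(keep=sex agegr edu agegr_edu_p);
  where not missing(agegr_edu_p);
run;

proc sql;
  create table want as
    select
      p.*,
      /* The format returns the intercept as text; INPUT turns it back into a number */
      input(put(p.sex, sex.), best32.) as intercept,
      t.agegr_edu_p
    from pop_dm p
      left join agegr_edu t
      on p.sex=t.sex and p.agegr=t.agegr and p.edu=t.edu
  ;
quit;
```

Two caveats: a value of sex that the format does not cover is passed through as the formatted sex value itself rather than as a missing intercept, and because the label is stored as text, some precision may be lost in the round trip.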

ballardw · Posted 11-06-2019 01:43 PM

It might help to share the code used to build the model. Several of the modeling procedures can save the fitted model in a special file (an item store, created with the STORE statement), which PROC PLM can then use to score another data set.
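
As an illustration of that route, assuming the model was fit with PROC LOGISTIC: the training dataset train, the response in_lf, the model terms and the item store name lfp_model are all placeholders, since the original model code was not posted.

```sas
/* Fit the model and save it to an item store (all names are hypothetical) */
proc logistic data=train;
  class sex agegr edu;
  model in_lf(event='1') = sex agegr edu agegr*edu;
  store work.lfp_model;
run;

/* Score pop_dm from the stored model; ILINK returns probabilities
   instead of values on the logit scale */
proc plm restore=work.lfp_model;
  score data=pop_dm out=pop_scored predicted=p_lfp / ilink;
run;
```

This avoids merging parameters by hand, but pop_dm must contain the model's predictors under the same names and coding as the training data.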
