I have a dataset pop_dm that contains individuals with some characteristics. I have another dataset lfp which contains parameters from a logit regression for the labor force participation. I want to use those parameters to calculate in pop_dm the individual probability of being in the labor force. I would like to know if there is a simple way to merge both datasets. The variable intercept from lfp should for instance match individuals by sex only. Similarly, the variable agegr_edu_p (parameters for the interaction agegr and edu) should match by sex, agegr and education. I thus cannot use a single merge statement, because the “by” are not the same for the different parameters.
My first idea was to do it in many data steps, but it looks over complicated.
You could either use SQL joins or data step hash lookups. Below a SQL approach (code not tested).
proc sql;
create table want as
select
p.*,
t1.intercept,
t2.agegr_edu_p
from
pop_dm p
left join
(
select sex, intercept
from lfp
where not missing(intercept)
) t1
on p.sex=t1.sex
left join
(
select sex, agegr, edu, agegr_edu_p
from lfp
where not missing(agegr_edu_p)
)t2
on p.sex=t2.sex and p.agegr=t2.agegr and p.edu=t2.edu
;
quit;
And here a hash lookup (not tested).
data want;
if _n_=1 then
do;
if 0 then set lpf(keep=intercept agegr_edu_p);
dcl hash h1(dataset:'lpf(keep=sex intercept where=(not missing(intercept))');
h1.defineKey('sex');
h1.defineData('intercept');
h1.defineDone();
dcl hash h2(dataset:'lpf(keep=sex agegr edu agegr_edu_p where=(not missing(agegr_edu_p))');
h2.defineKey('sex', 'agegr', 'edu');
h2.defineData('agegr_edu_p');
h2.defineDone();
end;
set pop_dm;
h1.find();
h2.find();
run;
Create a format for the sex:
data cntlin;
set ifp;
where nmiss(sex,intercept) = 0 and nmiss(edu,agegr,young_kid,region) = 4;
fmtname = 'sex';
type = 'n';
keep fmtname type sex intercept;
rename sex=start intercept=label;
run;
proc format cntlin=cntlin;
run;
Similarly, extract the data for the more complex value agegr_edu_p, then join on sex,agegr,edu and apply the format during that.
It might help to share the code used to build the model. Several of the modeling procedures have an option to build a special data that can be used to score another data set using the result from the model using Proc PLM.
You could either use SQL joins or data step hash lookups. Below a SQL approach (code not tested).
proc sql;
create table want as
select
p.*,
t1.intercept,
t2.agegr_edu_p
from
pop_dm p
left join
(
select sex, intercept
from lfp
where not missing(intercept)
) t1
on p.sex=t1.sex
left join
(
select sex, agegr, edu, agegr_edu_p
from lfp
where not missing(agegr_edu_p)
)t2
on p.sex=t2.sex and p.agegr=t2.agegr and p.edu=t2.edu
;
quit;
And here a hash lookup (not tested).
data want;
if _n_=1 then
do;
if 0 then set lpf(keep=intercept agegr_edu_p);
dcl hash h1(dataset:'lpf(keep=sex intercept where=(not missing(intercept))');
h1.defineKey('sex');
h1.defineData('intercept');
h1.defineDone();
dcl hash h2(dataset:'lpf(keep=sex agegr edu agegr_edu_p where=(not missing(agegr_edu_p))');
h2.defineKey('sex', 'agegr', 'edu');
h2.defineData('agegr_edu_p');
h2.defineDone();
end;
set pop_dm;
h1.find();
h2.find();
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.