Hi,
I have this code that merges regression parameters (param.lfp) to an micro dataset (pop_lfp1). The regression file is organised in a way that each variable has its own column with their specific categories on different lines and parameters corresponding to these categories being on another column (see attached file). The point of the merge is to have for each individuals of pop_lfp1 the regression parameters that correspond to their own characteristics. Since the code is repeating for each variable (intercept, edu, agegr, etc.), I'd like to have a macro that makes it shorter, in which I could just declare variables to merge in parameters.
proc sql;
create table pop_lfp1 as
select
p.*,
t1.intercept, t2.edu_p, t3.agegr_p, t4.agegr_edu_p,
t5.young_kid_p, t6.young_kid_edu_p, t7.region_p
from
pop2 p
left join
( select sex, intercept
from param.lfp
where not missing(intercept)
) t1
on p.sex=t1.sex
left join
( select sex, edu, edu_p
from param.lfp
where not missing(edu_p)
)t2
on p.sex=t2.sex and p.edu=t2.edu
left join
( select sex, agegr, agegr_p
from param.lfp
where not missing(agegr_p)
)t3
on p.sex=t3.sex and p.agegr=t3.agegr
left join
( select sex, agegr, edu, agegr_edu_p
from param.lfp
where not missing(agegr_edu_p)
)t4
on p.sex=t4.sex and p.agegr=t4.agegr and p.edu=t4.edu
left join
( select sex, young_kid, young_kid_p
from param.lfp
where not missing(young_kid_p)
)t5
on p.sex=t5.sex and p.young_kid=t5.young_kid
left join
( select sex, edu, young_kid, young_kid_edu_p
from param.lfp
where not missing(young_kid_edu_p)
)t6
on p.sex=t6.sex and p.edu=t6.edu and p.young_kid=t6.young_kid
left join
( select sex, region, region_p
from param.lfp
where not missing(region_p)
)t7
on p.sex=t7.sex and p.region=t7.region;
quit;
I have not gone through your code in any detail, but usually this is not necessary. Usually, if you want the regression coefficients in the same data set as the data, you can get this without macros and without all this SQL, but what is the purpose of doing this? Is it to use the coefficients and the data to create predictions? Then all of this has already been built into SAS so you don't have to program it yourself.
So, what is your goal in doing this?
Thank you for your reply. The regression parameters are estimated from another dataset. I want to use them to impute variables to this new dataset.
So, again it sounds to me like no macro and no complicated SQL is needed here. But I still have questions.
When you say the parameters are estimated from another data set, are they estimated by PROC REG or PROC GLM or some other SAS PROC? Which one?
I want to use them to impute variables to this new dataset.
Does this mean you want to use the regression equation to predict values?
Yes, I want to use the regression parameters to predict a value.
The parameters are estimated with proc logistic, but the imputation need to be done independently of the regression. Eventually, the imputation will be done by people who don't have access to the dataset used for the regression.
@Demographer wrote:
Eventually, the imputation will be done by people who don't have access to the dataset used for the regression.
In SAS, or elsewhere? How does your original code relate to the "eventually ..."? Is your question about how to do this today, or how to do this eventually when people don't have access to the data set?
Sorry, maybe it's confusing. I'll say in other words.
The regression parameters are estimated from a dataset (pop1) that only me can access.
I want to prepare a code showing to my students how imputing values in the dataset "pop2" from those regression parameters. My students won't have access to pop1, but have access to pop2 and to regression parameters (param.lfp).
It still bothers me that you keep saying "impute" and I am about to show the method to make predictions. So, are we really talking about the same thing?
Here is an example of PROC LOGISTIC, creating a model and storing the regression using the STORE statement. This is followed by a test data set being created and then PROC PLM uses the test data set and the STORE data set, but not the original data, to get predictions.
For this to work, you have to have access to SAS/STAT.
Thanks. Is the "sasuser.SimuModel" can somehow be exported in csv or some sas file, then imported in a new SAS session?
@Demographer wrote:
Thanks. Is the "sasuser.SimuModel" can somehow be exported in csv or some sas file, then imported in a new SAS session?
It is a SAS Data set that you can create and use later, with the proper LIBNAME statement. You can name it anything you want like MYFILES.LOGISTIC_REGRESSION
You're showing us how you're currently doing the calculation without both datasets. What would be better is if you'd show us the input data sets and expected data sets and we could help you build a process that automates this. From the code a data step or SQL would work just as well but I can't help but wonder if VVALUEX() or an array may not make things simpler.
@Demographer wrote:
Sorry, maybe it's confusing. I'll say in other words.
The regression parameters are estimated from a dataset (pop1) that only me can access.
I want to prepare a code showing to my students how imputing values in the dataset "pop2" from those regression parameters. My students won't have access to pop1, but have access to pop2 and to regression parameters (param.lfp).
Have you investigated the CODE statement in proc logistic ?
Many thanks for all this. But I'm not sure if it works with what I have in mind.
I want to predict with random experiments and with different scenarios. For instance a scenario in which the value for low educated people is the same as for high educated ones. So I need to be able to change manually the parameters (which is why I used a csv file). The sas item exported with store the statement doesn't allow to change parameters afterward (or if so, it's not very obvious).
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.