What the CODE statement produces is SAS program code, i.e. a text file. It can be edited with a simple text editor.
@Demographer wrote:
Many thanks for all this. But I'm not sure if it works with what I have in mind.
I want to predict with random experiments and with different scenarios, for instance a scenario in which the value for low-educated people is the same as for high-educated ones. So I need to be able to change the parameters manually (which is why I used a CSV file). The item store exported with the STORE statement doesn't allow changing parameters afterward (or if it does, it's not very obvious how).
Are you manually changing the parameter estimates or using a different set of parameter estimates in different circumstances?
Hello, this is new information. We have gone down a path, and then in the 15th or so message, we get this new information. This is not a good way to get useful answers to your questions.
Please look at the CODE statement in PROC LOGISTIC, as suggested by others, and see if that meets your needs.
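For readers following along, here is a minimal sketch of what the CODE statement suggestion might look like. The training data set name (work.train), the model specification, and the file name are assumptions for illustration only; they do not come from the original post.

```sas
/* Sketch only: work.train and score_labour.sas are hypothetical names. */
/* The model below is a guess based on the parameter names used later   */
/* in this thread (agegr, edu, their interaction, region).              */
proc logistic data=work.train;
  class agegr edu region;
  model labour(event='1') = agegr edu agegr*edu region;
  code file='score_labour.sas';   /* writes DATA step scoring code as text */
run;

/* Apply the generated scoring code to the population file. */
data work.pop_scored;
  set work.pop_lfp1;
  %include 'score_labour.sas';    /* adds the predicted probabilities */
run;
```

Because the generated file is plain text, the coefficients in it can be edited by hand before the %include, which is one way to build alternative scenarios.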
Thank you all, but none of those options really fit what I need. I just want to make the SQL code shorter with a macro if possible.
The code produced by the CODE statement is indeed much longer than my SQL code and much harder to understand and explain to SAS beginners. My objective is not to have the best method possible from a statistical point of view, but rather have the easiest to understand for master students in sociology.
I'll do the prediction in a data step, such as:
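data work.pop_lfp2;
  set work.pop_lfp1;
  labour=0;
  if 15<=agegr<74 then do;
    /* sum of the merged parameters = linear predictor on the logit scale */
    exp_lab = exp(intercept + agegr_p + edu_p + agegr_edu_p + region_p);
    /* inverse logit: probability of being in the labour force */
    prob_lab = exp_lab/(1+exp_lab);
    /* random draw: labour=1 with probability prob_lab */
    if rand('uniform')<prob_lab then labour=1;
  end;
  drop intercept agegr_p edu_p agegr_edu_p region_p exp_lab prob_lab;
run;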
My objective is not to have the best method possible from a statistical point of view, but rather have the easiest to understand for master students in sociology.
Should that say, based on an earlier comment of yours: "My objective is not to have the best method possible from a statistical point of view, but rather have the easiest to understand for master students in sociology and allow them to change the regression coefficients manually"?
If so, I consider this a very dangerous and suspicious thing to do. Just because you can do it (with the proper code) does not mean you should do it.
Also, this part is not correct:
exp_lab = exp(intercept + agegr_p + edu_p + agegr_edu_p + region_p);
The proper form of the regression equation from a logistic regression is
exp( intercept + x1*beta1 + x2*beta2 + ... )
where x1, x2, ... are the data set variables and beta1, beta2, ... are the regression coefficients.
However, since neither the CODE statement nor the STORE statement fits your needs, the only other thing I can think of here is PROC SCORE.
The equation is fine. All variables are categorical, so when I do the merge with PROC SQL, individuals get only the parameters that correspond to their characteristics (plus the intercept).
I disagree.
The variables are categorical, and for each of them the parameter merged is the one that corresponds to the individual's characteristic. For instance, someone with a low level of education gets the parameter corresponding to low education. The Xs are therefore dummies (=1). I don't see how omitting them from the equation would change anything.
The values of dummy variables for categorical variables are either 0 or 1.
Yes, but only the parameters with x=1 are merged to individuals. When x=0, the corresponding parameter is not merged. That's the whole point of the PROC SQL step I want to shorten.
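To make the two positions in this exchange easier to compare, the argument can be written out as follows. This is a sketch that assumes 0/1 dummy (reference or GLM) coding; the symbols x_{ijk}, \beta_{jk} and c_i(j) are notation introduced here, not taken from the original posts.

```latex
% Individual i, categorical effect j (agegr, edu, agegr*edu, region), level k.
% x_{ijk} = 1 if individual i is at level k of effect j, and 0 otherwise.
\eta_i = \beta_0 + \sum_{j} \sum_{k} x_{ijk}\,\beta_{jk}
       = \beta_0 + \sum_{j} \beta_{j,\,c_i(j)},
\qquad
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}
% c_i(j) is the level of effect j observed for individual i,
% with \beta_{jk} = 0 at the reference level.
```

Under that assumption, merging only the parameter for each individual's own level and summing gives the same linear predictor as the full x*beta form. Note that PROC LOGISTIC uses effect coding for CLASS variables by default, where the reference level is coded -1 rather than 0; in that case the parameter table would need a row for the reference level holding the negative sum of the other levels' parameters.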
In that case, I don't understand your math or your SQL.
I'll point out that SAS has already done the work of taking a model fitted on one data set and applying it to another data set to get predictions, and has also done the work of validating the results. There are at least three methods that I know of. If none of them is what you want, then I suggest there is no simple method of doing this: you will have to reproduce these calculations in a way that works for you, and then validate the results yourself.
I personally have little interest in creating a new method of doing this, and so I wish you well.
I really appreciate learning about those three methods, which I didn't know about.
I have already validated the imputation produced by this code several times, and it works well. My question is only about how to create a macro that shortens the PROC SQL code quoted in the first post.