BookmarkSubscribeRSS Feed
Demographer
Pyrite | Level 9

Hi,

I have this code that merges regression parameters (param.lfp) to an micro dataset (pop_lfp1). The regression file is organised in a way that each variable has its own column with their specific categories on different lines and parameters corresponding to these categories being on another column (see attached file). The point of the merge is to have for each individuals of pop_lfp1 the regression parameters that correspond to their own characteristics. Since the code is repeating for each variable (intercept, edu, agegr, etc.), I'd like to have a macro that makes it shorter, in which I could just declare variables to merge in parameters. 

 

	proc sql;
	  create table pop_lfp1 as
	    select 
	      p.*,
	      t1.intercept, t2.edu_p, t3.agegr_p, t4.agegr_edu_p,
	      t5.young_kid_p, t6.young_kid_edu_p, t7.region_p
	     
	    from 
	      pop2 p
	
	      left join
	      ( select sex, intercept
	        from param.lfp
	        where not missing(intercept)
	      ) t1
	      on p.sex=t1.sex
	      
	      left join
	      ( select sex, edu, edu_p
	        from param.lfp
	        where not missing(edu_p)
	      )t2
	      on p.sex=t2.sex and p.edu=t2.edu
	      
	      left join
	      ( select sex, agegr, agegr_p
	        from param.lfp
	        where not missing(agegr_p)
	      )t3
	      on p.sex=t3.sex and p.agegr=t3.agegr
	      
	      left join
	      ( select sex, agegr, edu, agegr_edu_p
	        from param.lfp
	        where not missing(agegr_edu_p)
	      )t4
	      on p.sex=t4.sex and p.agegr=t4.agegr and p.edu=t4.edu
	
	      left join
	      ( select sex, young_kid, young_kid_p
	        from param.lfp
	        where not missing(young_kid_p)
	      )t5
	      on p.sex=t5.sex and p.young_kid=t5.young_kid
	
		  left join
	      ( select sex, edu, young_kid, young_kid_edu_p
	        from param.lfp
	        where not missing(young_kid_edu_p)
	      )t6
	      on p.sex=t6.sex and p.edu=t6.edu and p.young_kid=t6.young_kid
	     
	     left join
	      ( select sex, region, region_p
	        from param.lfp
	        where not missing(region_p)
	      )t7
	      on p.sex=t7.sex and p.region=t7.region;
	quit;
26 REPLIES 26
PaigeMiller
Diamond | Level 26

I have not gone through your code in any detail, but usually this is not necessary. Usually, if you want the regression coefficients in the same data set as the data, you can get this without macros and without all this SQL, but what is the purpose of doing this? Is it to use the coefficients and the data to create predictions? Then all of this has already been built into SAS so you don't have to program it yourself.

 

So, what is your goal in doing this?

--
Paige Miller
Demographer
Pyrite | Level 9

Thank you for your reply. The regression parameters are estimated from another dataset. I want to use them to impute variables to this new dataset.

PaigeMiller
Diamond | Level 26

So, again it sounds to me like no macro and no complicated SQL is needed here. But I still have questions.

 

When you say the parameters are estimated from another data set, are they estimated by PROC REG or PROC GLM or some other SAS PROC? Which one?

 

I want to use them to impute variables to this new dataset.

Does this mean you want to use the regression equation to predict values?

--
Paige Miller
Demographer
Pyrite | Level 9

Yes, I want to use the regression parameters to predict a value. 

The parameters are estimated with proc logistic, but the imputation need to be done independently of the regression. Eventually, the imputation will be done by people who don't have access to the dataset used for the regression.

PaigeMiller
Diamond | Level 26

@Demographer wrote:

Eventually, the imputation will be done by people who don't have access to the dataset used for the regression.


In SAS, or elsewhere? How does your original code relate to the "eventually ..."? Is your question about how to do this today, or how to do this eventually when people don't have access to the data set?

--
Paige Miller
Demographer
Pyrite | Level 9

Sorry, maybe it's confusing. I'll say in other words.

 

The regression parameters are estimated from a dataset (pop1) that only me can access.

I want to prepare a code showing to my students how imputing values in the dataset "pop2" from those regression parameters. My students won't have access to pop1, but have access to pop2 and to regression parameters (param.lfp).

PaigeMiller
Diamond | Level 26

It still bothers me that you keep saying "impute" and I am about to show the method to make predictions. So, are we really talking about the same thing?

 

Here is an example of PROC LOGISTIC, creating a model and storing the regression using the STORE statement. This is followed by a test data set being created and then PROC PLM uses the test data set and the STORE data set, but not the original data, to get predictions.

 

For this to work, you have to have access to SAS/STAT.

--
Paige Miller
Demographer
Pyrite | Level 9

Thanks. Is the "sasuser.SimuModel" can somehow be exported in csv or some sas file, then imported in a new SAS session? 

Reeza
Super User
This presentation walks through how to score models. If that's your ultimate goal, then Paige Miller (once again) is correct in suggesting that doing a manual calculation isn't a good method to teach these days as there are easier methods.

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3337-2019.pdf
PaigeMiller
Diamond | Level 26

@Demographer wrote:

Thanks. Is the "sasuser.SimuModel" can somehow be exported in csv or some sas file, then imported in a new SAS session? 


It is a SAS Data set that you can create and use later, with the proper LIBNAME statement. You can name it anything you want like MYFILES.LOGISTIC_REGRESSION

--
Paige Miller
Reeza
Super User

You're showing us how you're currently doing the calculation without both datasets. What would be better is if you'd show us the input data sets and expected data sets and we could help you build a process that automates this. From the code a data step or SQL would work just as well but I can't help but wonder if VVALUEX() or an array may not make things simpler.

 


@Demographer wrote:

Sorry, maybe it's confusing. I'll say in other words.

 

The regression parameters are estimated from a dataset (pop1) that only me can access.

I want to prepare a code showing to my students how imputing values in the dataset "pop2" from those regression parameters. My students won't have access to pop1, but have access to pop2 and to regression parameters (param.lfp).


 

 

PGStats
Opal | Level 21

Have you investigated the CODE statement in proc logistic ?

PG
Demographer
Pyrite | Level 9

Many thanks for all this. But I'm not sure if it works with what I have in mind.

 

I want to predict with random experiments and with different scenarios. For instance a scenario in which the value for low educated people is the same as for high educated ones. So I need to be able to change manually the parameters (which is why I used a csv file). The sas item exported with store the statement doesn't allow to change parameters afterward (or if so, it's not very obvious).

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 26 replies
  • 3052 views
  • 2 likes
  • 4 in conversation