beleeve
Calcite | Level 5

I'm trying to run a bivariate logistic regression on 2 variables, but they come from 2 datasets that have the same variable names, just from different timepoints. I've listed the code I have so far. Essentially, var2 needs to come from the first dataset only, and var1 and var3 need to come from a combination of both datasets. I've tried renaming the variable that needs data from only one dataset, but when I add the other dataset, there are fewer values for some reason.

 

data work.asdf;
set 'dataset1'(rename=(var2=var2a));

set 'dataset2';

...

 

This is what I was trying to run:

proc sort;
by var1;

proc freq;
by var1;
table var2a*var3 / expected cellchi2 chisq;

run;

 

 

2 REPLIES
Quentin
Super User

Hi,

 

It's hard to understand your goal.  Can you please show five records of dataset1, five records of dataset2, and what you want to create when you combine them into work.asdf?

 

If you are combining variables from dataset1 and dataset2, typically you would do that with a MERGE statement.
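As a rough sketch of what a MERGE could look like here: this assumes both datasets share a subject identifier, which I have called id. That key is hypothetical, since the original post does not mention one, and the toy data and the _t1/_t2 suffixes are invented for illustration only.

```sas
/* Toy data, invented for illustration only. id is an assumed key. */
data work.time1;
   input id var1 var2 var3;
   datalines;
1 0 1 0
2 1 0 1
3 0 0 1
;
run;

data work.time2;
   input id var1 var2 var3;
   datalines;
1 1 1 0
2 1 0 0
;
run;

/* MERGE with BY requires both inputs to be sorted by the key. */
proc sort data=work.time1; by id; run;
proc sort data=work.time2; by id; run;

/* Rename the same-named variables so neither timepoint overwrites the other. */
data work.wide;
   merge work.time1 (rename=(var1=var1_t1 var2=var2_t1 var3=var3_t1))
         work.time2 (rename=(var1=var1_t2 var2=var2_t2 var3=var3_t2));
   by id;
run;
```

With this layout each id gets one row holding the values from both timepoints, and an id that appears in only one dataset gets missing values for the other timepoint's variables.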

 

If you are combining rows from dataset1 and dataset2, typically you would do that with a single SET statement which lists both datasets.

 

Your current code, with two SET statements, is almost certainly not doing what you want.  But I'm not sure what you're trying to do, so not sure how to help.

ballardw
Super User

You might also indicate exactly how you need to use the variable with the same name.

You can add a variable that indicates which data set a specific record comes from, such as

data work.combined;
   set dataset1  (in=in1)
         dataset2  (in=in2)
   ;
   if in1 then Source='Dataset1';
   else if in2 then Source='Dataset2';
run;

You could then use the Source variable in analysis to differentiate between the original sets, such as

 

proc freq data=combined;
   tables source*var2;
run;

Note that multiple SET statements, while allowed by the syntax, lead to some fairly complex behavior. Each SET statement reads one record per iteration of the DATA step, and the step stops as soon as either one runs out of records, so I suspect that you only have as many records as the smaller data set contains.
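A small demonstration of the record-count difference, using toy data invented for illustration (work.a and work.b are hypothetical stand-ins for the two datasets):

```sas
/* Toy data: work.a has 3 records, work.b has 2. */
data work.a;
   input var1 var2 var3;
   datalines;
0 1 0
1 0 1
0 0 1
;
run;

data work.b;
   input var1 var2 var3;
   datalines;
1 1 0
1 0 0
;
run;

/* Two SET statements: each iteration reads one record from each.      */
/* The step ends when the second SET hits end of file on iteration 3,  */
/* so only 2 records are written. var1 and var3 from work.b also       */
/* overwrite the values just read from work.a.                         */
data work.two_sets;
   set work.a (rename=(var2=var2a));
   set work.b;
run;

/* One SET statement listing both datasets: records are stacked,       */
/* giving 3 + 2 = 5 records.                                           */
data work.stacked;
   set work.a work.b;
run;
```

Checking the log after each step shows the observation counts, which is usually the quickest way to see whether records were dropped.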

 

 
