beleeve
Calcite | Level 5

I uploaded two datasets from different time points, and they use the same variable names. Is there a way to run PROC FREQ on a variable from dataset 1 and on the same-named variable from dataset 2 separately, even though the names are identical in both?

1 ACCEPTED SOLUTION

ballardw
Super User

@beleeve wrote:

Sorry, I forgot to mention that the other variable is in only one of the data sets. Right now I have something like this, where var1 and var2 are in both datasets and var3 is only in data1:

data work.abc;
set 'data1.sas7bdat';
set 'data2.sas7bdat';

proc sort;
by var1;

proc freq;
by var1;
table var2*var3/ expected cellchi2 chisq;
run;

 

 


It is past time that you learned to use libraries, such as WORK, instead of quoted file names like 'data1.sas7bdat'. Those just add to the complexity of the code.

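Second, TWO SET STATEMENTS behave differently than you expect, which is why you lose records. Each pass through the data step reads one observation from each data set, values from data2 overwrite same-named variables just read from data1, and the step stops as soon as the shorter data set runs out.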
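As a rough illustration of the difference, here is a minimal sketch. The data sets work.t1 and work.t2, the variable id, and all values are made up for this example and are not the poster's data.

```sas
/* Hypothetical toy data, invented for illustration only */
data work.t1;
   input id var2 var3;
   datalines;
1 10 100
2 20 200
3 30 300
;
run;

data work.t2;
   input id var2;
   datalines;
7 70
8 80
;
run;

/* Two SET statements: one observation is read from EACH data set per iteration */
data work.two_sets;
   set work.t1;
   set work.t2;   /* id and var2 from t2 overwrite the values just read from t1 */
run;

/* One SET statement naming both data sets: observations are stacked */
data work.stacked;
   set work.t1 work.t2;
run;

proc print data=work.two_sets; run;
proc print data=work.stacked;  run;
```

With this toy input, work.two_sets should end up with only two observations, because the step stops when work.t2 is exhausted, and each row pairs id and var2 from t2 with whatever var3 happened to be on the same row number in t1. work.stacked keeps all five observations, with var3 missing for the rows that came from t2.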

If you want to run a chi-square test like that, how do you know that var2 and var3 come from the right combination of observations in your source sets? For chi-square to be meaningful, the values have to be matched in some way, and your double SET statements are almost certainly not doing anything sensible.
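One possible way to match observations, sketched under a big assumption: both data sets contain a key variable that identifies the same subject at both time points. The name id is hypothetical, as are the work.d1, work.d2 and work.matched data sets and the _t1/_t2 suffixes. Which pair of variables belongs in the table depends on the actual question being asked.

```sas
/* ASSUMPTION: id is a hypothetical key present in both data sets */
proc sort data=data1 out=work.d1; by id; run;
proc sort data=data2 out=work.d2; by id; run;

data work.matched;
   /* Rename var2 so the two time points do not overwrite each other */
   merge work.d1(in=in1 rename=(var2=var2_t1))
         work.d2(in=in2 keep=id var2 rename=(var2=var2_t2));
   by id;
   if in1 and in2;   /* keep only subjects present at both time points */
run;

proc freq data=work.matched;
   tables var2_t2*var3 / expected cellchi2 chisq;
run;
```

If there is no such key, the rows cannot be paired reliably, and a cross-tabulation of variables from the two sets has no clear interpretation.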

 

SHOW example data sets as data step code. Or send PROC PRINT output of the relevant variables from each set to the LISTING destination and paste the result into a text box opened on the forum with the </> icon above the message window.
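A minimal sketch of the PROC PRINT route, assuming the data sets are reachable as data1 and data2 and that ten rows are enough to show the pattern (both are assumptions, adjust as needed):

```sas
ods listing;                       /* open the plain-text LISTING destination */

proc print data=data1(obs=10);     /* first 10 rows only */
   var var1 var2 var3;
run;

proc print data=data2(obs=10);
   var var1 var2;                  /* var3 exists only in data1 */
run;

ods listing close;
```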

 


5 REPLIES
SASKiwi
PROC Star
proc freq data  = Mydata1;
  table MyVar;
run;

proc freq data  = Mydata2;
  table MyVar;
run;
ballardw
Super User

Another approach is to combine the two sets into a single data set and add a variable that records which data set contributed each observation. Then use that added variable to identify the source in the analysis.

 

Something like:

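data combined;
   /* INDSNAME= stores the name of the contributing data set in a temporary variable */
   set data1 data2 indsname=source;
   /* Copy it to a regular variable so it is kept in the output data set */
   table_name=source;
run;

/* Cross-tabulate the source data set against each variable of interest */
proc freq data=combined;
    tables  table_name* (var1 var2);
run;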

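Combining the data requires that all variables with the same name be of the same type, and it is best if they also have the same length. Otherwise the length comes from the first data set listed, and longer character values from the second may be truncated.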
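One way to check this before combining, as a sketch: write the PROC CONTENTS metadata for each data set to a table and compare. The names work.c1 and work.c2 and the column aliases are made up for this example.

```sas
/* Name, type (1 = numeric, 2 = character) and length of every variable */
proc contents data=data1 out=work.c1(keep=name type length) noprint;
run;
proc contents data=data2 out=work.c2(keep=name type length) noprint;
run;

/* List same-named variables whose type or length differs */
proc sql;
   select a.name,
          a.type   as type1,   b.type   as type2,
          a.length as length1, b.length as length2
   from work.c1 as a inner join work.c2 as b
        on upcase(a.name) = upcase(b.name)
   where a.type ne b.type or a.length ne b.length;
quit;
```

If the query prints no rows, the shared variables agree on type and length. A length mismatch can usually be handled with a LENGTH statement placed before the SET statement. A type mismatch needs a conversion in one of the sets first.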

beleeve
Calcite | Level 5

Sorry, I forgot to mention that the other variable is in only one of the data sets. Right now I have something like this, where var1 and var2 are in both datasets and var3 is only in data1:

data work.abc;
set 'data1.sas7bdat';
set 'data2.sas7bdat';

proc sort;
by var1;

proc freq;
by var1;
table var2*var3/ expected cellchi2 chisq;
run;

 

 


Kurt_Bremser
Super User

To minimize disk space consumption, I would use a view in @ballardw's code:

data combined / view=combined;
   set data1 data2 indsname=source;
   table_name=source;
run;

This means that the data sets are concatenated on the fly when the view is read by PROC FREQ later, rather than being written to disk first.
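A sketch of how this might look end to end. One practical wrinkle, as far as I know, is that a library cannot hold a data set and a view with the same name, so a COMBINED data set left over from an earlier run has to be removed before the view can be created. The cleanup step below assumes everything lives in WORK.

```sas
/* Remove a leftover data set named COMBINED, if any (NOWARN suppresses the
   message when it does not exist) */
proc datasets library=work nolist nowarn;
   delete combined / memtype=data;
quit;

/* Define the view: nothing is read or written at this point */
data combined / view=combined;
   set data1 data2 indsname=source;
   table_name=source;
run;

/* data1 and data2 are read here, when the view is used */
proc freq data=combined;
   tables table_name*(var1 var2);
run;
```

The trade-off is that the source data sets are re-read every time the view is used, so a view saves disk space at the cost of repeated reads if many procedures run against it.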
