Solved: Re: How to use variables with same name but different datasets

beleeve · Posted 02-22-2023 03:51 PM

I uploaded two datasets from different timepoints, and they use the same variable names. Is there a way to run PROC FREQ on a variable from dataset 1 and, separately, on the same-named variable from dataset 2?

SASKiwi · Posted 02-22-2023 04:08 PM

proc freq data  = Mydata1;
  table MyVar;
run;

proc freq data  = Mydata2;
  table MyVar;
run;

ballardw · Posted 02-22-2023 06:44 PM

Another approach is to combine the two data sets into one and add a variable that records which source data set contributed each record. That variable can then identify the source in any analysis.

Something like:

data combined;
   set data1 data2 indsname=source;
   table_name=source;
run;

proc freq data=combined;
   tables table_name*(var1 var2);
run;

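Combining the data requires every variable that shares a name to have the same type in both data sets, and ideally the same length. SAS reports an error when the types differ, and a character variable takes its length from the first data set that defines it, so longer values from the second set can be truncated.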
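One way to check this before combining is sketched below. It assumes, purely for illustration, that var2 is a character variable stored with different lengths in the two sets and that 20 characters is enough for the longest value; neither detail comes from the original post. Note that a LENGTH statement only addresses a length mismatch, not a type mismatch.

```sas
/* Compare the type and length of the shared variables in each set */
proc contents data=data1;
run;

proc contents data=data2;
run;

/* Assumption: var2 is character and 20 is long enough for every value.
   Declaring the length before SET prevents truncation of longer values. */
data combined;
   length var2 $ 20;
   set data1 data2 indsname=source;
   table_name = source;
run;
```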

beleeve · Posted 03-10-2023 01:53 PM

Sorry, I forgot to mention that the other variable is in only one of the datasets. Right now I have something like this, where var1 and var2 are in both datasets and var3 is only in data1:

data work.abc;
set 'data1.sas7bdat';
set 'data2.sas7bdat';

proc sort;
by var1;

proc freq;
by var1;
table var2*var3/ expected cellchi2 chisq;
run;

ballardw · Posted 03-11-2023 07:13 AM

@beleeve wrote:

Sorry, I forgot to mention that the other variable is in only one of the datasets. Right now I have something like this, where var1 and var2 are in both datasets and var3 is only in data1:

data work.abc;
set 'data1.sas7bdat';
set 'data2.sas7bdat';

proc sort;
by var1;

proc freq;
by var1;
table var2*var3/ expected cellchi2 chisq;
run;

It is past time that you learned to use libraries, such as WORK, instead of file literal names like 'data1.sas7bdat'. The literals just add complexity to the code.
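A minimal sketch of the library approach follows. The libref mylib and the folder path are hypothetical placeholders; point the LIBNAME statement at the folder that actually holds your files.

```sas
/* Assumption: the path is a placeholder for the folder that contains
   data1.sas7bdat and data2.sas7bdat. */
libname mylib "/path/to/your/folder";

/* Two-level names replace the quoted file literals */
proc freq data=mylib.data1;
   tables var2;
run;

proc freq data=mylib.data2;
   tables var2;
run;
```

Once the libref is assigned, every later step can refer to mylib.data1 and mylib.data2 without repeating the file names.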

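Second, two SET statements behave differently than you expect, which is why you lose records. Each iteration of the data step reads one observation from each data set, the values from the second SET overwrite any same-named variables read by the first, and the step stops as soon as the shorter data set runs out.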
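To see the difference, here is a small sketch with invented data. The data set names t1 and t2 and all values are made up for illustration.

```sas
/* Made-up example data: t1 has 3 observations, t2 has 2 */
data t1;
   input id var2 $;
   datalines;
1 a
2 b
3 c
;

data t2;
   input id var2 $;
   datalines;
7 x
8 y
;

/* Two SET statements: one observation is read from each data set per
   iteration, same-named variables from t2 overwrite those from t1,
   and the step stops when the shorter data set is exhausted. */
data two_sets;
   set t1;
   set t2;
run;   /* 2 observations, holding only the t2 values */

/* One SET statement listing both data sets stacks the observations */
data stacked;
   set t1 t2;
run;   /* 5 observations */
```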

If you want to run something like that chi-square test, how do you know that var2 and var3 come from the right combination of observations in your source sets? For chi-square to be meaningful the values have to be matched in some way, and your double SET statements are almost certainly not doing anything sensible.
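If the two sets share a subject identifier, one possible way to match them is a MERGE with a BY statement. This is only a sketch: the key variable id is an assumption (nothing in your post says such a variable exists), and var2_t2 is a made-up name for the second timepoint's var2.

```sas
/* Assumption: both data sets carry a subject identifier named id.
   That variable is hypothetical; substitute your real key. */
proc sort data=data1;
   by id;
run;

proc sort data=data2;
   by id;
run;

data matched;
   merge data1 (in=in1 keep=id var3)
         data2 (in=in2 keep=id var2 rename=(var2=var2_t2));
   by id;
   if in1 and in2;   /* keep only subjects present in both sets */
run;

proc freq data=matched;
   tables var2_t2*var3 / expected cellchi2 chisq;
run;
```

Renaming var2 on the way in keeps the second timepoint's values from being confused with the first set's variable of the same name.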

Show example data sets as data step code. Alternatively, send PROC PRINT output of the relevant variables from each set to the LISTING destination and paste the result into a text box opened on the forum with the </> icon above the message window.
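As a rough illustration of both options, with invented values and assuming var2 and var3 are character variables:

```sas
/* Example data written as a data step; the values are invented */
data work.data1;
   input var1 var2 $ var3 $;
   datalines;
1 yes low
1 no high
2 yes high
;

/* Print the relevant variables to the LISTING destination as plain text */
ods listing;

proc print data=work.data1 (obs=10);
   var var1 var2 var3;
run;
```

A handful of rows from each set is usually enough for others to reproduce the problem.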

Kurt_Bremser · Posted 02-23-2023 02:32 AM

To minimize disk space consumption, I would use a view in @ballardw's code:

data combined / view=combined;
   set data1 data2 indsname=source;
   table_name=source;
run;

This means the data sets are combined dynamically whenever the view is used, for example in PROC FREQ later.
