You talk about "values" but you seem to be implying it is the variable names that is of concern. Could you please clarify?
If it is variable names, PROC CONTENTS ought to be able to show you what the variable names and then you can compare across data sets.
I'm afraid you can't avoid hard work.
First, check which variables from main data seem to fit which variables from any other data.
Suppose you have common ID and varM of main data seem to fit var2 from data2 and Var3 from data3, then I think that next code can help you decide are the values the same or not:
/* assumed all data sorted by ID */
data to_check;
marge dataM (keep=ID varM)
data2 (keep=ID var2)
data3 (keep=ID var3)
;
by ID;
flag2 = (varM = var2); /* flag is 1 for true, 0 for false */
flag3 = (varM = var3);
run;
proc freq data=to_check;
table flag2 flag3 / nopercent;
run;
1) Avoid merge many to many by extending ID into several variables like -
ID_number visit_date department (or a like) - until you get unique key.
2) While proc compare is limited to compare two datasets in a time, you can compare
all datasets with all fitting couples of variables using your own MERGE code,
thus creating as many as needed flags within one run
3) Proc compare creates a long output while proc freq of the flags will show exact amount
of equal and non equal values per couple
4) Use KEEP to keep only those variables needed to be checked and
use RENAME to enable comparing and avoiding value overriding in case of
same variable names in two or more datasets.
Are these data sets supposed to have the data in the same column order, which might happen when read from an external source with the same order?
Sounds like you need solve this problem bit by bit instead of trying to attack all in one fell swoop.
Start by explaining the issue more clearly. Use concrete examples with just a few variables and a few observations. You can make up the data from scratch or select a few representative examples from your real data.
Some questions to help figure out what you issue really is.
Do you not know how creatively they changed the names of the variables and are looking for ways to check which variable names might refer to the same type of information? For example does AGE in DATA1 contain the same type of information as AGE_AT_ONSET in DATA2?
Use PROC CONTENTS (or query DICTIONARY.COLUMNS or SASHELP.VCOLUMN) to get a list of all of the variables and check for similar variable names and types.
Do you actually know the variable names in the different datasets and are looking to compare the values? For example are you asking how to compare values of AGE in DATA1 with values of AGE_AT_ONSET in DATA2? Do you know how to match the observations between the datasets?
Use PROC COMPARE.
proc compare data=data1 compare=data2;
id id visit ;
var age;
with age_at_onset;
run;
@Emma8 wrote:
I have more than 2 datasets. How can do proc compare?
You have to write each proc compare step, because - as you have said earlier - the variable-names are different. Creating lookup-dataset would open the possibility to generate the code, but i would waste time for that only, if i had to do such things regularly.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.