BookmarkSubscribeRSS Feed
9 REPLIES 9
PaigeMiller
Diamond | Level 26

You talk about "values" but you seem to be implying it is the variable names that is of concern. Could you please clarify?

 

If it is variable names, PROC CONTENTS ought to be able to show you what the variable names and then you can compare across data sets.

--
Paige Miller
Emma8
Quartz | Level 8
 
Shmuel
Garnet | Level 18

I'm afraid you can't avoid hard work.

First, check which variables from main data seem to fit which variables from any other data.

Suppose you have common ID and varM of main data seem to fit var2 from data2 and Var3 from data3, then I think that next code can help you decide are the values the same or not:

/* assumed all data sorted by ID */
data to_check;
   marge dataM (keep=ID varM)
              data2  (keep=ID var2)
              data3  (keep=ID var3)
     ;
   by ID;
        flag2 = (varM = var2);  /* flag is 1 for true, 0 for false */
        flag3 = (varM = var3);
run;
proc freq data=to_check;
      table flag2 flag3 / nopercent;
run;

 

Emma8
Quartz | Level 8
 
Shmuel
Garnet | Level 18

1) Avoid merge many to many by extending ID into several variables like -

    ID_number  visit_date department (or a like) - until you get unique key.

2) While proc compare is limited to compare two datasets in a time, you can compare

    all datasets with all fitting couples of variables using your own MERGE code,

     thus creating as many as needed flags within one run

3) Proc compare creates a long output while proc freq of the flags will show exact amount

    of equal and non equal values per couple

4) Use KEEP to keep only those variables needed to be checked and 

    use RENAME to enable comparing and avoiding value overriding in case of 

    same variable names in two or more datasets.

ballardw
Super User

Are these data sets supposed to have the data in the same column order, which might happen when read from an external source with the same order?

Tom
Super User Tom
Super User

Sounds like you need solve this problem bit by bit instead of trying to attack all in one fell swoop.

Start by explaining the issue more clearly.  Use concrete examples with just a few variables and a few observations.  You can make up the data from scratch or select a few representative examples from your real data. 

 

Some questions to help figure out what you issue really is.

Do you not know how creatively they changed the names of the variables and are looking for ways to check which variable names might refer to the same type of information?  For example does AGE in DATA1 contain the same type of information as AGE_AT_ONSET in DATA2?

Use PROC CONTENTS (or query DICTIONARY.COLUMNS or SASHELP.VCOLUMN) to get a list of all of the variables and check for similar variable names and types.

 

Do you actually know the variable names in the different datasets and are looking to compare the values? For example are you asking how to compare values of AGE in DATA1 with values of AGE_AT_ONSET in DATA2?  Do you know how to match the observations between the datasets?
Use PROC COMPARE.

proc compare data=data1 compare=data2;
  id id visit ;
  var age;
  with age_at_onset;
run; 

 

Emma8
Quartz | Level 8
 
andreas_lds
Jade | Level 19

@Emma8 wrote:
I have more than 2 datasets. How can do proc compare?

You have to write each proc compare step, because - as you have said earlier - the variable-names are different. Creating lookup-dataset would open the possibility to generate the code, but i would waste time for that only, if i had to do such things regularly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1072 views
  • 3 likes
  • 6 in conversation