Help using Base SAS procedures

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Reply
New Contributor
Posts: 4

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

I need to run the same program over many datasets which are similar but do not always use exactly the same variable names to collect the same data.

e.g. Age might be collected in all datasets but has a range of variable names: AGE, AGE1, AGE_NUM, CALCAG, AGNUM

Is there a program I can run to assist me in identifying variables collecting the same data and applying a generic variable name?

Interestingly, the label names are relatively consistent so maybe I can use these...?

Thanks,

Stew

PS. Clearly, I am a SAS novice!

PROC Star
Posts: 7,363

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

One possibility would depend on the names of the other variables in your various datasets.  In your example, all of the age-like variables contained the string "AG".  If that is the case, and none of your other variables contain the string "AG" then you could write a small proc sql call that created a rename statement in a macro variable from the dictionary.columns view.

Let us know if the above scenario might be a solution and, if it is, then I or someone can show you how you could write such a routine.

New Contributor
Posts: 4

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Thanks,

I think I can definitely use something like this to rename some/most of the variables (as in the AGE example) and some guidance on the code would be very useful!

However, there are other variable names where there may not be a unique string. I guess for these, I'll just have to manually identify and rename the variables on a case-by-case basis...

Cheers,

Stew

PROC Star
Posts: 7,363

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Stew,

You could use something like:

data one;

  input name $ theage;

  cards;

John 25

;

data two;

  input name $ myage;

  cards;

Joe 25

;

 

data three;

  input name $ myage;

  cards;

Mary 32

;

%macro rename(filename);

  %let renames=;

  proc sql noprint;

    select 'rename=('||trim(name)||'=age)'

      into :renames

        from dictionary.columns

          where libname='WORK' and

            memname=upcase("&filename.") and

              index(upcase(name),'AG')

    ;

  quit;

 

  data new_&filename.;

    set &filename. (&renames.);

  run;

%mend rename;

%rename(one)

%rename(two)

%rename(three)

New Contributor
Posts: 4

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Thanks! I will implement this today and see how many variables I can rename.

Super User
Posts: 10,526

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Are the similar values in the same order in the datasets that you have examined? If so, there may be some hope of using one base dataset and renaming others based on order.

New Contributor
Posts: 4

Program to assist in identifying the same data collected in different datasets with inconsistent variable names

Hi there. Unfortunately, the variable order is not always the same.

Ask a Question
Discussion stats
  • 6 replies
  • 142 views
  • 0 likes
  • 3 in conversation