DATA Step, Macro, Functions and more

How to systematically unify variable attributes across multiple dataset

Reply
Contributor
Posts: 36

How to systematically unify variable attributes across multiple dataset

I've come across this task that requires me to concatenate patient information from 30 different sites/30 different tables, each table has 20 columns, I've rearranged them so that all the column orders and names are the same across 30 dataset.

 

However, each table was submitted by different sites, so there were lots of messy formats going on. i.e., patient_Ids are charter values in some sites' data (they were unique identifier, but can be anything, like UDWGER15581DE2) but are numeric values in others (1,2,3....). Same case for other variables, like drug_ID. For the same reason, this also means the length of the values, if there were character, were different, some Patient_ID are $15, some are $49.....

 

To sum up, I was wondering if there is any easy way to systematically unify them? Instead of changing 20*30=600 variables by hand?

 

Any suggestion will be greatly appreciated.

Thank you

Super User
Posts: 13,283

Re: How to systematically unify variable attributes across multiple dataset

Posted in reply to LisaYIN9309

If by this:  I've rearranged them so that all the column orders and names are the same across 30 dataset. You mean that you have arranged some SAS datasets with the variables in a common order I would be very tempted to:

 

1) Proc Export all of the sets to CSV files, may want to use the option Putnames=no so it is easier in the next step

2) combine all the csv files into a single one

3) Proc import the large csv file with a large value for guessing rows.

4) either modify the data step generated by proc import to use your desired variable names and formats or use proc datasets to do so.

 

 

Super User
Posts: 23,224

Re: How to systematically unify variable attributes across multiple dataset

Posted in reply to LisaYIN9309

You'd never do it by hand, I like @ballardw solution the best, but if you want to keep it in SAS you should set up a master data set that has the formats you want and then use that to check variables one by one and convert. You'd design it for one data set and then automate but it's a pain. The writing it to a CSV file and then importing is a good idea. 

 

 

Esteemed Advisor
Posts: 5,474

Re: How to systematically unify variable attributes across multiple dataset

Posted in reply to LisaYIN9309

If the problem is that some columns sometimes come up as numeric but you would like them to be character, you can convert the problematic columns with something like:

 

length tmp1 $nn;

tmp1 = cats(myProblematicColumn);

drop myProblematicColumn;

rename tmp1=myProblematicColumn;

 

This doesn't affect the column if it is already character.

PG
PROC Star
Posts: 249

Re: How to systematically unify variable attributes across multiple dataset

Posted in reply to LisaYIN9309

@LisaYIN9309:

If you can define an output format where all variables are character, this method may work: use the VVALUEX function.

Given a couple of tables with different variable types, but same variable names:

data a;
  length x 8 y $30 z $14;
  input x y z;
cards;
1 2 3
5 fdsfgh 2
3 c 31254
;run;

data b;
  length x $3 y 8 z $4;
  input x y z;
cards;
1rr 2 3
25 2 gg
33 31254 f
;run;

You then define a model table where all variables are character, and long enough to hold all values, e.g.:

data want;
  length x $8 y $30 z $20;
  stop;
run;

You can then submit the same datastep for each input to add your data to the model dataset - you may want to put that in a macro:

%macro add_table(table);
data want;
  if 0 then modify want;
  array out _character_;
  set &table(rename=(x=_1 y=_2 z=_3));
  do _N_=1 to dim(out);
    out(_N_)=left(vvaluex(cats('_',_N_)));
    end;
  output;
run;
%mend;

Just make sure that the variables in the RENAME option do not collide with the real variable names.

Now you need just one line of code for each input table:

%add_table(a);
%add_table(b);
Ask a Question
Discussion stats
  • 4 replies
  • 117 views
  • 0 likes
  • 5 in conversation