When working with multiple datasets, there may be a need to compare attributes of common variables. This is generally important to confirm attributes are consistent across common variables in multiple datasets. For example, a difference in length can cause data truncation when merging datasets or worse a difference in type can cause programming errors. Smarter programmers are proactive to identify all common variables and then confirm all attributes, label, length, type and format are consistent.
In this tip, I show how easy it is to compare common variable attributes using dictionary tables. In the SAS example below, I create two temporary datasets, CLASS and CLASS2. CLASS2 dataset is exactly as CLASS dataset except that I converted AGE from numeric to character and also renamed AGE to AGE_NUM. The objective is to programmatically identify AGE being numeric in one dataset and character in another dataset. This informs us to take action to make AGE type consistent.
* create working class dataset; data class; set sashelp.class; run; * create class2 with age char; data class2 (rename = (age2 = age)); length age2 $2.; set sashelp.class; age2 = left(put(age, best.)); rename age = age_num; run;
Proc SQL data dictionary table, SASHELP.VCOLUMN, is used with WORK libname to review CLASS and CLASS2 variable attributes. Upcase(NAME) is used since NAME can be case sensitive. Name variable is the key variable for CLASS. GROUP BY is important to GROUP BY key variable and HAVING COUNT(NAME) > 1 identifies multiple records for the same variable name or the common variables.
proc sql; create table comvar as select upcase(name) as name2, memname, type, length, label, format from sashelp.vcolumn where libname='WORK' group by upcase(name) having count(name) > 1; quit;
Next, the dataset COMVAR is sorted by variable name, dataset and type. This is important to sort order the records for comparison.
proc sort data=comvar; by name2 memname type; run;
The next data step identifies differences in variable type. It does this by retaining two variables NAMEHLD and TYPEHLD. When the retained variable name is the same but the retained type is not the same, then we keep that record. These are the common variables with different types.
data comtype; retain namehld typehld; set comvar; by name2 memname type; * for each name retain name and type; if first.name2 then do; namehld = name2; typehld = type; end; * output if name is same and type is different; if name2 = namehld and type ^= typehld; run;
Next, we merge this dataset back to display all variables for these different type records. We do this with Proc SQL right join. This gives us the big picture from both datasets so that we can then take the next step to convert variable type.
proc sql; create table comtype2 as select upcase(name) as name2, memname, type, length, label, format from sashelp.vcolumn as a right join ( select unique name2 as b_name from comtype) as b on upcase(a.name)=b.b_name and libname='WORK' group by upcase(a.name); quit; proc print data=comtype2; run;
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.