BookmarkSubscribeRSS Feed
JonasW
SAS Employee

Proc append is a great SAS function when you need to combine two datasets, and you do not want to write many lines of code. But what happens if some of the columns in the datasets you want to combine share the same name, but have different datatypes? This could be the case when you have one system where the column CustomerID was declared as numeric, then you need to append this data source to another data source where CustomerID is declared as character.

 

For code to run in production you would probably need to do data conversion in your program, but what if you are exploring data, or building a one-time job where speed is more important than “accurate” code. Then this macro might come in handy. It lets you append two data sources where matching is made on column name, and if any column in the data source to append have a different data type than in the target dataset, the column is converted to the datatype of target dataset.

 

I have used this macro from time to time, and just a couple of months ago I noticed that missing numeric values where not converted correct, so I added that that functionality to the macro. Please feel free to use the macro or add your own features to extend the functionality of the macro.

*******************************************************************************************************************;
* Program: append_diff_data_types                                                                                 *;
* Created 2018-11-23 by Jonas Wetterberg, SAS Institute                                                           *;
* Uses proc append to add two data sources, but first converts all columns with missmatching data types to match  *;
* the data type of in the target table.                                                                           *;
*******************************************************************************************************************;
%macro append_diff_data_types(base_table=, data_table=, base_lib=work, data_lib=work, drop_temp_cols=TRUE, proc_append_warn=nowarn);

	***************************************************************************************************************;
	* Make sure all data definitions are located within SAS before comparing data types                           *;
	***************************************************************************************************************;

	data target_data_types;
		set &base_lib..&base_table(obs=1);		
	run;

	data source_data_types;
		set &data_lib..&data_table(obs=1);
	run;

	proc sql noprint;
		create table target_cols as 
		select upcase(name) as name, type as type_target
		from dictionary.columns
		where libname = "WORK" 
		and   memname = "TARGET_DATA_TYPES"
		order by upcase(name);

		create table source_cols as 
		select upcase(name) as name, type as type_source
		from dictionary.columns
		where libname = "WORK"
		and memname   = "SOURCE_DATA_TYPES"
		order by upcase(name);
	quit;

	******************************************************************************************************************;
	* Merge datasets and create a macros with data type change statements                                            *;
	******************************************************************************************************************;
	data _null_;
		merge source_cols(in=a) end=last
			  target_cols;
		by name;
		length rename_stmt type_cast_stmt drop_stmt $3000 type_diffs $10;
		retain rename_stmt type_cast_stmt drop_stmt type_diffs;
		* Inner join one SAS way :-);
		if a;

		if _n_ = 1 then
			type_diffs = 'FALSE';
		if type_target = 'char' and type_source = 'num' then do;
			rename_stmt    = catx(' ', rename_stmt, name, ' = ', cats(name, '_Num'));	
			type_cast_stmt = catx(' ', type_cast_stmt, name, ' = strip(put(', cats(name, '_Num'), ', best12.));');

			**********************************************************************************************************;
			* Make sure numeric missing values are converted to character missing values                             *;
			**********************************************************************************************************;
			type_cast_stmt = catx(' ', type_cast_stmt, 'if', name, '="."', 'then call missing(', name, ');');
			if "&drop_temp_cols" = "TRUE" then do;
				drop_stmt = catx(' ', drop_stmt, cats(name, '_Num'));
			end;
			type_diffs = 'TRUE';		
		end;
		else if type_target = 'num' and type_source = 'char' then do;
			rename_stmt    = catx(' ', rename_stmt, name, ' = ', cats(name, '_Chr'));	
			type_cast_stmt = catx(' ', type_cast_stmt, name, ' = input(', cats(name, '_Chr'), ', nlnum12.);');
			if "&drop_temp_cols" = "TRUE" then do;
				drop_stmt = catx(' ', drop_stmt, cats(name, '_Chr'));
			end;
			type_diffs = 'TRUE';		
		end;
		call symputx('rename_stmt',    rename_stmt,    'L');
		call symputx('type_cast_stmt', type_cast_stmt, 'L');
		call symputx('drop_stmt',      drop_stmt,      'L');
		call symputx('type_diffs',     type_diffs,     'L');
	run;

	%if &type_diffs = TRUE %then %do;
		data &data_lib..&data_table (drop=&drop_stmt);
			set &data_lib..&data_table(rename=( &rename_stmt ));	
			
			*************************************************************************************************************;
			* Convert from number to character                                                                          *;
			*************************************************************************************************************;
			&type_cast_stmt 
		run;
	%end;

	*********************************************************************************************************************;
	* Append the two datasets                                                                                           *;
	*********************************************************************************************************************;
	proc append base=&base_lib..&base_table data=&data_lib..&data_table force &proc_append_warn;
	run;

%mend append_diff_data_types;

Sample usage of the macro:

 

/* Below is sample code for usage of the macro */
data my_class_1;
	set sashelp.class;
	length col2 $30.;
	new_age = strip(put(age, best12.));
	chr_age = 12;
run;

data my_class_2;
	set sashelp.class;
	length col2 $20.;
	new_age = .;
	chr_age = ' ';
	col1 = 12;
run;

%append_diff_data_types(base_table=my_class_1, data_table=my_class_2, base_lib=WORK, data_lib=WORK, drop_temp_cols=TRUE);

Enjoy!

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Discussion stats
  • 0 replies
  • 520 views
  • 3 likes
  • 1 in conversation