BookmarkSubscribeRSS Feed
Al_senior
Fluorite | Level 6

I know this is a basic question but any help is appreciated. Also I numbered my questions in order to makes it easier to answer. 

I have multiple datasets (lets call them dt1, dt2, dt3, ...)  that I need to concatenate. My datasets have the same columns, but when using the bellow code: 

data aggregated_dt; 
   set dt1 dt2 dt3;
run;

 I get error that many columns with the same name don't have compatible formats. (for the sake of simplicity lets say I only get error for one column and that column is col1) so col1 is numeric in dt1 and character in dt2, consequently concatenation fails.

 

As a solution I want to convert col1 from numeric to character in dt1. So here are my questions: 

  1.  Can I convert col1 to character when importing dt1, if yes, how? (right now, I import dt1 first and then convert col1 format.)

As far as I know, there is no way that I can overwrite col1 in one step so I use the following code to convert the col1:

data dt1 (drop= new_col);
   set dt1(rename=(col1=new_col));
   col1 = put(new_col, $12.);
run;

When I check the result I have a SAS table that looks fine but when I use proc contents step as follow: 

proc contents data= dt1;
run;

 I see that informat and format column are empty for col1 in variable table.

 

2. Is this going to make issue down the road?

3. I can add "format col1 $12. ;" in data step but I am not sure if this is a redundancy or not?

 

But my most important question is, I have multiple datasets and multiple columns that I need to convert to character.

 

4. Converting one column at a time doesn't seem an efficient way. Is there any better way I can do it?  

 

 

2 REPLIES 2
Reeza
Super User

@Al_senior wrote:

 

  1.  Is it better to convert col1 to numeric when I reading data1 to a SAS table or is it better to read data in a SAS table and then convert it?

It's better to convert it when reading it in - less steps to wrangle the data later on, ensures the process is correct and the same for all data sets. 

 


@Al_senior wrote:

 I see that informat and format column are empty for col1 in variable table.

 

2. Is this going to make issue down the road?

 


 No. Formats don't cause major issues when appending, I suspect your error actually references TYPES not FORMATS. See example below of mismatch of types.

If your lengths are different you may end up with truncation and a WARNING in your log, but it shouldn't error out. It may very well be wrong though.

 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 68         
 69         data class1;
 70         set sashelp.class (rename =(age=age_num));
 71         Age = put(age_num, $8. -l);
 WARNING: Variable age_num has already been defined as numeric.
 72         drop age_num;
 73         run;
 
 NOTE: There were 19 observations read from the data set SASHELP.CLASS.
 NOTE: The data set WORK.CLASS1 has 19 observations and 5 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              692.78k
       OS Memory           25508.00k
       Timestamp           07/13/2021 04:22:25 PM
       Step Count                        78  Switch Count  2
       Page Faults                       0
       Page Reclaims                     133
       Page Swaps                        0
       Voluntary Context Switches        10
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 74         
 75         
 76         proc contents data=class1;
 77         run;
 
 NOTE: PROCEDURE CONTENTS used (Total process time):
       real time           0.03 seconds
       user cpu time       0.03 seconds
       system cpu time     0.00 seconds
       memory              1644.68k
       OS Memory           26024.00k
       Timestamp           07/13/2021 04:22:25 PM
       Step Count                        79  Switch Count  0
       Page Faults                       0
       Page Reclaims                     174
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           16
       
 
 78         
 79         data class2;
 80         set sashelp.class;
 81         run;
 
 NOTE: There were 19 observations read from the data set SASHELP.CLASS.
 NOTE: The data set WORK.CLASS2 has 19 observations and 5 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              685.59k
       OS Memory           25764.00k
       Timestamp           07/13/2021 04:22:25 PM
       Step Count                        80  Switch Count  2
       Page Faults                       0
       Page Reclaims                     98
       Page Swaps                        0
       Voluntary Context Switches        15
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           264
       
 
 82         
 83         data want;
 84         set class1 class2;
 ERROR: Variable Age has been defined as both character and numeric.
 85         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0 observations and 5 variables.
 WARNING: Data set WORK.WANT was not replaced because this step was stopped.
 NOTE: DATA statement used (Total process time):
       real time           0.00 seconds
       user cpu time       0.00 seconds
       system cpu time     0.00 seconds
       memory              1021.59k
       OS Memory           26284.00k
       Timestamp           07/13/2021 04:22:25 PM
       Step Count                        81  Switch Count  0
       Page Faults                       0
       Page Reclaims                     129
       Page Swaps                        0
       Voluntary Context Switches        0
       Involuntary Context Switches      0
       Block Input Operations            0
       Block Output Operations           8
       
 
 86         
 87         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

@Al_senior wrote:

 

3. I can add "format col1 $12. ;" in data step but I am not sure if this is a redundancy or not?

 


You can add it, not redundant because you've never specified a format for the variable. You applied a format during the conversion to character but you can convert it and apply a different format for display. In this example it doesn't make a ton of sense, but its very common when working with dates. For example to convert a date:

 

data demo;
date = "09Jul2021";
date_num = input(date, date9.);
format date_num yymmn6.;
run;

proc print data=demo;
run;

@Al_senior wrote:

 

 

But my most important question is, I have multiple datasets and multiple columns that I need to convert to character.

 

4. Converting one column at a time doesn't seem an efficient way. Is there any better way I can do it?  

 


Your most important question was your first one - read the data in correctly first and then this isn't required at all. 

If your files are from a text file or DB you can actually read them all at once into a single data set without needing any of the above. 

For the purposes of actually answering the question, you can use an array to streamline the process. You will need to separate the process for numeric to char and char to numeric conversions as arrays can only deal with one type of variable at a time. 

 

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 


@Al_senior wrote:

I know this is a basic question but any help is appreciated. Also I numbered my questions in order to makes it easier to answer. 

I have multiple datasets (lets call them data1, data2, data3, ...)  that I need to concatenate. My datasets have the same columns, but when using the bellow code: 

data aggregated_dt; 
   set data1 data2 data3;
run;

 I get error that many columns with the same name don't have compatible formats. (for the sake of simplicity lets say I only get error for one column and that column is col1) so col1 is numeric in data1 and character in data2, consequently concatenation fails.

 

As a solution I want to convert col1 from numeric to character in data1. So here are my questions: 

  1.  Is it better to convert col1 to numeric when I reading data1 to a SAS table or is it better to read data in a SAS table and then convert it?

As far as I know, there is no way that I can overwrite col1 in one step so I use the following code to convert the col1:

data data1 (drop= new_col);
   set data1 (rename=(col1=new_col));
   col1 = put(new_col, $12.);
format col1 $12.;
run;

When I check the result I have a SAS table that looks fine but when I use proc contents step as follow: 

proc contents data= data1;
run;

 I see that informat and format column are empty for col1 in variable table.

 

2. Is this going to make issue down the road?

3. I can add "format col1 $12. ;" in data step but I am not sure if this is a redundancy or not?

 

But my most important question is, I have multiple datasets and multiple columns that I need to convert to character.

 

4. Converting one column at a time doesn't seem an efficient way. Is there any better way I can do it?  

 

 


 

ballardw
Super User

One suspects that since you want to combine these and have the same names that you should re-read the data to get the same types as a better approach. Especially if you have more data that will come later that will also need to be added.

Once you have that working it is very easy to change input file names and the output data set and each set will have identical properties.

 

That means use a data step instead of Proc Import or a wizard that "imports" data, which actually calls Import in the background. Proc Import will make assumptions based on the content, and very little of that content actually, as to the variable types, lengths and properties. If you have a document that describes what the data should be like: lengths, type of data and layout (dates, times and datetime values this is pretty important) then writing a data step is routine but tedious.

 

If the data is from a spreadsheet you may be much better off to covert the file to Comma Separated Values (CSV) using the file save as features.

 

The formats or informats will not have any affect on the concatenation of data sets. However the lengths of variables will. If one data set has variable X with length 25 and the other has X length 50 there is a potential to have 25 characters truncated from the second set.

 

It is very dangerous to use the recoding approach you show. When the input and output data set are the same any logic error could mean that data is lost.

Use something more like so that you always have your base data available.

data dt1_new (drop= new_col);
   set dt1(rename=(col1=new_col));
   col1 = put(new_col, $12.);
run;

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 616 views
  • 2 likes
  • 3 in conversation