BookmarkSubscribeRSS Feed
_maldini_
Barite | Level 11

I have 5 datasets in Excel files (5 different time points w/ the same variables, for the most part). Because of Excel’s formats, I have painstakingly converted variables in both directions - character to numeric and numeric to character - using arrays in numerous SAS programs throughout this project. It has been a huge effort.

 

In the final phase I am trying to stack the datasets using the code below. I received 83 errors stating that “Variable XXX has been defined as both character and numeric.”

 

I’m assuming that this means that in different datasets the variable has different data types. For example, VAR1 is character in one dataset and numeric in another. Most of the solutions in this forum describe converting the variables that are causing the errors.

Do I need to check each dataset to see the data type of each variable and then convert them all to the desired type? Should I do this in each dataset or is there a way I can do this globally?

 

Thanks for the help.

 

DATA a1_a5_final;
		SET 
		a1_final 
		a2_final 
		a3_final 
		a4_final 
		a5_final;
	RUN; 
9 REPLIES 9
PGStats
Opal | Level 21

If the variables causing the problems are not required, you could use keep= or drop= lists on each dataset to exclude the problem variables. You can also rename=() some variables so that both versions (character and numeric) of the variables are not named the same.

PG
amadhavi41
Fluorite | Level 6
May be those variable's datatype is different across all the datasets.
Convert those variable's datatypes as same as variables in a1_final dataset in set statement.
Kurt_Bremser
Super User

This is one of the side-effects of "working" (actually: wasting time) with Excel.

The sooner you get rid of Excel as a "data" source, the better.

 

Since your source is completely unreliable, and it is very hard (up to impossible) to set up globally correct rules for automatic type conversions, you will be stuck with eyeballing through the datasets and manually correcting your input data steps.

 

Saving Excel data in a sensible transfer format (.CSV!!!) will be the first step.

PGStats
Opal | Level 21

Well, the Real World might involve more categories that character and numeric. Real World people appreciate the flexibility of spreadsheets to communicate values such as "<0.01" or "0.01?" or "See Note 2" to fellow real world people.


If you need data from real world people, you can either try to pay them to fit your requirements or spend some time to adapt your data importing procedure.

PG
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst I agree somewhat with what your saying here, there are many systems and process developed over the years to deal with data transfer.  These include databases and file formats.  In this day and age where there are free versions of these, well documented and implemented in many different environments, the question does arise, why is something like Excel still being so widely used?  Sure it is flexible on the user end, but that really is where the benefits end.  As an unstructured document format, its is not - without some work - compatible with data processing.  The file format is proprietary, although they have made some steps towards and XML implementation, it is still a zip which can contain binary components, and is not a flat text file, hence not as portable or simple to understand or use further.  Whilst its unlikely we will ever see tha back of Excel, its always good to point out its deficiencies, and to suggest alternatives, otherwise we end up with a situation where there is nothing but Excel, and I have seen people trying to use it as a code editor, a database etc. which is madness, I have even seen a game engine built in it - which does attest to its flexibility and also peoples inate ability to try to break things.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

As @Kurt_Bremser has stated, save yourself a headache and ditch Excel.  It is not a good data format.  Save to CSV, then write a datastep import program where you can specify lengths/formats etc. and if they are all the same structure of data you can read them all in one step:

data want;
  infile "<your_path>\*.csv" dlm=",";
  length a $20 b 8;
  input a $ b;
run;
Oligolas
Barite | Level 11

I profundly agree with what has been said about using Excel as a database, but would like to answer the question without work around. So assuming you rely on both Excel database and SAS import, you could automatically unify the variable names and types to ensure correct setting.

 

proc datasets lib=work kill nolist;run;quit;
** Create test data;
data A1_FINAL;
   a=1;
   b='a1        ';
   c=.;
   output;
run;
data A2_FINAL;
   a=1;
   b='a2    ';
   c=.;
   output;
run;
data A3_FINAL;
   a=1;
   b='a3 ';
   c=.;
   output;
run;
data A4_FINAL;
   a=1;
   b='a4    ';
   c=.;
   output;
run;
data A5_FINAL;
   a='a5';
   b=5;
   c=.;
   output;
run;


%macro renameWorkDs(ds);
   data &ds._mod;
      set &ds.;
   run;
%mend;
%renameWorkDs(A1_FINAL);
%renameWorkDs(A2_FINAL);
%renameWorkDs(A3_FINAL);
%renameWorkDs(A4_FINAL);
%renameWorkDs(A5_FINAL);

** select tables in which same variables have different lengths or types;
proc sql;
   create table all_t1 as   
   select memname,name,type,case when type eq 'num' then length*2 else length end as length
   from sashelp.vcolumn
   where libname eq 'WORK' and memname in ('A1_FINAL', 'A2_FINAL', 'A3_FINAL', 'A4_FINAL', 'A5_FINAL')
   group by name
   having count(distinct type) gt 1 or count(distinct length) gt 1
   order by name, length desc , type
   ;
quit;

** Prepare statement to unify length and type of these variables on all tables;
data all_t2;
   length code $200;
   set all_t1;
   by name descending length type;
   retain maxlength;
   if first.name then maxlength=length;
   if type eq 'num' then do; 
      code='attrib '||strip(name)||'_tmp length=$'||strip(put(maxlength,best32.))||'; '
           ||strip(name)||'_tmp=strip(put('||strip(name)||',best32.)); '
           ||'drop '||strip(name)||';'; 
   end;
   else do;
      code='attrib '||strip(name)||'_tmp length=$'||strip(put(maxlength,best32.))||'; '
           ||strip(name)||'_tmp=put('||strip(name)||',$'||strip(put(maxlength,best32.))||'.); '
           ||'drop '||strip(name)||';'; 
   end;
run;

proc sort data=all_t2;
   by memname name;
run;

** Perform unification;
data _NULL_;
   set all_t2 end=last;
   by memname name;
   if first.memname then call execute('data '||strip(memname)||'_mod; set '||strip(memname)||'_mod; ');
   call execute(strip(code));
   if last then call execute('run;');
run;

DATA a1_a5_final;
   SET 
   a1_final_mod
   a2_final_mod
   a3_final_mod
   a4_final_mod 
   a5_final_mod;
RUN; 

** clean work, rename variables to their original names;
data _NULL_;
   set sashelp.vcolumn end=last;
   where libname eq 'WORK' and memname eq 'A1_A5_FINAL' and substr(name,length(name)-3) eq '_tmp';
   if _N_ eq 1 then call execute('proc datasets lib=work; 
                                 delete all_t1 all_t2 a1_final_mod a2_final_mod a3_final_mod a4_final_mod a5_final_mod
                                 ;
                                 modify A1_A5_FINAL;
                                 ');
   call execute(' rename '||strip(name)||'='||substr(name,1,length(name)-4)||';');
   if last then call execute('run;quit;');
run;
________________________

- Cheers -

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Erm, that only sets together for differeing lengths does it not?  That can be achieved with an SQL trick, union all will automatically take the longest length:

proc sql;
  create table WANT as
  select  A,B,C
  from    A1_FINAL
  union all
  select  A,B,C
  from    A2_FINAL
  union all
  select  A,B,C
  from    A3_FINAL;
quit;

The original question was different types, ie. sometimes A would be character, sometimes numeric.  You would have to have set rules to convert from one to the other - character would be safest as it can handle all characters, but should the column be read into character as date, number, formatted etc.  Type change is not simple.

Oligolas
Barite | Level 11

Nope, it handles different types as well

________________________

- Cheers -

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 8480 views
  • 11 likes
  • 6 in conversation