DATA Step, Macro, Functions and more

Variable has been defined as both character and numeric.

Reply
Regular Contributor
Posts: 199

Variable has been defined as both character and numeric.

I have 5 datasets in Excel files (5 different time points w/ the same variables, for the most part). Because of Excel’s formats, I have painstakingly converted variables in both directions - character to numeric and numeric to character - using arrays in numerous SAS programs throughout this project. It has been a huge effort.

 

In the final phase I am trying to stack the datasets using the code below. I received 83 errors stating that “Variable XXX has been defined as both character and numeric.”

 

I’m assuming that this means that in different datasets the variable has different data types. For example, VAR1 is character in one dataset and numeric in another. Most of the solutions in this forum describe converting the variables that are causing the errors.

Do I need to check each dataset to see the data type of each variable and then convert them all to the desired type? Should I do this in each dataset or is there a way I can do this globally?

 

Thanks for the help.

 

DATA a1_a5_final;
		SET 
		a1_final 
		a2_final 
		a3_final 
		a4_final 
		a5_final;
	RUN; 
Respected Advisor
Posts: 4,919

Re: Variable has been defined as both character and numeric.

If the variables causing the problems are not required, you could use keep= or drop= lists on each dataset to exclude the problem variables. You can also rename=() some variables so that both versions (character and numeric) of the variables are not named the same.

PG
Occasional Contributor
Posts: 12

Re: Variable has been defined as both character and numeric.

May be those variable's datatype is different across all the datasets.
Convert those variable's datatypes as same as variables in a1_final dataset in set statement.
Super User
Posts: 7,758

Re: Variable has been defined as both character and numeric.

This is one of the side-effects of "working" (actually: wasting time) with Excel.

The sooner you get rid of Excel as a "data" source, the better.

 

Since your source is completely unreliable, and it is very hard (up to impossible) to set up globally correct rules for automatic type conversions, you will be stuck with eyeballing through the datasets and manually correcting your input data steps.

 

Saving Excel data in a sensible transfer format (.CSV!!!) will be the first step.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 4,919

Re: Variable has been defined as both character and numeric.

Posted in reply to KurtBremser

Well, the Real World might involve more categories that character and numeric. Real World people appreciate the flexibility of spreadsheets to communicate values such as "<0.01" or "0.01?" or "See Note 2" to fellow real world people.


If you need data from real world people, you can either try to pay them to fit your requirements or spend some time to adapt your data importing procedure.

PG
Super User
Super User
Posts: 7,942

Re: Variable has been defined as both character and numeric.

Whilst I agree somewhat with what your saying here, there are many systems and process developed over the years to deal with data transfer.  These include databases and file formats.  In this day and age where there are free versions of these, well documented and implemented in many different environments, the question does arise, why is something like Excel still being so widely used?  Sure it is flexible on the user end, but that really is where the benefits end.  As an unstructured document format, its is not - without some work - compatible with data processing.  The file format is proprietary, although they have made some steps towards and XML implementation, it is still a zip which can contain binary components, and is not a flat text file, hence not as portable or simple to understand or use further.  Whilst its unlikely we will ever see tha back of Excel, its always good to point out its deficiencies, and to suggest alternatives, otherwise we end up with a situation where there is nothing but Excel, and I have seen people trying to use it as a code editor, a database etc. which is madness, I have even seen a game engine built in it - which does attest to its flexibility and also peoples inate ability to try to break things.

Super User
Super User
Posts: 7,942

Re: Variable has been defined as both character and numeric.

As @KurtBremser has stated, save yourself a headache and ditch Excel.  It is not a good data format.  Save to CSV, then write a datastep import program where you can specify lengths/formats etc. and if they are all the same structure of data you can read them all in one step:

data want;
  infile "<your_path>\*.csv" dlm=",";
  length a $20 b 8;
  input a $ b;
run;
Frequent Contributor
Posts: 129

Re: Variable has been defined as both character and numeric.

[ Edited ]

I profundly agree with what has been said about using Excel as a database, but would like to answer the question without work around. So assuming you rely on both Excel database and SAS import, you could automatically unify the variable names and types to ensure correct setting.

 

proc datasets lib=work kill nolist;run;quit;
** Create test data;
data A1_FINAL;
   a=1;
   b='a1        ';
   c=.;
   output;
run;
data A2_FINAL;
   a=1;
   b='a2    ';
   c=.;
   output;
run;
data A3_FINAL;
   a=1;
   b='a3 ';
   c=.;
   output;
run;
data A4_FINAL;
   a=1;
   b='a4    ';
   c=.;
   output;
run;
data A5_FINAL;
   a='a5';
   b=5;
   c=.;
   output;
run;


%macro renameWorkDs(ds);
   data &ds._mod;
      set &ds.;
   run;
%mend;
%renameWorkDs(A1_FINAL);
%renameWorkDs(A2_FINAL);
%renameWorkDs(A3_FINAL);
%renameWorkDs(A4_FINAL);
%renameWorkDs(A5_FINAL);

** select tables in which same variables have different lengths or types;
proc sql;
   create table all_t1 as   
   select memname,name,type,case when type eq 'num' then length*2 else length end as length
   from sashelp.vcolumn
   where libname eq 'WORK' and memname in ('A1_FINAL', 'A2_FINAL', 'A3_FINAL', 'A4_FINAL', 'A5_FINAL')
   group by name
   having count(distinct type) gt 1 or count(distinct length) gt 1
   order by name, length desc , type
   ;
quit;

** Prepare statement to unify length and type of these variables on all tables;
data all_t2;
   length code $200;
   set all_t1;
   by name descending length type;
   retain maxlength;
   if first.name then maxlength=length;
   if type eq 'num' then do; 
      code='attrib '||strip(name)||'_tmp length=$'||strip(put(maxlength,best32.))||'; '
           ||strip(name)||'_tmp=strip(put('||strip(name)||',best32.)); '
           ||'drop '||strip(name)||';'; 
   end;
   else do;
      code='attrib '||strip(name)||'_tmp length=$'||strip(put(maxlength,best32.))||'; '
           ||strip(name)||'_tmp=put('||strip(name)||',$'||strip(put(maxlength,best32.))||'.); '
           ||'drop '||strip(name)||';'; 
   end;
run;

proc sort data=all_t2;
   by memname name;
run;

** Perform unification;
data _NULL_;
   set all_t2 end=last;
   by memname name;
   if first.memname then call execute('data '||strip(memname)||'_mod; set '||strip(memname)||'_mod; ');
   call execute(strip(code));
   if last then call execute('run;');
run;

DATA a1_a5_final;
   SET 
   a1_final_mod
   a2_final_mod
   a3_final_mod
   a4_final_mod 
   a5_final_mod;
RUN; 

** clean work, rename variables to their original names;
data _NULL_;
   set sashelp.vcolumn end=last;
   where libname eq 'WORK' and memname eq 'A1_A5_FINAL' and substr(name,length(name)-3) eq '_tmp';
   if _N_ eq 1 then call execute('proc datasets lib=work; 
                                 delete all_t1 all_t2 a1_final_mod a2_final_mod a3_final_mod a4_final_mod a5_final_mod
                                 ;
                                 modify A1_A5_FINAL;
                                 ');
   call execute(' rename '||strip(name)||'='||substr(name,1,length(name)-4)||';');
   if last then call execute('run;quit;');
run;
________________________

- Cheers -

Super User
Super User
Posts: 7,942

Re: Variable has been defined as both character and numeric.

Erm, that only sets together for differeing lengths does it not?  That can be achieved with an SQL trick, union all will automatically take the longest length:

proc sql;
  create table WANT as
  select  A,B,C
  from    A1_FINAL
  union all
  select  A,B,C
  from    A2_FINAL
  union all
  select  A,B,C
  from    A3_FINAL;
quit;

The original question was different types, ie. sometimes A would be character, sometimes numeric.  You would have to have set rules to convert from one to the other - character would be safest as it can handle all characters, but should the column be read into character as date, number, formatted etc.  Type change is not simple.

Frequent Contributor
Posts: 129

Re: Variable has been defined as both character and numeric.

Nope, it handles different types as well

________________________

- Cheers -

Ask a Question
Discussion stats
  • 9 replies
  • 3240 views
  • 11 likes
  • 6 in conversation