Solved: Variable has been defined as both character and numeric

Sudeep_Neupane · Posted 12-10-2021 03:23 PM

Hi, I have around 50 datasets that I need to merge. SAS throws me an error saying that one of the variables(not sure if SAS will show more once this has been fixed) has been defined as both character and numeric. I saw similar questions in the forum; however, I am not sure how PUT and INPUT can be used for a large number of datasets. I need to convert that specific variable from Numeric to Character. I figured using proc compare to know which dataset needs fixing is complex for 50 datasets. I am looking to see if there is any efficient way to do it.

Thank you!

Tom · Posted 12-10-2021 05:11 PM

Use GROUP BY to check whether the TYPE is constant. For example by testing if the MIN(TYPE) is the same as the MAX(TYPE).

proc sql;
create table names as
   select upcase(name) as uname
        , count(*) as n
        , min(type) as min_type
        , max(type) as max_type
        , min(length) as min_length
        , max(length) as max_length
        , catx('.',libname,nliteral(memname)) as dsname length=60
        , nliteral(name) as name length=51
        , type
        , length
        , format
   from dictionary.columns
   where libname = 'WORK'
   group by uname
   order by uname, dsname 
;
quit;

proc print data=names;
  where min_type ne max_type;
run;

Let's try it out.

data good;
  length a 8 b $10 ;
run;
data bad;
  length a b 8;
run;

Results

                                              min_      max_
Obs    uname    n    min_type    max_type    length    length     dsname      name    type    length    format

  3      B      2      char        num          8        10      WORK.BAD      b      num        8
  4      B      2      char        num          8        10      WORK.GOOD     b      char      10

View solution in original post

Reeza · Posted 12-10-2021 03:26 PM

How did you create these 50 data sets? I'm guessing you read them in via PROC IMPORT somehow?

In general, you likely need to back up a step and ensure your data was imported correctly with the right types and formats.

Sudeep_Neupane · Posted 12-10-2021 04:14 PM

I had .stc files that I used proc cimport within the macro to convert them into SAS data files. I am not sure if we can fix the data type of one specific variable during that process.

ballardw · Posted 12-10-2021 03:39 PM

Here is a way to look at a bunch of data sets and variables for type issues.

First, this assumes all of the sets are in the same library, if they may be in multiple libraries use IN ('THISLIB' 'THATLIB' 'OTHERLIB') in the where clause to look at different libraries and include the library in the tabulate.

Proc sql;
   create table forreport as
   select memname,upcase(name) as name, type
   from dictionary.columns
   where libname = 'WORK'
   ;
quit;

proc tabulate data=forreport;
   class memname name type;
   table name*type,
         memname*n=' '
         ;
quit;

Dictionary.columns is a special data view available from Proc SQL that contains all of the data sets's variables in all the libraries. The Libname and Memname (data set names) are stored as uppercase. Name, the name of variables, may be any case so I convert to upper case to see all of them. The Proc tabulate creates a grid with the variable name and type with a 1 indicating which it is under each data set.

This will at least get you started as to which sets to address in which manner.

Hint: you may also want to look at LENGTH of the character variables as well. Differing lengths can cause data truncation when you least expect or want it.

I agree with @Reeza the most common cause of this sort of inconsistency, especially if you think the source data should be of the same type, is use of Proc Import.

Once you have one data set working to create what you want, then ask how to use that to address multiple data sets.

Sudeep_Neupane · Posted 12-10-2021 04:28 PM

I am sorry, but I did not quite understand what '1' means here. I did as you suggested, and I have almost 20,000 observations with '1's all over the place. How do I tackle this?

ballardw · Posted 12-10-2021 04:48 PM

@Sudeep_Neupane wrote:

I am sorry, but I did not quite understand what '1' means here. I did as you suggested, and I have almost 20,000 observations with '1's all over the place. How do I tackle this?

Look for the variables where the name shows both Char and Num in the row headers. Those the variables you likely need to address. The data sets above the 1 are the sets with the variables. One, or more of those data sets would need the variables standardized to type (and length if character) . Since you started with at least two data sets and a variable you should find that variable in the left (row header) and look for the 1s and the data sets associated and should see others.

You could also run other summaries of the Forreport data set to find just those variables that have two types associated and reduce what you have to look through.

Tom · Posted 12-10-2021 05:11 PM

Use GROUP BY to check whether the TYPE is constant. For example by testing if the MIN(TYPE) is the same as the MAX(TYPE).

proc sql;
create table names as
   select upcase(name) as uname
        , count(*) as n
        , min(type) as min_type
        , max(type) as max_type
        , min(length) as min_length
        , max(length) as max_length
        , catx('.',libname,nliteral(memname)) as dsname length=60
        , nliteral(name) as name length=51
        , type
        , length
        , format
   from dictionary.columns
   where libname = 'WORK'
   group by uname
   order by uname, dsname 
;
quit;

proc print data=names;
  where min_type ne max_type;
run;

Let's try it out.

data good;
  length a 8 b $10 ;
run;
data bad;
  length a b 8;
run;

Results

                                              min_      max_
Obs    uname    n    min_type    max_type    length    length     dsname      name    type    length    format

  3      B      2      char        num          8        10      WORK.BAD      b      num        8
  4      B      2      char        num          8        10      WORK.GOOD     b      char      10

Sudeep_Neupane · Posted 01-10-2022 02:35 PM

Hi Tom, I understood the proc SQL and proc print part, but where do you reference your datasets? Like in your good and bad example, I see the results, but how is the code and data connected there?

SASKiwi · Posted 01-10-2022 02:42 PM

@Sudeep_Neupane - By pointing @Tom 's program at the SAS library you want to analyse. Just change the WORK library for the one that contains your 50 tables.

Sudeep_Neupane · Posted 01-10-2022 02:50 PM

Thank you, SASKiwi!

Tom · Posted 01-10-2022 03:09 PM

You can query a lot of information about SAS datasets, variables, and other objects using the DICTIONARY "tables".

In particular for this one the SQL code is querying the DICTIONARY.COLUMN metadata that lists all of the variables.

To test it first run that code that creates the GOOD and BAD datasets.

Then run the PROC SQL step that queries the metadata and check out the results.

Or just run the PROC SQL code and change the WORK to the name of the libref you want to check instead. Remember that values in LIBNAME and MEMNAME fields in DICTIONARY.COLUMN are always in uppercase only.

Sudeep_Neupane · Posted 01-10-2022 03:04 PM

Amazing technique, Tom! Thank you very much!

Shmuel · Posted 12-11-2021 05:49 AM

I would split your issue into few steps to deal with:

1) Choose the desired type (NUM/CHAR) of the variable named in the message.

2) Check each data set does it fit the desired type and make a list of data sets that need to change the variable type. Use any proposed method.

3) use next code to cahnge type of a variable:

data correct;
 set ds_in(rename=(var=_var));  /* adapt variable name */
       var = input(_var, best8.2);  /* char to num, adapt the format */
/* or */
       var = put(_var, 8.2);  /* num to char, adapt the format */

       drop _var;
run;

4) Check the code with a just one data set from the list. You can use
options obs=10; for the test.

5) Use next macro to loop over the data sets that need repair type:

%let list = dsn1 dsn2 dsn3 ...;   /* List of data set names to repair */
%macro repair;
    %do i=1 to %sysfunc(countw(&list));
            %let dsn = %scan(&list,&i);
             data &dsn;
               set &dsn((rename = (var=_var));
                     /* enter here the correct conversion statement */
             run;
     %end;
%mend repair;
%repair;

6) Run your merge code with the full list.

Have a test to ensure you repaired all data sets and all variables using:

set &dsn (obs=10 rename=(var=_var));

in the above macro.

7) Use options obs=max; to do it over the whole data set and drop

obs=10 from the macro statement.

Sudeep_Neupane · Posted 01-10-2022 03:06 PM

Thank you very much, @Shmuel! I did not have to do this step since I noticed that I only had to change the data types of a variable for three datasets. So I just did it manually. I am saving this for future reference though! Thanks!

Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Re: Variable has been defined as both character and numeric

Classroom Training Available!