Hi, I have around 50 datasets that I need to merge. SAS throws me an error saying that one of the variables(not sure if SAS will show more once this has been fixed) has been defined as both character and numeric. I saw similar questions in the forum; however, I am not sure how PUT and INPUT can be used for a large number of datasets. I need to convert that specific variable from Numeric to Character. I figured using proc compare to know which dataset needs fixing is complex for 50 datasets. I am looking to see if there is any efficient way to do it.
Thank you!
Use GROUP BY to check whether the TYPE is constant. For example by testing if the MIN(TYPE) is the same as the MAX(TYPE).
proc sql;
create table names as
select upcase(name) as uname
, count(*) as n
, min(type) as min_type
, max(type) as max_type
, min(length) as min_length
, max(length) as max_length
, catx('.',libname,nliteral(memname)) as dsname length=60
, nliteral(name) as name length=51
, type
, length
, format
from dictionary.columns
where libname = 'WORK'
group by uname
order by uname, dsname
;
quit;
proc print data=names;
where min_type ne max_type;
run;
Let's try it out.
data good;
length a 8 b $10 ;
run;
data bad;
length a b 8;
run;
Results
min_ max_ Obs uname n min_type max_type length length dsname name type length format 3 B 2 char num 8 10 WORK.BAD b num 8 4 B 2 char num 8 10 WORK.GOOD b char 10
I had .stc files that I used proc cimport within the macro to convert them into SAS data files. I am not sure if we can fix the data type of one specific variable during that process.
Here is a way to look at a bunch of data sets and variables for type issues.
First, this assumes all of the sets are in the same library, if they may be in multiple libraries use IN ('THISLIB' 'THATLIB' 'OTHERLIB') in the where clause to look at different libraries and include the library in the tabulate.
Proc sql; create table forreport as select memname,upcase(name) as name, type from dictionary.columns where libname = 'WORK' ; quit; proc tabulate data=forreport; class memname name type; table name*type, memname*n=' ' ; quit;
Dictionary.columns is a special data view available from Proc SQL that contains all of the data sets's variables in all the libraries. The Libname and Memname (data set names) are stored as uppercase. Name, the name of variables, may be any case so I convert to upper case to see all of them. The Proc tabulate creates a grid with the variable name and type with a 1 indicating which it is under each data set.
This will at least get you started as to which sets to address in which manner.
Hint: you may also want to look at LENGTH of the character variables as well. Differing lengths can cause data truncation when you least expect or want it.
I agree with @Reeza the most common cause of this sort of inconsistency, especially if you think the source data should be of the same type, is use of Proc Import.
Once you have one data set working to create what you want, then ask how to use that to address multiple data sets.
I am sorry, but I did not quite understand what '1' means here. I did as you suggested, and I have almost 20,000 observations with '1's all over the place. How do I tackle this?
@Sudeep_Neupane wrote:
I am sorry, but I did not quite understand what '1' means here. I did as you suggested, and I have almost 20,000 observations with '1's all over the place. How do I tackle this?
Look for the variables where the name shows both Char and Num in the row headers. Those the variables you likely need to address. The data sets above the 1 are the sets with the variables. One, or more of those data sets would need the variables standardized to type (and length if character) . Since you started with at least two data sets and a variable you should find that variable in the left (row header) and look for the 1s and the data sets associated and should see others.
You could also run other summaries of the Forreport data set to find just those variables that have two types associated and reduce what you have to look through.
Use GROUP BY to check whether the TYPE is constant. For example by testing if the MIN(TYPE) is the same as the MAX(TYPE).
proc sql;
create table names as
select upcase(name) as uname
, count(*) as n
, min(type) as min_type
, max(type) as max_type
, min(length) as min_length
, max(length) as max_length
, catx('.',libname,nliteral(memname)) as dsname length=60
, nliteral(name) as name length=51
, type
, length
, format
from dictionary.columns
where libname = 'WORK'
group by uname
order by uname, dsname
;
quit;
proc print data=names;
where min_type ne max_type;
run;
Let's try it out.
data good;
length a 8 b $10 ;
run;
data bad;
length a b 8;
run;
Results
min_ max_ Obs uname n min_type max_type length length dsname name type length format 3 B 2 char num 8 10 WORK.BAD b num 8 4 B 2 char num 8 10 WORK.GOOD b char 10
Hi Tom, I understood the proc SQL and proc print part, but where do you reference your datasets? Like in your good and bad example, I see the results, but how is the code and data connected there?
@Sudeep_Neupane - By pointing @Tom 's program at the SAS library you want to analyse. Just change the WORK library for the one that contains your 50 tables.
Thank you, SASKiwi!
You can query a lot of information about SAS datasets, variables, and other objects using the DICTIONARY "tables".
In particular for this one the SQL code is querying the DICTIONARY.COLUMN metadata that lists all of the variables.
To test it first run that code that creates the GOOD and BAD datasets.
Then run the PROC SQL step that queries the metadata and check out the results.
Or just run the PROC SQL code and change the WORK to the name of the libref you want to check instead. Remember that values in LIBNAME and MEMNAME fields in DICTIONARY.COLUMN are always in uppercase only.
Amazing technique, Tom! Thank you very much!
I would split your issue into few steps to deal with:
1) Choose the desired type (NUM/CHAR) of the variable named in the message.
2) Check each data set does it fit the desired type and make a list of data sets that need to change the variable type. Use any proposed method.
3) use next code to cahnge type of a variable:
data correct;
set ds_in(rename=(var=_var)); /* adapt variable name */
var = input(_var, best8.2); /* char to num, adapt the format */
/* or */
var = put(_var, 8.2); /* num to char, adapt the format */
drop _var;
run;
4) Check the code with a just one data set from the list. You can use
options obs=10; for the test.
5) Use next macro to loop over the data sets that need repair type:
%let list = dsn1 dsn2 dsn3 ...; /* List of data set names to repair */
%macro repair;
%do i=1 to %sysfunc(countw(&list));
%let dsn = %scan(&list,&i);
data &dsn;
set &dsn((rename = (var=_var));
/* enter here the correct conversion statement */
run;
%end;
%mend repair;
%repair;
6) Run your merge code with the full list.
Have a test to ensure you repaired all data sets and all variables using:
set &dsn (obs=10 rename=(var=_var));
in the above macro.
7) Use options obs=max; to do it over the whole data set and drop
obs=10 from the macro statement.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.