Hi all, it's my first time posting here.
I have 8 datasets each with 800+ variables that I am trying to stack. I am getting "ERROR: Variable X has been defined as both character and numeric."
I know that this error comes from some of the datasets defining this variable as character and some as numeric.
However, with such a large number of variables (the error occurs on at least 400 of them) it's impractical to switch each variable using put/input.
I am wondering if anyone has a solution that fixes a large number of these errors relatively quickly.
Thank you!!
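For context, the PUT/INPUT switch referred to above looks like this for a single variable (a sketch; `have` and `x` are hypothetical names), which is why repeating it for 400+ variables is impractical by hand:

```sas
/* Convert one character variable to numeric via rename + INPUT. */
data fixed;
    set have(rename=(x=x_char));   /* x arrives as character */
    x = input(x_char, best32.);    /* recreate x as numeric */
    drop x_char;
run;
```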
Proc Import will create data types (numeric or character) and lengths based on what's in your external file. Proc Import first analyses your data to determine the appropriate attributes. Make sure you use GUESSINGROWS=MAX so Proc Import analyses all the data.
But even if doing so: If in your first .csv the maximum string length found for a column is 5 but in your next .csv it's 10 then you'll still have a truncation issue when combining the tables created by Proc Import.
Assuming all your .csv have the same structure (same column order) you could also first combine all the external data and then read it with a single Proc Import.
That's what the code below does.
/* create sample .csv */
%let path_to_csv=%sysfunc(pathname(work));
data _null_;
file "&path_to_csv/have1.csv";
put 'col1,col2,col3,col4,col5';
put 'a,bbb,5, 01jan2020,1234';
file "&path_to_csv/have2.csv";
put 'col1,col2,col3,col4,col5';
put 'xx,9,yy,01feb2020,5678';
stop;
run;
/* combine sample .csv into a single temporary file
and store this temporary file in SAS Work
*/
filename alldata temp;
data _null_;
file alldata lrecl=4000;
infile datalines truncover;
input fil2read $300.;
fil2read=cats("&path_to_csv/",fil2read);
if _n_= 1 then
infile dummy1 filevar=fil2read lrecl=4000 end=done;
else
infile dummy2 filevar=fil2read end=done lrecl=4000 firstobs=2;
do while(not done);
input;
put _infile_;
end;
datalines;
have1.csv
have2.csv
;
/* read temporary file with all the data using Proc Import */
proc import
out=want
datafile=alldata
dbms=csv
replace
;
guessingrows=max;
delimiter=',';
getnames=yes;
run;
/* delete the temporary file in Work */
filename alldata clear;
/* print result */
proc print data=want;
run;
N.B.: Make sure you set the value for LRECL to a value that's sufficient for your longest line of source data.
Thanks for your reply! I read them all using proc import as csv. Usually when I have similar issues with other data, importing it as csv resolves it.
You shouldn't use proc import for your CSVs; use a data step instead. If you have one file that reads with the correct types and lengths, use that code to read the rest of the files. You can also try adding the GUESSINGROWS=MAX option to the PROC IMPORT. It will slow things down a lot but is more likely to be accurate.
You get the code from the PROC IMPORT log, find the version that works best, adjust as needed, and use that code to read them all in a single data step at once. If all your text files are in a single folder you can use a wildcard; if they are not, you can use the FILEVAR= option to feed the file names in. If you check the library on here, there's an example of how to read all text files in a single data step.
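A minimal sketch of the FILEVAR= approach described above (the paths, fileref, and column attributes are hypothetical; replace the INPUT statement with the cleaned-up code from your PROC IMPORT log):

```sas
/* Read several same-layout CSVs in one data step using FILEVAR=. */
data want;
    infile datalines truncover;
    input fname $300.;                   /* path of the next CSV to read */
    /* switch to that file; FIRSTOBS=2 skips each file's header row */
    infile csvin filevar=fname firstobs=2 dsd truncover end=done;
    do while(not done);
        /* hypothetical layout - substitute your own INPUT statement */
        input col1 :$10. col2 :$20. col3 :best32.;
        output;
    end;
    drop fname;
    datalines;
/path/to/have1.csv
/path/to/have2.csv
;
```

Because every file is read by the same INPUT statement, all variables get consistent types and lengths, which avoids the character/numeric conflict when stacking.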
@Betsy wrote:
Thanks for your reply! I read them all using proc import as csv. Usually when I have similar issues with other data, importing it as csv resolves it.
Did you use a large value for the GUESSINGROWS option? Or the option at all?
Typically Proc Import only examines 20 rows to set variable types. For text files you have the option of setting a larger number with the Guessingrows option. It may run a little slower but you should see more consistent results.
OR
If all of the files have the same layout, import one with the GUESSINGROWS=MAX (or 32000) option. The data step used to read the data will be in the log. Copy the data step and paste it into the editor. Clean up the code by removing line numbers and such. Then change the INFILE file name and the output data set name to read each file the same way.
This approach avoids the otherwise near-certain warnings that variable lengths differ and data may be truncated, which you would almost certainly see with multiple imports.
I usually examine the INFORMAT statements generated by Proc Import to see if I think the lengths and types are appropriate.
Warning: if you see a variable with a $1. informat, it is likely that the variable had no values in the file, and you may need to check documentation or the other files for example data to set a more usable informat for when the data does appear. This is typical for sets with seldom recorded values such as "Income from major lottery win" or characteristics of the 4th or later infant delivered in a single birth (I would say this is poor data structure, but sometimes we don't get to pick our sources).
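For illustration, the log from PROC IMPORT contains a data step along these lines (all names and the path are hypothetical); the $1. informat on a sparsely populated column is the kind of thing to widen by hand before reusing the code on the other files:

```sas
/* Sketch of a PROC IMPORT-generated data step, edited by hand. */
data want;
    infile "/path/to/have1.csv" delimiter=',' missover dsd firstobs=2;
    informat id best32. ;
    informat name $20. ;
    informat lottery_win $40. ;   /* was $1. in the generated code */
    format id best12. ;
    format name $20. ;
    format lottery_win $40. ;
    input id name $ lottery_win $;
run;
```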
DO NOT USE PROC IMPORT!
For CSV files, use a data step; only then do you have full control over the process, including all variable attributes.
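A minimal sketch of such a data step, assuming a hypothetical five-column layout and path; the LENGTH and INFORMAT statements are where you pin down every attribute explicitly:

```sas
/* Read a CSV with a data step for full control over attributes. */
data want;
    infile "/path/to/have1.csv" dsd truncover firstobs=2;
    length col1 $10 col2 $20 col3 8 col4 8 col5 8;  /* fix types/lengths */
    informat col4 date9.;                           /* read col4 as a date */
    format   col4 date9.;
    input col1 $ col2 $ col3 col4 col5;
run;
```

Running the same data step over every file guarantees identical types and lengths, so the stacked set can never hit the "defined as both character and numeric" error.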