Hi guys,
How can I avoid the truncation of the data in the data steps(merge the datasets) of the loop?
The warnings show:
WARNING: Multiple lengths were specified for the variable SecName by input data set(s). This can cause truncation of data.
libname sd "D:\data\sasdata";
%macro indexes();
proc import out=sd.index datafile="D:\data\indexcsv\3index_all.csv" dbms=csv replace;
run;
%do k=1 %to 30;
proc import out=index datafile="D:\data\indexcsv\3index_all.part&k..csv" dbms=csv replace;
run;
data sd.index;
set sd.index index;
run;
%end;
%mend;
%indexes();
It is entirely possible that the answer is "none of the above". You need to know something about your data. You might find, for example, that SECName has longer values in the last file that you need to read. You can't just modify the generated DATA step code from one import, and assume that it applies equally well to all the other files.
If you do know the longest possible value for SECName among all your files, you can fix the program easily enough. If $30 is the longest length needed, change a piece at the end of your code by adding a LENGTH statement:
data sd.index;
length SECName $ 30;
set sd.index index;
run;
Replace proc import with a data-step, so that all variables have the same definition. Another benefit: you can import all files having the same structure with one data-step.
You will find many comments in the community explaining why it is not a good idea to use in production-code.
If you must use Proc import with delimited files (the data step is a much better idea as @error_prone suggests) at least add the guessingrows=max option.
The Proc Import code by default only examines about 20 lines of data before guessing as to a variables type, length and for date, time and datetime values the informat.
Using guessingrows will look at more rows and likely reduce the frequency of this problem.
Better is to import one file and then copy the data step code generated by Proc Import from the log, paste into the editor (remove line numbers) and verify the informat type and lengths. It may also be a good idea to examine the created variable names and apply labels.
Then use the same code to read each file changing the infile as needed.
It is entirely possible that the answer is "none of the above". You need to know something about your data. You might find, for example, that SECName has longer values in the last file that you need to read. You can't just modify the generated DATA step code from one import, and assume that it applies equally well to all the other files.
If you do know the longest possible value for SECName among all your files, you can fix the program easily enough. If $30 is the longest length needed, change a piece at the end of your code by adding a LENGTH statement:
data sd.index;
length SECName $ 30;
set sd.index index;
run;
Others already made the point that you need to know your data and that using Proc Import doesn't give you full control over the outcome.
Guessingrow=max will at least ensure that you read the data correctly from a single source.
Below two coding options for "synching" column lengths when combining data from multiple sources. That's of course only the "2nd best" option and reading the data with a data step where you specify the column attributes upfront should always be your first choice.
data haveA;
length var $1;
var='A';
run;
data haveB;
length var $2;
XX=5;
VAR='BB';
run;
/* option 1: Target table will only contain variables common to all source tables */
/* Column length will be max length found in any of the source table */
proc sql;
create table want1 as
select * from haveA
union all corr
select * from haveB
;
quit;
/* option 2: Pre-process to determine max length of common character variables */
/* Then generate a length statement to ensure max length of variables get used */
proc sql;
select
case
when type='char' then strip(name)||' $'||put(ml,f6. -l)
when type='num' then strip(name)||' '||put(ml,f6. -l)
else ' '
end
into :var_lengths separated by ' '
from
(
select propcase(name) as name, type, max(length) as ml, max(varnum) as mvn
from dictionary.columns
where libname='WORK' and memname in ('HAVEA','HAVEB')
group by name, type
)
order by mvn
;
quit;
data want2;
length &var_lengths;
set haveA haveB;
run;
This is by far the best approach.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.