DATA Step, Macro, Functions and more

How to avoid truncation of the data?

Contributor
Posts: 27

How to avoid truncation of the data?

Hi guys,

 

How can I avoid truncation of the data in the DATA steps (which concatenate the datasets) inside the loop?

 

The warnings show:

 

WARNING: Multiple lengths were specified for the variable SecName by input data set(s). This can cause truncation of data.

 

libname sd "D:\data\sasdata";

%macro indexes();

proc import out=sd.index datafile="D:\data\indexcsv\3index_all.csv" dbms=csv replace;
run;

%do k=1 %to 30;

proc import out=index datafile="D:\data\indexcsv\3index_all.part&k..csv" dbms=csv replace;
run;

data sd.index;
set sd.index index;
run;

%end;

%mend;

%indexes();

 


Accepted Solutions
Solution
06-20-2018 12:54 AM
Super User
Posts: 6,785

Re: How to avoid truncation of the data?

It is entirely possible that the answer is "none of the above".  You need to know something about your data.  You might find, for example, that SECName has longer values in the last file that you need to read.  You can't just modify the generated DATA step code from one import, and assume that it applies equally well to all the other files.

 

If you do know the longest possible value for SECName among all your files, you can fix the program easily enough.  If $30 is the longest length needed, change a piece at the end of your code by adding a LENGTH statement:

 

data sd.index;
  length SECName $ 30;
  set sd.index index;
run;



All Replies
Regular Contributor
Posts: 213

Re: How to avoid truncation of the data?

[ Edited ]

Replace PROC IMPORT with a DATA step so that all variables have the same definition. Another benefit: you can import all files having the same structure with one DATA step.

You will find many comments in the community explaining why it is not a good idea to use PROC IMPORT in production code.
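A minimal sketch of that approach, under the assumption that each CSV has a header row and two columns, a character SecName and a numeric Value (the column names, lengths and informats here are placeholders you would replace with your actual layout):

```sas
libname sd "D:\data\sasdata";

data sd.index;
  /* one set of attributes for every file, so no length mismatch can occur */
  length SecName $30 Value 8;
  /* the wildcard picks up 3index_all.csv and all 3index_all.partN.csv files */
  infile "D:\data\indexcsv\3index_all*.csv" dsd truncover eov=eov;
  input @;                      /* read the line but hold it for inspection */
  if _n_ = 1 or eov then do;    /* first line of each file is the header row */
    eov = 0;
    delete;
  end;
  input SecName :$30. Value :best32.;
run;
```

The EOV= variable is set to 1 whenever the first record of a new file in the wildcard list is read, which is how the header row of every part file gets skipped; like END=, it is not written to the output dataset.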

Super User
Posts: 13,583

Re: How to avoid truncation of the data?

If you must use Proc Import with delimited files (the data step is a much better idea, as @error_prone suggests), at least add the guessingrows=max option.

By default, Proc Import only examines about 20 lines of data before guessing each variable's type and length and, for date, time and datetime values, the informat.

 

Using guessingrows will look at more rows and likely reduce the frequency of this problem.
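Applied to one of the PROC IMPORT calls from the question, the option goes inside the procedure step (GUESSINGROWS=MAX is available in current SAS releases; on older releases you would use a large number instead):

```sas
proc import out=sd.index
    datafile="D:\data\indexcsv\3index_all.csv"
    dbms=csv replace;
  guessingrows=max;  /* scan all rows, not just the default ~20, before guessing types/lengths */
run;
```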

 

Better is to import one file and then copy the data step code generated by Proc Import from the log, paste into the editor (remove line numbers) and verify the informat type and lengths. It may also be a good idea to examine the created variable names and apply labels.

Then use the same code to read each file, changing the INFILE statement as needed.
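That workflow can also be collapsed into a single DATA step with the FILEVAR= option, so the same verified INPUT code reads every part file; SecName, Value and their attributes are again placeholders standing in for whatever the generated code in your log actually shows:

```sas
data sd.index_parts;
  length SecName $30 Value 8 fname $200;
  drop fname k;
  /* loop over the 30 part files, reading each with identical attributes */
  do k = 1 to 30;
    fname = cats("D:\data\indexcsv\3index_all.part", k, ".csv");
    infile csvin filevar=fname dsd truncover end=done;
    input;                                 /* discard the header row */
    do while (not done);
      input SecName :$30. Value :best32.;
      output;
    end;
  end;
  stop;
run;
```

With FILEVAR=, the END= variable is set at the end of each current file, so the inner loop reads exactly one file per pass of the outer loop; STOP prevents the step from looping again after the last file.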


Respected Advisor
Posts: 4,742

Re: How to avoid truncation of the data?

[ Edited ]

@supmilk

Others already made the point that you need to know your data and that using Proc Import doesn't give you full control over the outcome.

Guessingrows=max will at least ensure that you read the data correctly from a single source.

 

Below are two coding options for "synching" column lengths when combining data from multiple sources. That is of course only the second-best option; reading the data with a data step where you specify the column attributes upfront should always be your first choice.

data haveA;
  length var $1; 
  var='A';
run;
data haveB;
  length var $2; 
  XX=5;
  VAR='BB';
run;


/* option 1: Target table will only contain variables common to all source tables  */
/*           Column length will be the max length found in any of the source tables */
proc sql;
  create table want1 as
  select * from haveA
  union all corr
  select * from haveB
  ;
quit;


/* option 2: Pre-process to determine max length of common character variables           */
/*           Then generate a length statement to ensure max length of variables get used */
proc sql;
  select 
    case 
      when type='char' then strip(name)||' $'||put(ml,f6. -l)
      when type='num'  then strip(name)||' '||put(ml,f6. -l)
      else ' '
      end
    into :var_lengths separated by ' '
  from
    (
      select propcase(name) as name, type, max(length) as ml, max(varnum) as mvn
      from dictionary.columns
      where libname='WORK' and memname in ('HAVEA','HAVEB')
      group by name, type
    )
  order by mvn
  ;
quit;

data want2;
  length &var_lengths;
  set haveA haveB;
run;

 

Contributor
Posts: 27

Re: How to avoid truncation of the data?

Thank you very much for your help! And it does work!