BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
supmilk
Obsidian | Level 7

Hi guys,

 

How can I avoid the truncation of the data in the data steps(merge the datasets) of the loop?

 

The warnings show:

 

WARNING: Multiple lengths were specified for the variable SecName by input data set(s). This can cause truncation of data.

 

libname sd "D:\data\sasdata";

%macro indexes();

proc import out=sd.index datafile="D:\data\indexcsv\3index_all.csv" dbms=csv replace;
run;

%do k=1 %to 30;

proc import out=index datafile="D:\data\indexcsv\3index_all.part&k..csv" dbms=csv replace;
run;

data sd.index;
set sd.index index;
run;

%end;

%mend;

%indexes();

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It is entirely possible that the answer is "none of the above".  You need to know something about your data.  You might find, for example, that SECName has longer values in the last file that you need to read.  You can't just modify the generated DATA step code from one import, and assume that it applies equally well to all the other files.

 

If you do know the longest possible value for SECName among all your files, you can fix the program easily enough.  If $30 is the longest length needed, change a piece at the end of your code by adding a LENGTH statement:

 

data sd.index;

length SECName $ 30;

set sd.index index;

run;

View solution in original post

6 REPLIES 6
error_prone
Barite | Level 11

Replace proc import with a data-step, so that all variables have the same definition. Another benefit: you can import all files having the same structure with one data-step.

You will find many comments in the community explaining why it is not a good idea to use in production-code.

ballardw
Super User

If you must use Proc import with delimited files (the data step is a much better idea as @error_prone suggests) at least add the guessingrows=max option.

The Proc Import code by default only examines about 20 lines of data before guessing as to a variables type, length and for date, time and datetime values the informat.

 

Using guessingrows will look at more rows and likely reduce the frequency of this problem.

 

Better is to import one file and then copy the data step code generated by Proc Import from the log, paste into the editor (remove line numbers) and verify the informat type and lengths. It may also be a good idea to examine the created variable names and apply labels.

Then use the same code to read each file changing the infile as needed.

Astounding
PROC Star

It is entirely possible that the answer is "none of the above".  You need to know something about your data.  You might find, for example, that SECName has longer values in the last file that you need to read.  You can't just modify the generated DATA step code from one import, and assume that it applies equally well to all the other files.

 

If you do know the longest possible value for SECName among all your files, you can fix the program easily enough.  If $30 is the longest length needed, change a piece at the end of your code by adding a LENGTH statement:

 

data sd.index;

length SECName $ 30;

set sd.index index;

run;

Patrick
Opal | Level 21

@supmilk

Others already made the point that you need to know your data and that using Proc Import doesn't give you full control over the outcome.

Guessingrow=max will at least ensure that you read the data correctly from a single source.

 

Below two coding options for "synching" column lengths when combining data from multiple sources. That's of course only the "2nd best" option and reading the data with a data step where you specify the column attributes upfront should always be your first choice.

data haveA;
  length var $1; 
  var='A';
run;
data haveB;
  length var $2; 
  XX=5;
  VAR='BB';
run;


/* option 1: Target table will only contain variables common to all source tables */
/*           Column length will be max length found in any of the source table    */
proc sql;
  create table want1 as
  select * from haveA
  union all corr
  select * from haveB
  ;
quit;


/* option 2: Pre-process to determine max length of common character variables           */
/*           Then generate a length statement to ensure max length of variables get used */
proc sql;
  select 
    case 
      when type='char' then strip(name)||' $'||put(ml,f6. -l)
      when type='num'  then strip(name)||' '||put(ml,f6. -l)
      else ' '
      end
    into :var_lengths separated by ' '
  from
    (
      select propcase(name) as name, type, max(length) as ml, max(varnum) as mvn
      from dictionary.columns
      where libname='WORK' and memname in ('HAVEA','HAVEB')
      group by name, type
    )
  order by mvn
  ;
quit;

data want2;
  length &var_lengths;
  set haveA haveB;
run;

 

supmilk
Obsidian | Level 7
Thank you very much for your help! And it does work!
MasoomAkram
Calcite | Level 5

This is by far the best approach.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 34298 views
  • 6 likes
  • 6 in conversation