BookmarkSubscribeRSS Feed
gusfryk01
Calcite | Level 5

    On combining different datasets into one I get the following message in the log:

WARNING: Multiple lengths were specified for the variable x by input data set(s). This

         can cause truncation of data.

x is a character variable and I don´t know how to change the length or in what dataset to change the length prior to combining them.

Can someone help?

11 REPLIES 11
Ksharp
Super User

Here is an example:

data have1;
input name : $2. x  $ id;
cards;
NB sds 1
RN sdft 1
;
run;
data have2;
input name $ x : $14. id;
cards;
NB sdsfsddfs 2
RN sdsdfsdfsder 3
BOTH sdsds 4
;
run;
/*First,we get the max length of variables in every tables */
proc sql;
create table temp as
 select memname,name,max(length) as max_len
  from dictionary.columns
   where libname='WORK' and memname like 'HAVE%' and type='char' 
    group by name 
     order by memname,name;
quit;
/* then we use sql to change the variable's length fastly and easily.
we need to generate the sql code like:
proc sql;
alter table have1
 modify name char(8),x char(14);
alter table have2
 modify name char(8),x char(14);
.........
quit;
*/
data _null_;
 set temp end=last;
 by memname;
 if _n_ eq 1 then call execute('proc sql;');
 if first.memname then call execute('alter table '||memname||' modify ');
 call execute(name||' char('||put(max_len,8.)||')');
 if not last.memname then call execute(',');
   else  call execute(';');
 if last then call execute('quit;');
run;
/*finally, get what we want */
data want;
 set have1 have2 ;
 run;


Xia Keshan

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Or you could just do:

data have1;

input name : $2. x  $ id;

cards;

NB sds 1

RN sdft 1

;

run;

data have2;

input name $ x : $14. id;

cards;

NB sdsfsddfs 2

RN sdsdfsdfsder 3

BOTH sdsds 4

;

run;

proc sql;

  create table WANT as

  select  *

  from    HAVE1

  union all

  select  *

  from    HAVE2;

quit;

Ksharp
Super User

Ho, Not realize SQL will take the max length of variables smartly. Good to know.

P.S : One thing need to say, UNION can only handle 255 tables .

Message was edited by: xia keshan

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yep, it does in most (if not all cases).  I suppose from a theory side though you should really know your input and output data well enough to not be guessing.  For instance using SDTM structure most columns have a predefined length so you would know regardless of what length is coming in the output should be xx.  If I was writing the code properly I would specify the length on the first select so all others adhere to that:

proc sql;

  create table WANT as

  select  NAME length=20 label="abc",

             X length=20 label="X",

              ID length=8 label="ID"

  from    HAVE1

  union all

  select  *

  from    HAVE2;

quit;

ballardw
Super User

An easy way if you actually incur truncation is to set the length of the offending variable(s) before the set statement:

Data want;

     length x $ 25 ; /* a number of characters at least as large as the longest of the contributing data sets*/

     set datasetone datasettwo;

/*other code*/

run;

If the longer version is in the first data set of the Set statement then truncation won't occur but you still get the different lengths warning.

Ksharp
Super User

ballardw

But that will change the order of variables . and it is not convenient if there are lots of character variables.

jakarman
Barite | Level 11

@ksharp, Can you give the documentation for the internal structure of a sas-dataset? As far I know this is propierty and only by reverse engineering you see some things.

The physical order of variables in a dataset is not well defined. In a RDBMS its is DBA job to do something about that. The SQL concept is designed being not aware of those physical aspects. So how would you predict physical aspects using SQL?

For SAS-datatsets I have seen many releases. At one time (long ago) being surprised by those internal changes. Internal boundary settings being aligned to a fullword (64-bit) position for the numeric part and the characters seperated by that as a long concatenation. With encoding I mind (utf-8 utf-16) that can give also some nice aspects to behavior.

Bottom line is when changing the lay out of just one variable you have to process to whole dataset/table (schema/database).

---->-- ja karman --<-----
Ksharp
Super User

@Jaap

Sorry. I can't offer my help. The topic you talk about is too profound and broad for my knowledge .The best way is you should talk to someone worked in SAS company .

Best.

Xia Keshan

jakarman
Barite | Level 11

@Xia I replied to you as you did the comment on the order of variables. 

As you did that, your only reason could that physical structure. As that is not the case ballardw approach would be correct that is all.

---->-- ja karman --<-----
Tom
Super User Tom
Super User

The order of the variables is not just of concern to the underlying storage structure.  In fact I do not care if the underlying physical structure moves the location of the variables around to have them match word boundaries or other concerns.  But I DO care that the logical order of the variables in the table matches what I want.  So that I can use PROC PRINT; RUN; or SELECT * FROM and the variable will appear in a logical order.

jakarman
Barite | Level 11

Good point Tom, The logical order as we often are seeing is also not very well defined.

For SQL that is part of the ANSI definition. Your are expected to define that when you are needing it.

The same is for the order of observations. When you need that to be ordered you are expected to define that.

Do not rely on behavior you accidently are experiencing (it is always RTFM)

O, that is what I have seen many times happening, the trial and error method of developing instead the one of understanding the tools. 

For variables, sometimes we are realy needing them to be in a order as with array-processing in the SAS datastep.

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2333 views
  • 4 likes
  • 6 in conversation