On combining different datasets into one I get the following message in the log:
WARNING: Multiple lengths were specified for the variable x by input data set(s). This
can cause truncation of data.
x is a character variable and I don´t know how to change the length or in what dataset to change the length prior to combining them.
Can someone help?
Here is an example:
data have1; input name : $2. x $ id; cards; NB sds 1 RN sdft 1 ; run; data have2; input name $ x : $14. id; cards; NB sdsfsddfs 2 RN sdsdfsdfsder 3 BOTH sdsds 4 ; run; /*First,we get the max length of variables in every tables */ proc sql; create table temp as select memname,name,max(length) as max_len from dictionary.columns where libname='WORK' and memname like 'HAVE%' and type='char' group by name order by memname,name; quit; /* then we use sql to change the variable's length fastly and easily. we need to generate the sql code like: proc sql; alter table have1 modify name char(8),x char(14); alter table have2 modify name char(8),x char(14); ......... quit; */ data _null_; set temp end=last; by memname; if _n_ eq 1 then call execute('proc sql;'); if first.memname then call execute('alter table '||memname||' modify '); call execute(name||' char('||put(max_len,8.)||')'); if not last.memname then call execute(','); else call execute(';'); if last then call execute('quit;'); run; /*finally, get what we want */ data want; set have1 have2 ; run;
Xia Keshan
Or you could just do:
data have1;
input name : $2. x $ id;
cards;
NB sds 1
RN sdft 1
;
run;
data have2;
input name $ x : $14. id;
cards;
NB sdsfsddfs 2
RN sdsdfsdfsder 3
BOTH sdsds 4
;
run;
proc sql;
create table WANT as
select *
from HAVE1
union all
select *
from HAVE2;
quit;
Ho, Not realize SQL will take the max length of variables smartly. Good to know.
P.S : One thing need to say, UNION can only handle 255 tables .
Message was edited by: xia keshan
Yep, it does in most (if not all cases). I suppose from a theory side though you should really know your input and output data well enough to not be guessing. For instance using SDTM structure most columns have a predefined length so you would know regardless of what length is coming in the output should be xx. If I was writing the code properly I would specify the length on the first select so all others adhere to that:
proc sql;
create table WANT as
select NAME length=20 label="abc",
X length=20 label="X",
ID length=8 label="ID"
from HAVE1
union all
select *
from HAVE2;
quit;
An easy way if you actually incur truncation is to set the length of the offending variable(s) before the set statement:
Data want;
length x $ 25 ; /* a number of characters at least as large as the longest of the contributing data sets*/
set datasetone datasettwo;
/*other code*/
run;
If the longer version is in the first data set of the Set statement then truncation won't occur but you still get the different lengths warning.
But that will change the order of variables . and it is not convenient if there are lots of character variables.
@ksharp, Can you give the documentation for the internal structure of a sas-dataset? As far I know this is propierty and only by reverse engineering you see some things.
The physical order of variables in a dataset is not well defined. In a RDBMS its is DBA job to do something about that. The SQL concept is designed being not aware of those physical aspects. So how would you predict physical aspects using SQL?
For SAS-datatsets I have seen many releases. At one time (long ago) being surprised by those internal changes. Internal boundary settings being aligned to a fullword (64-bit) position for the numeric part and the characters seperated by that as a long concatenation. With encoding I mind (utf-8 utf-16) that can give also some nice aspects to behavior.
Bottom line is when changing the lay out of just one variable you have to process to whole dataset/table (schema/database).
@Jaap
Sorry. I can't offer my help. The topic you talk about is too profound and broad for my knowledge .The best way is you should talk to someone worked in SAS company .
Best.
Xia Keshan
@Xia I replied to you as you did the comment on the order of variables.
As you did that, your only reason could that physical structure. As that is not the case ballardw approach would be correct that is all.
The order of the variables is not just of concern to the underlying storage structure. In fact I do not care if the underlying physical structure moves the location of the variables around to have them match word boundaries or other concerns. But I DO care that the logical order of the variables in the table matches what I want. So that I can use PROC PRINT; RUN; or SELECT * FROM and the variable will appear in a logical order.
Good point Tom, The logical order as we often are seeing is also not very well defined.
For SQL that is part of the ANSI definition. Your are expected to define that when you are needing it.
The same is for the order of observations. When you need that to be ordered you are expected to define that.
Do not rely on behavior you accidently are experiencing (it is always RTFM)
O, that is what I have seen many times happening, the trial and error method of developing instead the one of understanding the tools.
For variables, sometimes we are realy needing them to be in a order as with array-processing in the SAS datastep.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.