Hello Friends,
I want to know whether we can change the length of a variable before we do union on them in SAS data step.
EX:
Dataset1
Name ID
---------- ---
NB 1
RN 1
DATASET2
Name ID
---------- ---
NB 2
RN 3
BOTH 4
The output am getting is,
Name ID
---------- ---
NB 1
RN 1
NB 2
RN 3
BO 4
The output I desire is
Name ID
---------- ---
NB 1
RN 1
NB 2
RN 3
BOTH 4
My Code:
DATA THREE;
LENGTH NAME $ 4;
SET ONE TWO;
RUN;
Can any one let me know how this can be done through data steps.
Thanks
Check both the length and the format attached to the variables in your source data sets. So if the first dataset has $2. format attached then the new dataset will also. And most methods of examining the content of the new dataset will then truncate the value when it is displayed, even when the full value is there.
To remove any formats you can use the FORMAT statement that lists variables without any format.
DATA THREE;
LENGTH NAME $ 4;
SET ONE TWO;
FORMAT NAME ;
RUN;
SAS has a NASTY habit of permanently attaching $xx. formats to character variables that can lead to this type of confusion. For example when pulling from external databases. PROC IMPORT suffers with the same problem.
Is the length of 'name' in 'one' is 2 and in 'two' is 4? if that is the case, then your code looks fine, so I don't think we are looking at the whole picture here. Make sure they are all left aligned, then we can work from there.
Haikuo
Yeah, in DATASET1 length is 2 and in datasets2 it is 4. But in result set i am getting the length of 2.
Hi, I agree with Reeza. There is something very strange going on with your data or you did not post your -actual- code. You did not show the code for creating datasets 1 and 2. But when I run this code below and use explicit lengths, as you describe, I get a length of 4 for NAME in the final dataset. So, there is something else going on. PROC CONTENTS doesn't make up lengths...see the screen shot for the final report on the contents of WORK.DATASET3 and the PROC PRINT on the file.
Cynthia
** make one with explicit length of 2 for name;
data one;
length name $2;
infile datalines;
input name $ ID;
return;
datalines;
NB 1
RN 1
;
run;
ods listing;
proc contents data=One;
title 'Contents One';
run;
proc print data=one;
title 'One';
run;
** make two with explicit lenth of 4 for name;
data two;
length name $4;
infile datalines;
input name $ ID;
return;
datalines;
NB 2
RN 3
BOTH 4
;
run;
ods listing;
proc contents data=Two;
title 'Contents Two';
run;
proc print data=two;
title 'Two';
run;
** use same code to create dataset3;
data dataset3;
length name $4;
set one two;
run;
title; footnote;
ods listing;
proc contents data=dataset3;
title 'Contents dataset3';
run;
proc print data=dataset3;
title 'dataset3';
run;
title;
Maybe the LENGTH is changed but the FORMAT is still $2.
SAS takes the definition of the colunn from the first reference to it so it used the definition frim Table1.
To ensure it takes the widest is a bit more complex. However I like the simplicity of predefining the lengths.
Data;
length name $10 id 8 ;
set table1 table2 ;
Run.;
Check both the length and the format attached to the variables in your source data sets. So if the first dataset has $2. format attached then the new dataset will also. And most methods of examining the content of the new dataset will then truncate the value when it is displayed, even when the full value is there.
To remove any formats you can use the FORMAT statement that lists variables without any format.
DATA THREE;
LENGTH NAME $ 4;
SET ONE TWO;
FORMAT NAME ;
RUN;
SAS has a NASTY habit of permanently attaching $xx. formats to character variables that can lead to this type of confusion. For example when pulling from external databases. PROC IMPORT suffers with the same problem.
Thank you. I was struggling with this too. I had the length, but my informat was still too short... this worked! Thanks!
It would be easy to settle in SQL.
data have1; input name : $2. x $ id; cards; NB sds 1 RN sdft 1 ; run; data have2; input name $ x : $14. id; cards; NB sdsfsddfs 2 RN sdsdfsdfsder 3 BOTH sdsds 4 ; run; proc sql; create table temp as select memname,name,max(length) as max_len from dictionary.columns where libname='WORK' and memname like 'HAVE%' and type='char' group by name order by memname,name; quit; data _null_; set temp end=last; by memname; if _n_ eq 1 then call execute('proc sql;'); if first.memname then call execute('alter table '||memname||' modify '); call execute(name||' char('||put(max_len,8.)||')'); if not last.memname then call execute(','); else call execute(';'); if last then call execute('quit;'); run; data want; set have1 have2 ; by id; run;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.