I want to merge multiple datasets with multiple variables but the length of the character variables are different in each dataset. I created an example that works but I want to know how I can improve it to make it easier to apply when the number of datasets is 5-10 with 20 variables to check for length.
Thanks for your thoughts
data d1;
length var1 $10. var2 $10.;
input var1 var2;
cards;
apple carrots
orange cabbage
;
data d2;
length var1 $30. var2 $30.;
input var1 var2;
datalines;
apple-pie-recipe carrot-soup
orange-juice-for-breakfast cabbage-soup-for-supper
;
data d1_i (drop= var1_old var2_old);
set d1(rename= (var1=var1_old var2=var2_old));
length var1 $30.;
var1=var1_old;
length var2 $30.;
var2=var2_old;
run;
data want;
set d1_i d2;
run;
The best way is using PROC SQL.
data d1; length var1 $10. var2 $10.; input var1 var2; cards; apple carrots orange cabbage ; data d2; length var1 $30. var2 $30.; input var1 var2; datalines; apple-pie-recipe carrot-soup orange-juice-for-breakfast cabbage-soup-for-supper ; proc sql; create table want as select * from d1 union all corr select * from d2; quit;
Find the maximum lengths and just add a length statement before the SET.
So something like:
data want;
length var1 $30 var2 $30 ;
set d1 d2;
run;
You can generate that by querying the metadata for the source tables.
proc sql noprint;
select distinct catx(' ',name,cats('$',length))
into :lengths separated by ' '
from
(select upcase(name) as unique_name,min(name) as name,max(length) as length
from dictionary.columns
where libname='WORK'
and memname in ('D1' 'D2')
and type='char'
group by unique_name
)
;
%let nchar=&sqlobs;
quit;
And then you can add that LENGTH statement to your step that combines the dataset.
You might also want to remove any formats that are attached to prevent things like a variable with length 200 using a format of only $30. which will cause the values to appear truncated even when they are not.
data want;
%if &nchar %then %do;
length &lengths;
format _character_ ;
%end;
set d1 d2;
run;
PS You don't need to include a period when specifying the length of a variable. Variables can only have integer lengths.
It would probably be better to create the original dataset with consistent lengths for the variables.
What is the source of those dataset and why does the same variable end up having a different length in different datasets?
PROC IMPORT of an SPSS files should replicate the structure in the SPSS file pretty well.
But PROC IMPORT of an EXCEL file cannot since there is no concept of a variable in EXCEL. Every cell can be totally independent from each other cell.
If you are getting files from EXCEL you will have more control if they files are delivered as delimited text files. Then you can write your own data step in SAS to read file so that you have full control over the type, length, name and other attributes of the variables.
Be careful with your wording. In SAS parlance, merge means putting data side-by-side. What you want is concatenating, stacking or appending datasets.
The best way is using PROC SQL.
data d1; length var1 $10. var2 $10.; input var1 var2; cards; apple carrots orange cabbage ; data d2; length var1 $30. var2 $30.; input var1 var2; datalines; apple-pie-recipe carrot-soup orange-juice-for-breakfast cabbage-soup-for-supper ; proc sql; create table want as select * from d1 union all corr select * from d2; quit;
Woe, thank you @Ksharp ! I am glad I asked. It is much easier than changing the length one by one!
Am I right that
union all corr
does the job of increasing the length to the longer one automatically? Thanks!
@Ksharp May I please ask one more question. How do I modify the code to stack 5 datasets? I tried adding lines like
select * from d3;
select * from d4;
select * from d5;
but that did not work. Any hint to avoid adding datasets altogether instead of one at a time? Thanks!
@Ksharp I hope this is the last question. How can I keep the variables in each dataset that do not match (other than var1 and var2) but I want to keep in the stacked dataset? Thanks!!!
data d1;
length var1 $10. var2 $10. var3 8;
input var1 var2 var3;
cards;
apple carrots 99
orange cabbage 88
;
data d2;
length var1 $30. var2 $30. var4 8 var5 8;
input var1 var2 var4 var5;
datalines;
apple-pie-recipe carrot-soup 100 111
orange-juice-for-breakfast cabbage-soup-for-supper 200 222
;
proc sql;
create table want as
select * from d1
union all corr
select * from d2;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.