#SASBeginnerAlert #SASBeginnerAlert
Hi, I'm trying to use the length statement just like this:
data unifieddataset;
length variable1 $100;
set original_dataset1 original_dataset2;
run;
Where the two original datasets both have variable1. original_dataset1 has values for variable1 that are 20 chars long, and original_dataset2 has values for variable1 that are 50 chars long. When I run this, the length of variable1 in unifieddataset comes to 20 chars long, and truncates the longer values from original_dataset2, which, in my beginner's ignorance, bewilders me as to why the length statement is ignored.
Documentation I've seen online pretty much shows what I've written above:
Step-by-Step Programming with Base SAS(R) Software
Surely the error I'm making is fundamentally simple, and I'd really appreciate someone telling me what it is I'm doing wrong.
Joe
Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable. You can remove them by using a "naked" format statement, a format statement that lists variables, but no format. Informats are not much of an issue as they are really not used unless you are reading in text files.
data unifieddataset;
length variable1 $100;
set original_dataset1 original_dataset2;
format variable1 ;
run;
Hi Joe,
I believe, there should not be any problem with the code what you have written. it is right. The length statement should work. It should give the variable a length of $100.
i just tried a sample code
data dsn1;
variable1="cha";
run;
data dsn2;
variable1="character";
run;
data want;
length variable1 $10;
set dsn1 dsn2;
run;
It example is similar to what you have queried and for me it worked without any problem. The variable1 length in the want dataset is $10.
Thanks,
Jagadish
Thanks for your help Jagadish, however I've just discovered that my two original datasets have formats and informats that are inherited by unifieddataset. So whilst I set the length to $100 on unifieddataset, the informat and format remain as they were in original_dataset1. To fix this, I also set the format and informat on unifieddataset:
data unifieddataset;
length variable1 $100;
informat variable1 $100.;
format variable1 $100.;
set original_dataset1 original_dataset2;
run;
Thanks for your help!
Joe
Hi Joe,
Not sure how far the statements
informat variable1 $100.;
format variable1 $100.;
helped you, because what i believe is if there are character data then length statement should be enough to solve the problem. Could you please let me know the data that is present in variable1 for better understanding.
Thanks,
Jagadish
Sure:
proc print data=original_dataset1;
run;
proc print data=original_dataset2;
run;
proc print data=unifieddataset;
run;
Results:
Obs dummy_data1 dummy_data2
1 aosidfjaiosdosidjf Tuesday
Obs dummy_data1 dummy_data2
1 aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas Monday
Obs dummy_data1 dummy_data2
1 aosidfjaiosdosidjf Tuesday
2 aosidfjaiosdjfaios Monday
The Length, Informat and Format for dummy_data1 in original_dataset1 is 175, $175. and $175. For original_dataset2, they're 18, $18. and $18. Using the original code I pasted, for unifieddataset I get 100, $18. and $18 and the above values.
(FYI, the string you see in the above results is "WARNING: Data too long for column "dummy_data1"; truncated to 92 characters to fit."; the actual string 175 chars long.)
When I set the Informat and Format to, say, 200, then it all fits:
proc print data=unifieddataset;
run;
Obs dummy_data1 dummy_data2
1 aosidfjaiosdosidjf Tuesday
2 aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas Monday
Much better off just removing the format than assigning some random value like $200. That will just cause more confusion later on.
Thank you Joe
Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable. You can remove them by using a "naked" format statement, a format statement that lists variables, but no format. Informats are not much of an issue as they are really not used unless you are reading in text files.
data unifieddataset;
length variable1 $100;
set original_dataset1 original_dataset2;
format variable1 ;
run;
Oh brilliant! Indeed, I am using PROC IMPORTs to import my data (which, I promise you, is more than just a random arrangement of the eight home-keys on my keyboard that you see above).
Thanks for clarifying what is happening here! Everything I had been reading online about using the Length statement to help with the consolidation of the same variable with varying lengths from different originating datasets wasn't working.
Joe
I would recommend adding a step to remove the formats from your character variable after using PROC IMPORT. For example to remove the formats from all character variables in a dataset you could use PROC DATASETS.
proc datasets lib=work nolist;
modify x;
format _character_ ;
informat _character_ ;
run;
Indeed -- before your answer I was thinking that having to set it to something enormous (>$200) was an ugly solution, but it got me past several hours of frustration so I wasn't complaining.
Also, thanks for pointing out PROC DATASETS for me -- I'll look in to it.
Thanks to you both!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.