Hi Colleagues,
I have concatenated the attached two data sets.
Step I - Sorting 2 data sets
proc sort data=a.book1 out=a.book1;
by bank_number account_number current_date ;
run;
proc sort data=a.book2 out=a.book2;
by bank_number account_number current_date ;
run;
Step 2
/* Carrying out "union of two datasets vertically by Account group", Ksharp is acknowledged.*/
data dataset;
set a.book1
a.book2 (in=inb rename=(account_name=short_name Net_Write_off_total_USD=balance));
by bank_number account_number current_date;
if inb then Arrears_Band='writoff ';
run;
It seems the concatenation happens correctly but the following message appears in the log.
WARNING: Multiple lengths were specified for the variable Short_name by input data set(s). This may cause truncation of data.
i) Could you please let me know why?
ii) And please help me to get rid of.
Thanks
Mirisage
you can find length of a variable from proc contents.
example:
proc contents data=sashelp.class;run;
output:
Alphabetic List of Variables and Attributes
# Variable Type Len
3 Age Num 8
4 Height Num 8
1 Name Char 8
2 Sex Char 1
5 Weight Num 8
solution:
you need to find out the length of variable "short_name" in book1 and the length of account_name in book2.
if length of short_name=8 and length of account_name=12, then add
"length short_name $ 12;" to your code:
data dataset;
length short_name $ 12;
set a.book1
a.book2 (in=inb rename=(account_name=short_name Net_Write_off_total_USD=balance));
by bank_number account_number current_date;
if inb then Arrears_Band='writoff ';
run;
Hi Linlin,
Many thanks.
Actually, I get 3 warnings like below when I have run my original data sets (but I have posted only two obs taking from my data set).
WARNING: Multiple lengths were specified for the BY variable Account_number by input data sets. This may cause unexpected results.
WARNING: Multiple lengths were specified for the variable Short_name by input data set(s). This may cause truncation of data.
WARNING: Multiple lengths were specified for the variable Country by input data set(s). This may cause truncation of data.
Coming back to the two data sets I have attahced, the values of variable "short_name" in book1 are :
AACKMAA EEEEEE W
When I manualy count the number of characters in this, there are 16 including the spaces between name pieces.
Question:
So, 16 is the length of value “AACKMAA EEEEEE W”, isn't it?
Let us say there is a name like below (without quotes) in my original larger data set, we never know!.
“Joseph Peter Dennis Fernando Tores Arnold”
In this case, the length is 41 including the spaces, isn't it?.
But we cannot test this by trail and error (sorry I am a novice).
Would there be any easy method to find out the length of the longest value of a character variable in a given data set?
Also Would there be any easy method to find out the length of the longest value of a numeric variable in a given data set?
Thank you
Mirisage
you can find length of a variable from proc contents.
example:
proc contents data=sashelp.class;run;
output:
Alphabetic List of Variables and Attributes
# Variable Type Len
3 Age Num 8
4 Height Num 8
1 Name Char 8
2 Sex Char 1
5 Weight Num 8
If you have the data set open, you can also check the length of a character variable by mousing over the column header. After a little while a tooltip will pop up with variable name, length, and other properties.
Hi Linilin,
Many thanks for the proc contents clue.
Hi Tom,
This is an elegant way of doing it.
Hi SASKiwi,
Your example code to find out the “maximum length of the contents of a variable” is what I have been looking for sometime. And maximum length of a numeric variable explanation is nice – Thanks
Hi Astounding, many thanks for your examples.
Hi GeoffreyBrent – Your short cut – mousing- works well. Thank you very much for this.
In addition, I explored a bit on :
“options varlenchk=nowarn;”
This option works only for “non By” variables
/*WARNING: Multiple lengths were specified for the BY variable Account_number by
input data sets. This may cause unexpected results.
Above is the wording of warning for By variables
WARNING: Multiple lengths were specified for the variable Short_name by input
data set(s). This may cause truncation of data.
Above is the wording of warning for "non BY" variables
WARNING: Multiple lengths were specified for the variable Country by input
data set(s). This may cause truncation of data.*/
Above is the wording of warning for "non BY" variables
Best regards to all,
Mirisage
You are trying to put the value of ACCOUNT_NAME from BOOK2 dataset into the variable SHORT_NAME from the BOOK1 dataset, but they are defined with different lengths. Easiest fix it make BOOK1 and BOOK2 consistent so that they use the same variable names and attributes.
Since they have different names it should be easy to fix in data step that combines them.
Actually in your data SHORT_NAME is length 16 and ACCOUNT_NAME is length 18 so you might want to reverse the logic so that ACCOUNT_NAME is kept and SHORT_NAME is dropped.
data dataset;
set a.book1
a.book2 (in=inb)
;
by bank_number account_number current_date;
if inb then do ;
Arrears_Band='writoff ';
short_name=account_name;
balance=Net_Write_off_total_USD;
end;
drop account_name Net_Write_off_total_USD ;
run;
There appear to be two issues here:
1) The amount of storage for a variable as defined in a dataset - Linlin's post explains how you find that out.
2) The maximum length of the contents of a variable. For a character variable this can be easily determined:
proc sql;
select max(length(name))
from sashelp.class;
quit;
The maximum length of a numeric variable has no meaning generally speaking as it depends how you display a variable:
For example:
proc print data = sashelp.class;
var age;
format age 10.6; if age is 10 it will be displayed as 10.000000 <== number has a "length" of 9 digits including the decimal place
format age 10.2; if age is 10 it will be displayed as 10.00 <== number has a "length" of 5 digits including the decimal place
run;
As you can see changing how the same number is displayed can change it's "length" in terms of digits.
For numeric variables you don't normally change the amount of storage from the default of 8 bytes (equivalent of characters). 8 bytes will hold up to 16 digits accurately including decimal places, so as a beginner I would recommend you do not change this as you can easily get strange results.
You have two data sets holding the same variable name, but with a different length. The best solution would be to fix the data. Otherwise, you will face this same issue in every program that combines the two data sets. Fixing the data is relatively easy. As suggested, use PROC CONTENTS to find the length of the variable in each data set. Then for the shorter variable, lengthen it:
data a.book2;
length account_name $ 30 /* or whatever the longer length is */;
set a.book2;
run;
If you can't fix the data, there are simple but less automated ways to program around it. For example, you could switch the order in the SET statement so the data set with the longer variable is named first:
set a.book2 a.book1;
Or you could add a length statement in the DATA step that combines them:
data dataset;
length short_name $ 30 /* or whatever the longer length is */;
set .....;
run;
It is wise not to ignore this message!
Good luck.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.