BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

Hi Colleagues,

I have concatenated the attached two data sets.

Step I - Sorting 2 data sets

proc sort data=a.book1 out=a.book1;

     by bank_number account_number current_date ;

run;

proc sort data=a.book2  out=a.book2;

     by bank_number account_number current_date ;

run;

Step 2

/* Carrying out "union of two datasets vertically by Account group", Ksharp is acknowledged.*/

data dataset;

set a.book1

      a.book2 (in=inb  rename=(account_name=short_name Net_Write_off_total_USD=balance));

by bank_number account_number current_date;

if inb then Arrears_Band='writoff ';

run;

It seems the concatenation happens correctly but the following message appears in the log.

WARNING: Multiple lengths were specified for the variable Short_name by input data set(s). This may cause truncation of data.

i) Could you please let me know why?

ii) And please help me to get rid of.

Thanks

Mirisage

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

you can find length of a variable from proc contents.

example:

proc contents data=sashelp.class;run;

output:

                    Alphabetic List of Variables and Attributes

                           #    Variable    Type    Len

                           3    Age         Num       8

                           4    Height      Num       8

                           1    Name        Char     8

                           2    Sex         Char      1

                           5    Weight      Num       8

View solution in original post

8 REPLIES 8
Linlin
Lapis Lazuli | Level 10

solution:

you need to find out the length of variable "short_name" in book1 and the length of account_name in book2.

if length of short_name=8 and length of account_name=12, then add

"length short_name $ 12;" to your code:

data dataset;

length short_name $ 12;

set a.book1

      a.book2 (in=inb  rename=(account_name=short_name Net_Write_off_total_USD=balance));

by bank_number account_number current_date;

if inb then Arrears_Band='writoff ';

run;

Mirisage
Obsidian | Level 7

Hi Linlin,

Many thanks.

Actually, I get 3 warnings like below when I have run my original data sets (but I have posted only two obs taking from my data set).

WARNING: Multiple lengths were specified for the BY variable Account_number by input data sets. This may cause unexpected results.

WARNING: Multiple lengths were specified for the variable Short_name by input data set(s). This may cause truncation of data.

WARNING: Multiple lengths were specified for the variable Country by input data set(s). This may cause truncation of data.

Coming back to the two data sets I have attahced, the values of variable "short_name" in book1 are :

AACKMAA EEEEEE W

When I manualy count the number of characters in this, there are 16 including the spaces between name pieces.

Question:

So, 16 is the length of value “AACKMAA EEEEEE W”, isn't it?

Let us say there is a name like below  (without quotes) in my original larger data set,  we never know!.

“Joseph Peter Dennis Fernando Tores Arnold”

In this case, the length is 41 including the spaces, isn't it?.

But we cannot test this by trail and error (sorry I am a novice).

Would there be any easy method to find out the length of the longest value of a character variable in a given data set?

Also Would there be any easy method to find out the length of the longest value of a numeric variable in a given data set?

Thank you

Mirisage

Linlin
Lapis Lazuli | Level 10

you can find length of a variable from proc contents.

example:

proc contents data=sashelp.class;run;

output:

                    Alphabetic List of Variables and Attributes

                           #    Variable    Type    Len

                           3    Age         Num       8

                           4    Height      Num       8

                           1    Name        Char     8

                           2    Sex         Char      1

                           5    Weight      Num       8

GeoffreyBrent
Calcite | Level 5

If you have the data set open, you can also check the length of a character variable by mousing over the column header. After a little while a tooltip will pop up with variable name, length, and other properties.

Mirisage
Obsidian | Level 7

Hi Linilin,

Many thanks for the proc contents clue.

Hi Tom,

This is an elegant way of doing it.

Hi SASKiwi,

Your example code to find out the “maximum length of the contents of a variable” is what I have been looking for sometime. And maximum length of a numeric variable explanation is nice – Thanks

Hi Astounding, many thanks for your examples.

Hi GeoffreyBrent – Your short cut – mousing- works well. Thank you very much for this.

In addition, I explored a bit on :

“options varlenchk=nowarn;”

This option works only for “non By” variables

/*WARNING: Multiple lengths were specified for the BY variable Account_number by

input data sets. This may cause unexpected results.

Above  is the wording of warning for By variables

WARNING: Multiple lengths were specified for the variable Short_name by input

data set(s). This may cause truncation of data.

Above  is the wording of warning for "non BY" variables

WARNING: Multiple lengths were specified for the variable Country by input

data set(s). This may cause truncation of data.*/

Above is the wording of warning for "non BY" variables

Best regards to all,

Mirisage

Tom
Super User Tom
Super User

You are trying to put the value of ACCOUNT_NAME from BOOK2 dataset into the variable SHORT_NAME from the BOOK1 dataset, but they are defined with different lengths.   Easiest fix it make BOOK1 and BOOK2 consistent so that they use the same variable names and attributes.

Since they have different names it should be easy to fix in data step that combines them.

Actually in your data SHORT_NAME is length 16 and ACCOUNT_NAME is length 18 so you might want to reverse the logic so that ACCOUNT_NAME is kept and SHORT_NAME is dropped.

data dataset;

  set a.book1

      a.book2 (in=inb)

  ;

by bank_number account_number current_date;

if inb then do ;

  Arrears_Band='writoff ';

  short_name=account_name;

  balance=Net_Write_off_total_USD;

end;

drop account_name Net_Write_off_total_USD ;

run;

SASKiwi
PROC Star

There appear to be two issues here:

1) The amount of storage for a variable as defined in a dataset - Linlin's post explains how you find that out.

2) The maximum length of the contents of a variable. For a character variable this can be easily determined:

proc sql;

  select max(length(name))

  from sashelp.class;

quit;

The maximum length of a numeric variable has no meaning generally speaking as it depends how you display a variable:

For example:

proc print data = sashelp.class;

  var age;

  format age 10.6; if age is 10 it will be displayed as 10.000000 <== number has a "length" of 9 digits including the decimal place

  format age 10.2; if age is 10 it will be displayed as 10.00 <== number has a "length" of 5 digits including the decimal place

run;

As you can see changing how the same number is displayed can change it's "length" in terms of digits.

For numeric variables you don't normally change the amount of storage from the default of 8 bytes (equivalent of characters). 8 bytes will hold up to 16 digits accurately including decimal places, so as a beginner I would recommend you do not change this as you can easily get strange results.

Astounding
PROC Star

You have two data sets holding the same variable name, but with a different length.  The best solution would be to fix the data.  Otherwise, you will face this same issue in every program that combines the two data sets.  Fixing the data is relatively easy.  As suggested, use PROC CONTENTS to find the length of the variable in each data set.  Then for the shorter variable, lengthen it:

data a.book2;

   length account_name $ 30 /* or whatever the longer length is */;

   set a.book2;

run;

If you can't fix the data, there are simple but less automated ways to program around it.  For example, you could switch the order in the SET statement so the data set with the longer variable is named first:

set a.book2 a.book1;

Or you could add a length statement in the DATA step that combines them:

data dataset;

   length short_name $ 30 /* or whatever the longer length is */;

   set .....;

run;

It is wise not to ignore this message!

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3523 views
  • 0 likes
  • 6 in conversation