BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sits
Calcite | Level 5

#SASBeginnerAlert      #SASBeginnerAlert

Hi, I'm trying to use the length statement just like this:

data unifieddataset;

length variable1 $100;

set original_dataset1 original_dataset2;

run;

Where the two original datasets both have variable1.   original_dataset1 has values for variable1 that are 20 chars long, and  original_dataset2 has values for variable1 that are 50 chars long.  When I run this, the length of variable1 in unifieddataset comes to 20 chars long, and truncates the longer values from original_dataset2, which, in my beginner's ignorance, bewilders me as to why the length statement is ignored.

Documentation I've seen online pretty much shows what I've written above:

Step-by-Step Programming with Base SAS(R) Software

Surely the error I'm making is fundamentally simple, and I'd really appreciate someone telling me what it is I'm doing wrong.

Joe

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable.  You can remove them by using a "naked" format statement, a format statement that lists variables, but no format.  Informats are not much of an issue as they are really not used unless you are reading in text files.


data unifieddataset;

  length variable1 $100;

  set original_dataset1 original_dataset2;

  format variable1 ;

run;

View solution in original post

10 REPLIES 10
Jagadishkatam
Amethyst | Level 16

Hi Joe,

I believe, there should not be any problem with the code what you have written. it is right. The length statement should work. It should give the variable a length of $100.

i just tried a sample code

data dsn1;

    variable1="cha";

run;

data dsn2;

    variable1="character";

run;

data want;

    length variable1 $10;

    set dsn1 dsn2;

run;

It example is similar to what you have queried and for me it worked without any problem. The variable1 length in the want dataset is $10.

Thanks,

Jagadish

Thanks,
Jag
sits
Calcite | Level 5

Thanks for your help Jagadish, however I've just discovered that my two original datasets have formats and informats that are inherited by unifieddataset.  So whilst I set the length to $100 on unifieddataset, the informat and format remain as they were in original_dataset1.  To fix this, I also set the format and informat on unifieddataset:

data unifieddataset;

length variable1 $100;

informat variable1 $100.;

format variable1 $100.;

set original_dataset1 original_dataset2;

run;

Thanks for your help!

Joe

Jagadishkatam
Amethyst | Level 16

Hi Joe,

Not sure how far the statements

informat variable1 $100.;

format variable1 $100.;

helped you, because what i believe is if there are character data then length statement should be enough to solve the problem. Could you please let me know the data that is present in variable1 for better understanding.

Thanks,

Jagadish

Thanks,
Jag
sits
Calcite | Level 5

Sure:

proc print data=original_dataset1;

run;

proc print data=original_dataset2;

run;

proc print data=unifieddataset;

run;

Results:

Obs      dummy_data1      dummy_data2

1      aosidfjaiosdosidjf      Tuesday

Obs     dummy_data1     dummy_data2

1      aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas      Monday

Obs     dummy_data1     dummy_data2

1     aosidfjaiosdosidjf     Tuesday

2     aosidfjaiosdjfaios     Monday

The Length, Informat and Format for dummy_data1 in original_dataset1 is 175, $175. and $175.  For original_dataset2, they're 18, $18. and $18.  Using the original code I pasted, for unifieddataset I get 100, $18. and $18 and the above values.

(FYI, the string you see in the above results is "WARNING: Data too long for column "dummy_data1"; truncated to 92 characters to fit."; the actual string 175 chars long.)

When I set the Informat and Format to, say, 200, then it all fits:

proc print data=unifieddataset;

run;

Obs      dummy_data1      dummy_data2

1      aosidfjaiosdosidjf      Tuesday

2      aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas      Monday

Tom
Super User Tom
Super User

Much better off just removing the format than assigning some random value like $200. That will just cause more confusion later on.

Jagadishkatam
Amethyst | Level 16

Thank you Joe

Thanks,
Jag
Tom
Super User Tom
Super User

Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable.  You can remove them by using a "naked" format statement, a format statement that lists variables, but no format.  Informats are not much of an issue as they are really not used unless you are reading in text files.


data unifieddataset;

  length variable1 $100;

  set original_dataset1 original_dataset2;

  format variable1 ;

run;

sits
Calcite | Level 5

Oh brilliant!  Indeed, I am using PROC IMPORTs to import my data (which, I promise you, is more than just a random arrangement of the eight home-keys on my keyboard that you see above).

Thanks for clarifying what is happening here!  Everything I had been reading online about using the Length statement to help with the consolidation of the same variable with varying lengths from different originating datasets wasn't working.

Joe

Tom
Super User Tom
Super User

I would recommend adding a step to remove the formats from your character variable after using PROC IMPORT.   For example to remove the formats from all character variables in a dataset you could use PROC DATASETS.

proc datasets lib=work nolist;

  modify x;

  format _character_ ;

  informat _character_ ;

run;

sits
Calcite | Level 5

Indeed -- before your answer I was thinking that having to set it to something enormous (>$200) was an ugly solution, but it got me past several hours of frustration so I wasn't complaining. Smiley Happy

Also, thanks for pointing out PROC DATASETS for me -- I'll look in to it.

Thanks to you both!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 3011 views
  • 4 likes
  • 3 in conversation