Help using Base SAS procedures

Using the Length Statement while combining two data sets

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Using the Length Statement while combining two data sets

#SASBeginnerAlert      #SASBeginnerAlert

Hi, I'm trying to use the length statement just like this:

data unifieddataset;

length variable1 $100;

set original_dataset1 original_dataset2;

run;

Where the two original datasets both have variable1.   original_dataset1 has values for variable1 that are 20 chars long, and  original_dataset2 has values for variable1 that are 50 chars long.  When I run this, the length of variable1 in unifieddataset comes to 20 chars long, and truncates the longer values from original_dataset2, which, in my beginner's ignorance, bewilders me as to why the length statement is ignored.

Documentation I've seen online pretty much shows what I've written above:

Step-by-Step Programming with Base SAS(R) Software

Surely the error I'm making is fundamentally simple, and I'd really appreciate someone telling me what it is I'm doing wrong.

Joe


Accepted Solutions
Solution
‎03-18-2013 08:56 AM
Super User
Super User
Posts: 6,500

Re: Using the Length Statement while combining two data sets

Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable.  You can remove them by using a "naked" format statement, a format statement that lists variables, but no format.  Informats are not much of an issue as they are really not used unless you are reading in text files.


data unifieddataset;

  length variable1 $100;

  set original_dataset1 original_dataset2;

  format variable1 ;

run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,128

Re: Using the Length Statement while combining two data sets

Hi Joe,

I believe, there should not be any problem with the code what you have written. it is right. The length statement should work. It should give the variable a length of $100.

i just tried a sample code

data dsn1;

    variable1="cha";

run;

data dsn2;

    variable1="character";

run;

data want;

    length variable1 $10;

    set dsn1 dsn2;

run;

It example is similar to what you have queried and for me it worked without any problem. The variable1 length in the want dataset is $10.

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 13

Re: Using the Length Statement while combining two data sets

Thanks for your help Jagadish, however I've just discovered that my two original datasets have formats and informats that are inherited by unifieddataset.  So whilst I set the length to $100 on unifieddataset, the informat and format remain as they were in original_dataset1.  To fix this, I also set the format and informat on unifieddataset:

data unifieddataset;

length variable1 $100;

informat variable1 $100.;

format variable1 $100.;

set original_dataset1 original_dataset2;

run;

Thanks for your help!

Joe

Trusted Advisor
Posts: 1,128

Re: Using the Length Statement while combining two data sets

Hi Joe,

Not sure how far the statements

informat variable1 $100.;

format variable1 $100.;

helped you, because what i believe is if there are character data then length statement should be enough to solve the problem. Could you please let me know the data that is present in variable1 for better understanding.

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 13

Re: Using the Length Statement while combining two data sets

Sure:

proc print data=original_dataset1;

run;

proc print data=original_dataset2;

run;

proc print data=unifieddataset;

run;

Results:

Obs      dummy_data1      dummy_data2

1      aosidfjaiosdosidjf      Tuesday

Obs     dummy_data1     dummy_data2

1      aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas      Monday

Obs     dummy_data1     dummy_data2

1     aosidfjaiosdosidjf     Tuesday

2     aosidfjaiosdjfaios     Monday

The Length, Informat and Format for dummy_data1 in original_dataset1 is 175, $175. and $175.  For original_dataset2, they're 18, $18. and $18.  Using the original code I pasted, for unifieddataset I get 100, $18. and $18 and the above values.

(FYI, the string you see in the above results is "WARNING: Data too long for column "dummy_data1"; truncated to 92 characters to fit."; the actual string 175 chars long.)

When I set the Informat and Format to, say, 200, then it all fits:

proc print data=unifieddataset;

run;

Obs      dummy_data1      dummy_data2

1      aosidfjaiosdosidjf      Tuesday

2      aosidfjaiosdjfaiosjdfioasjdofijasodifjaosidjfaiosdjfoiasjdfoiasjdfioasjdfiojasdiofjasiodjfas      Monday

Super User
Super User
Posts: 6,500

Re: Using the Length Statement while combining two data sets

Much better off just removing the format than assigning some random value like $200. That will just cause more confusion later on.

Trusted Advisor
Posts: 1,128

Re: Using the Length Statement while combining two data sets

Thank you Joe

Thanks,
Jag
Solution
‎03-18-2013 08:56 AM
Super User
Super User
Posts: 6,500

Re: Using the Length Statement while combining two data sets

Assigning formats to character variables is a bad idea, as your example illustrates. Unfortunately some SAS processes (PROC IMPORT or SQL connections to external databases) have a nasty habit of attaching $nn formats to every character variable.  You can remove them by using a "naked" format statement, a format statement that lists variables, but no format.  Informats are not much of an issue as they are really not used unless you are reading in text files.


data unifieddataset;

  length variable1 $100;

  set original_dataset1 original_dataset2;

  format variable1 ;

run;

Occasional Contributor
Posts: 13

Re: Using the Length Statement while combining two data sets

Oh brilliant!  Indeed, I am using PROC IMPORTs to import my data (which, I promise you, is more than just a random arrangement of the eight home-keys on my keyboard that you see above).

Thanks for clarifying what is happening here!  Everything I had been reading online about using the Length statement to help with the consolidation of the same variable with varying lengths from different originating datasets wasn't working.

Joe

Super User
Super User
Posts: 6,500

Re: Using the Length Statement while combining two data sets

I would recommend adding a step to remove the formats from your character variable after using PROC IMPORT.   For example to remove the formats from all character variables in a dataset you could use PROC DATASETS.

proc datasets lib=work nolist;

  modify x;

  format _character_ ;

  informat _character_ ;

run;

Occasional Contributor
Posts: 13

Re: Using the Length Statement while combining two data sets

Indeed -- before your answer I was thinking that having to set it to something enormous (>$200) was an ugly solution, but it got me past several hours of frustration so I wasn't complaining. Smiley Happy

Also, thanks for pointing out PROC DATASETS for me -- I'll look in to it.

Thanks to you both!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 422 views
  • 4 likes
  • 3 in conversation