Merging two databases with different character variables

Reply
New Contributor
Posts: 3

Merging two databases with different character variables

Hey all. I hope all is well.

I've been working on SAS for the last 2 weeks and have come across a problem I can't seem to figure out.

Problem:

One database contains the variable 'POSTCODE' in the form of 'UB2  9TF' (UK POSTCODES)

The other database contains the variable 'POSTCODE' in the form of 'UB29TF'. How do I go about getting the same variable attributes in order to successfully merge them?

Thank you in advance!

Super Contributor
Posts: 282

Re: Merging two databases with different character variables

Hi,

Have you tried using the compress function to remove spaces (among other characters) in a string?

SAS(R) 9.3 Functions and CALL Routines: Reference

Regards,

Amir.

New Contributor
Posts: 3

Re: Merging two databases with different character variables


Thank you for the brilliant idea!

As I'm still relatively new, I'm not quite sure on how to execute the compress function.

The function:

COMPRESS (character-value <,'compress-list'>)

My coding:

**compress**;

data test3;

set test2;

Postcode = COMPRESS (" X ")

run;

Now, the problem is that there are a multitude of differing postcodes. One dataset contains 'UB8 1TF' whereas another may possibly have 'UB10 8TF', and hence I am not too sure as to what to replace 'X' with.

Just to clarify, I want the postcodes to become 'UB81TF' and 'UB108TF' instead of the above - also, there are A LOT of postcodes for this to be done to.

Super User
Super User
Posts: 7,039

Re: Merging two databases with different character variables

It sounds like you would like to remove the spaces that are embedded into the middle of some of the values of POSTCODE in one or more of the original datasets.

data fix1 ;

  set data1 ;

  postcode = compress(postcode, ' ');

run;

data fix2;

  set data2;

  postcode = compress(postcode, ' ');

run;

proc sort data=fix1; by postcode ; run;

proc sort data=fix2; by postcode; run;

data want ;

  merge fix1 fix2;

  by postcode;

run;

Or with PROC SQL you might do something like this:

proc sql ;

create table want as

   select *

   from data1 full join data2

   on compress(data1.postcode,' ') = compress(date2.postcode,' ')

;

quit;

New Contributor
Posts: 3

Re: Merging two databases with different character variables

Thank you very much Amir and Tom. It's finally compressed! My other question is how do I ensure that the two variables have the same length, informat, format, label, etc,.

For example, before merging, one database has the format and informat of  '$24' whereas the other has '$255'.

Super Contributor
Posts: 644

Re: Merging two databases with different character variables

For character variables the formats and informats you mention are redundant.  They probably indicate the length of the respective variables in their tables, both of them overkill if they ar for the postcodes. 

You can compare values with different lengths from each of these tables without making any further adjustments, SAS will just pad out the rest of the length (if any) with blanks.  Just do it.

Richard

Frequent Contributor
Posts: 118

Re: Merging two databases with different character variables

Hi,

if there is only blank characters (more than one in string) then you can also use COMPBL function to remove excess blanks.

Regards

Uma Shanker

Contributor
Posts: 71

Re: Merging two databases with different character variables

If your postal codes are like mine, you will not only want to use the Compress function as indicated to remove blanks, but also to remove unwanted characters like dashes (i.e. A1A-1A1).  Here is a paper with good examples: http://www2.sas.com/proceedings/forum2007/217-2007.pdf

In a similar way as the compress function, you may want to use the UPCASE function to make any lowercase letters in your postalcode to uppercase (that is, if any lowercases exist). 

Ask a Question
Discussion stats
  • 7 replies
  • 525 views
  • 7 likes
  • 6 in conversation