Compress, but whole word (not just the characters)

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Compress, but whole word (not just the characters)

Hi.

I have two columns. One containing a vehicle make and the other a vehicle model. For example:

Column1               Column2

VOLVO                 LR55VL

In some instances (from poor validation at data entry most probably), Column2 often contains the value from Column1 within the string. Example:

Column1               Column2

VOLVO                 VOLVO - LR55VL

I was hoping to use the COMPRESS function like

COMPRESS(Column2, Column1) but that looks at the individual characters in column1 and compresses them ALL out of Column2 resulting in

Column1               Column2

VOLVO                  - R55

How can I do something similar to compress but force it to contstrain the compression with the entire string in column1?

In this example, i was hoping to achieve and output of

Column1               Column2

VOLVO                  - LR55VL

Thanks in advance!



Accepted Solutions
Solution
‎01-24-2013 06:44 AM
Respected Advisor
Posts: 3,799

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias
data cars;
   input make:$10. model&$20.;
   model = transtrn(model,strip(make),strip(
' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
   run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,137

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias

Hi Chris,

Considering that the column model has model value associated with the '-' to make, then you can follow the below code to get the desired result. Please let me know if this works for you

data have;

    input make : $10. model :$20.;

    if upcase(make)=upcase(scan(model,1,'-')) then

    new_model=scan(model,2,'-');

    else new_model=model;

cards;

VOLVO  LR55VL

VOLVO  VOLVO-LR55VL

;

run;

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 10

Re: Compress, but whole word (not just the characters)

Posted in reply to Jagadishkatam

Thanks for such a quick response. Wasn't aware of the scan function and that would work for the example i provided. However i should have probably provided a little more of an example to show the range of the problem.

In that, the make is not always followed by a "-". There isn't really any consistency as to where the make name could appear in the model field. Another example could be

LR55VOLVO

LR55 VOLVO

VOLVO LR55

VOLVO/LR55

This list isn't exhastive as some of the names are input by humans by free text

So i'm guessing i need to put a CONTAINS element in here. In logic would be something like

Assess if string in column2 CONTAINS string in column1 and provide the start postition of where the string in column1 is in column2. Then delete the following LEN(column1) from that start position.

Does that make sense?


Super User
Posts: 5,424

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias

CONTAINS are only valid in where clauses, but you could use findw (not scan - sorry).

The rest sounds good...

Data never sleeps
Solution
‎01-24-2013 06:44 AM
Respected Advisor
Posts: 3,799

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias
data cars;
   input make:$10. model&$20.;
   model = transtrn(model,strip(make),strip(
' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
   run;
Occasional Contributor
Posts: 10

Re: Compress, but whole word (not just the characters)

Posted in reply to data_null__

Thanks All for your help this morning.

The transtrn function and data_null_ worked perfectly!

Thanks.

Respected Advisor
Posts: 3,156

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias

Another approach, using Prx functions:

data cars;

  input make:$10. model&$20.;

  model = transtrn(model,strip(make),strip(' '));

  model_prx=prxchange(cats('s/',make,'//io'),-1,model);

  cards;

VOLVO LR55VL

VOLVO VOLVO-LR55VL

VOLVO LR55VOLVO

VOLVO LR55 VOLVO

VOLVO VOLVO LR55

VOLVO VOLVO/LR55

;;;;

  run;

Haikuo

Super User
Posts: 5,424

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias

By using the scan, length and substring functions, you can romove your word from anywhere in the other column.

Data never sleeps
Contributor kbk
Contributor
Posts: 29

Re: Compress, but whole word (not just the characters)

Posted in reply to ChrisElias

So, to be clear, it is not possible to use compress() to remove a word/phrase/string? It will also remove the component parts (e.g., letters) entirely?

Respected Advisor
Posts: 3,156

Re: Compress, but whole word (not just the characters)

Correct. Compress() is letter based, not string based.

PROC Star
Posts: 7,468

Re: Compress, but whole word (not just the characters)

But, as you asked if "it was not possible", you have opened up a can of worms.  If the word(s) are comprised of specific characters, and those characters don't exist anywhere else in the string, then it is definitely possible.  e.g., using compress(string,'a') will definitely remove the "word" from strings like:

This is a nice evening

Is it a boy or girl?

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 1694 views
  • 1 like
  • 7 in conversation