BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChrisElias
Calcite | Level 5

Hi.

I have two columns. One containing a vehicle make and the other a vehicle model. For example:

Column1               Column2

VOLVO                 LR55VL

In some instances (from poor validation at data entry most probably), Column2 often contains the value from Column1 within the string. Example:

Column1               Column2

VOLVO                 VOLVO - LR55VL

I was hoping to use the COMPRESS function like

COMPRESS(Column2, Column1) but that looks at the individual characters in column1 and compresses them ALL out of Column2 resulting in

Column1               Column2

VOLVO                  - R55

How can I do something similar to compress but force it to contstrain the compression with the entire string in column1?

In this example, i was hoping to achieve and output of

Column1               Column2

VOLVO                  - LR55VL

Thanks in advance!


1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19
data cars;
   input make:$10. model&$20.;
   model = transtrn(model,strip(make),strip(
' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
   run;

View solution in original post

10 REPLIES 10
Jagadishkatam
Amethyst | Level 16

Hi Chris,

Considering that the column model has model value associated with the '-' to make, then you can follow the below code to get the desired result. Please let me know if this works for you

data have;

    input make : $10. model :$20.;

    if upcase(make)=upcase(scan(model,1,'-')) then

    new_model=scan(model,2,'-');

    else new_model=model;

cards;

VOLVO  LR55VL

VOLVO  VOLVO-LR55VL

;

run;

Thanks,

Jagadish

Thanks,
Jag
ChrisElias
Calcite | Level 5

Thanks for such a quick response. Wasn't aware of the scan function and that would work for the example i provided. However i should have probably provided a little more of an example to show the range of the problem.

In that, the make is not always followed by a "-". There isn't really any consistency as to where the make name could appear in the model field. Another example could be

LR55VOLVO

LR55 VOLVO

VOLVO LR55

VOLVO/LR55

This list isn't exhastive as some of the names are input by humans by free text

So i'm guessing i need to put a CONTAINS element in here. In logic would be something like

Assess if string in column2 CONTAINS string in column1 and provide the start postition of where the string in column1 is in column2. Then delete the following LEN(column1) from that start position.

Does that make sense?


LinusH
Tourmaline | Level 20

CONTAINS are only valid in where clauses, but you could use findw (not scan - sorry).

The rest sounds good...

Data never sleeps
data_null__
Jade | Level 19
data cars;
   input make:$10. model&$20.;
   model = transtrn(model,strip(make),strip(
' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
   run;
ChrisElias
Calcite | Level 5

Thanks All for your help this morning.

The transtrn function and data_null_ worked perfectly!

Thanks.

Haikuo
Onyx | Level 15

Another approach, using Prx functions:

data cars;

  input make:$10. model&$20.;

  model = transtrn(model,strip(make),strip(' '));

  model_prx=prxchange(cats('s/',make,'//io'),-1,model);

  cards;

VOLVO LR55VL

VOLVO VOLVO-LR55VL

VOLVO LR55VOLVO

VOLVO LR55 VOLVO

VOLVO VOLVO LR55

VOLVO VOLVO/LR55

;;;;

  run;

Haikuo

LinusH
Tourmaline | Level 20

By using the scan, length and substring functions, you can romove your word from anywhere in the other column.

Data never sleeps
kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

So, to be clear, it is not possible to use compress() to remove a word/phrase/string? It will also remove the component parts (e.g., letters) entirely?

Haikuo
Onyx | Level 15

Correct. Compress() is letter based, not string based.

art297
Opal | Level 21

But, as you asked if "it was not possible", you have opened up a can of worms.  If the word(s) are comprised of specific characters, and those characters don't exist anywhere else in the string, then it is definitely possible.  e.g., using compress(string,'a') will definitely remove the "word" from strings like:

This is a nice evening

Is it a boy or girl?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 7031 views
  • 1 like
  • 7 in conversation