- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi.
I have two columns. One containing a vehicle make and the other a vehicle model. For example:
Column1 Column2
VOLVO LR55VL
In some instances (from poor validation at data entry most probably), Column2 often contains the value from Column1 within the string. Example:
Column1 Column2
VOLVO VOLVO - LR55VL
I was hoping to use the COMPRESS function like
COMPRESS(Column2, Column1) but that looks at the individual characters in column1 and compresses them ALL out of Column2 resulting in
Column1 Column2
VOLVO - R55
How can I do something similar to compress but force it to contstrain the compression with the entire string in column1?
In this example, i was hoping to achieve and output of
Column1 Column2
VOLVO - LR55VL
Thanks in advance!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
input make:$10. model&$20.;
model = transtrn(model,strip(make),strip(' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Chris,
Considering that the column model has model value associated with the '-' to make, then you can follow the below code to get the desired result. Please let me know if this works for you
data have;
input make : $10. model :$20.;
if upcase(make)=upcase(scan(model,1,'-')) then
new_model=scan(model,2,'-');
else new_model=model;
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
;
run;
Thanks,
Jagadish
Jag
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for such a quick response. Wasn't aware of the scan function and that would work for the example i provided. However i should have probably provided a little more of an example to show the range of the problem.
In that, the make is not always followed by a "-". There isn't really any consistency as to where the make name could appear in the model field. Another example could be
LR55VOLVO
LR55 VOLVO
VOLVO LR55
VOLVO/LR55
This list isn't exhastive as some of the names are input by humans by free text
So i'm guessing i need to put a CONTAINS element in here. In logic would be something like
Assess if string in column2 CONTAINS string in column1 and provide the start postition of where the string in column1 is in column2. Then delete the following LEN(column1) from that start position.
Does that make sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
CONTAINS are only valid in where clauses, but you could use findw (not scan - sorry).
The rest sounds good...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
input make:$10. model&$20.;
model = transtrn(model,strip(make),strip(' '));
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks All for your help this morning.
The transtrn function and data_null_ worked perfectly!
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another approach, using Prx functions:
data cars;
input make:$10. model&$20.;
model = transtrn(model,strip(make),strip(' '));
model_prx=prxchange(cats('s/',make,'//io'),-1,model);
cards;
VOLVO LR55VL
VOLVO VOLVO-LR55VL
VOLVO LR55VOLVO
VOLVO LR55 VOLVO
VOLVO VOLVO LR55
VOLVO VOLVO/LR55
;;;;
run;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
By using the scan, length and substring functions, you can romove your word from anywhere in the other column.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So, to be clear, it is not possible to use compress() to remove a word/phrase/string? It will also remove the component parts (e.g., letters) entirely?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Correct. Compress() is letter based, not string based.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But, as you asked if "it was not possible", you have opened up a can of worms. If the word(s) are comprised of specific characters, and those characters don't exist anywhere else in the string, then it is definitely possible. e.g., using compress(string,'a') will definitely remove the "word" from strings like:
This is a nice evening
Is it a boy or girl?