Hello everyone,
i m trying to create a new column (using case when and contains/like) depending on the context of another string column, but i find it troublesome when it comes to containing numbers such as 50 and 500 for example:
HAVE DATASET
CPP
blabla DP50_L
blablablu DP 50 L
ble DP50
blx DP50_500
blibli DP500_L
bliblibla DP 500 L
bliblible DP500
blublu DP500 50
WANT DATASET
CPP SPEED
blabla DP50_L 50
blablablu DP 50 L 50
ble DP50 50
blx DP50_500 50
blibli DP500_L 500
bliblibla DP 500 L 500
bliblible DP500 500
blublu DP500 50 500
As a result, i have faulty classification. What is the proper way to create the column of the example above?
Thanks alot for your time and insights!
Data cleansing is a cumbersome job, and sometimes it's hard to get it 100& perfect.
My take would to find starting position of string "DP", substring the following 4 characters, remove any non numerical character, and then input that result to a numerical column.
I think there is some ambiguity in your sample WANT. Namely, what is the rule you intend to apply to get the speed values? In particular, is it the first number in the text of CPP? Is it the first number after the string 'DP'?
If it's the earlier (first number in the text of CPP):
data have;
input cpp $char25. ;
datalines;
blabla DP50_L
blablablu DP 50 L
ble DP50
blx DP50_500
blibli DP500_L
bliblibla DP 500 L
bliblible DP500
blublu DP500 50
run;
data want;
set have;
*find position of first numeric character, call it _char1 ;
_char1=anydigit(cpp);
*generate substring length until first non-numeric character after _char1, call it _length;
_length=notdigit(substr(cpp,_char1))-1;
*Read that substring into a numeric variable;
speed=input(substr(cpp,_char1,_length),best32.);
run;
Edited note: This works only if the numeric value is a non-negative integer.
data have;
input CPP $40.;
cards;
blabla DP50_L
blablablu DP 50 L
ble DP50
blx DP50_500
blibli DP500_L
bliblibla DP 500 L
bliblible DP500
blublu DP500 50
;
proc sql;
select cpp,case when cpp like '%50%' then
case when substr(cpp,find(cpp,'50'),3)='500' then 500 else 50 end
else . end as want
from have;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.