Hi SAS community,
Thank you for looking at my post!
I have a question,
I have a variable which takes value like this:
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
I want to make my variable looks like this
CUP_A
MUG_A
BOWL_A
CUP_B
MUG_B
BOWL_B
that is, to delete the non-sense numbers and letters preceding the key words
Because I know the key words will take values only in this range:
(
CUP_A
MUG_A
BOWL_A
CUP_B
MUG_B
BOWL_B
)
and because the number of non-sense letter/numbers preceding the key words varies, so I don't want to use a substr function,
which I have to manually set the position of key words
anyone know any SAS function that achieve this using some logic like this
extract(variable, from list
(
CUP_A
MUG_A
BOWL_A
CUP_B
MUG_B
BOWL_B
))
Thanks in advance!!!
And an approach that doesn't care where in the string the values may be:
data work.example; input var $50.; array t {6} $ 6 _temporary_ ("CUP_A","MUG_A","BOWL_A","CUP_B","MUG_B","BOWL_B"); do i= 1 to dim(t); if index(var,t[i])>0 then do; var=t[i]; leave; end; end; drop i; cards; 1545253657356_EI_1_15_CUP_A 1545253657356_EI_1_15_MUG_A 1545265347356_EI_1_15_BOWL_A 1545T432657356_EI_1_15_CUP_B 15452657356_EI_1_15_MUG_B 1545265237356_EI_1_15_BOWL_B 1545265237356BOWL_B_EI_1_15_ 1545265237356_EI_1_15_BOWL_B 1545265237356_EI_1_15_BOWL_B ; run;
If the value in var may not all be upper case but you want the lower or mixed case to be changed as well use
if index(upcase(var),t[i])>0 then do;
otherwise lower or mixed case such as bowl_a will remain in the long form.
data have;
input var $50.;
cards;
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
;
data want;
set have;
c=count(var,'_')-1;
i=0;
do pos = findc (var, '_') by 0 while (pos) ;
i+1;
if i=c then leave;
pos = findc (var, '_', pos + 1) ;
end ;
want=substr(var,pos+1);
keep var want;
run;
@changxuosu Play with the above and see if this works. Countc or count is something I want to confirm along with find group of functions. Let me have a coffee and review again
Or use the SCAN function to extract the last two "words":
data want;
set have;
length want $6;
want=scan(var,-2,'_')||'_'||scan(var,-1,'_');
run;
@FreelanceReinh Do you know the difference between genius and ordinary?. You may not, coz the former is synonym of who you are. That's what it is. Your presence of mind made that distinction. How I wish I could think like that. Simple and effective. Easy to handle and maintain. Kudos!
One day i will become good like you(hmm i hope).
PS Can't believe I didn't get the thought, the requirement is the concat of last two. Jeez! Will have to quit my evening pints and start eating veggies
Using reverse, find and substr
data have;
input var $50.;
cards;
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
;
data want;
set have;
t=strip(reverse(var));
want=reverse(substr(t,1,findc(t,'_',3)-1));
drop t;
run;
Hi @changxuosu, I think @Cynthia_sas questions are for you to answer although that seems to point to me. 🙂
And an approach that doesn't care where in the string the values may be:
data work.example; input var $50.; array t {6} $ 6 _temporary_ ("CUP_A","MUG_A","BOWL_A","CUP_B","MUG_B","BOWL_B"); do i= 1 to dim(t); if index(var,t[i])>0 then do; var=t[i]; leave; end; end; drop i; cards; 1545253657356_EI_1_15_CUP_A 1545253657356_EI_1_15_MUG_A 1545265347356_EI_1_15_BOWL_A 1545T432657356_EI_1_15_CUP_B 15452657356_EI_1_15_MUG_B 1545265237356_EI_1_15_BOWL_B 1545265237356BOWL_B_EI_1_15_ 1545265237356_EI_1_15_BOWL_B 1545265237356_EI_1_15_BOWL_B ; run;
If the value in var may not all be upper case but you want the lower or mixed case to be changed as well use
if index(upcase(var),t[i])>0 then do;
otherwise lower or mixed case such as bowl_a will remain in the long form.
one more way is to use regex
^(.+_)? this indicates to start from beginning and go till another instruction in this case till next pattern and this is first pattern
([^\_]+_[^\_]+)$ indicates anything_anything at the end and is second pattern
$2 indicates replace everything with second pattern
data want;
set have;
var_new = prxchange('s/^(.+_)?([^\_]+_[^\_]+)$/$2/', -1, trim(var));
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.