changxuosu
Quartz | Level 8

Hi SAS community,

 

Thank you for looking at my post!

I have a question.

I have a variable that takes values like this:

 

1545253657356_EI_1_15_CUP_A

1545253657356_EI_1_15_MUG_A

1545265347356_EI_1_15_BOWL_A

1545T432657356_EI_1_15_CUP_B

15452657356_EI_1_15_MUG_B

1545265237356_EI_1_15_BOWL_B

 

I want to make my variable look like this:

 

CUP_A

MUG_A

BOWL_A

CUP_B

MUG_B

BOWL_B

 

That is, I want to delete the nonsense numbers and letters preceding the key words.

 

Because I know the key words will take values only in this range:

(

CUP_A

MUG_A

BOWL_A

CUP_B

MUG_B

BOWL_B

)

 

Because the number of nonsense letters and numbers preceding the key words varies, I don't want to use the SUBSTR function, which would require me to set the position of the key words manually.

Does anyone know a SAS function that achieves this using logic like this:

 

extract(variable, from list (CUP_A, MUG_A, BOWL_A, CUP_B, MUG_B, BOWL_B))

 

Thanks in advance!!!

 

Accepted Solution
ballardw
Super User

And an approach that doesn't care where in the string the values may be:

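data work.example;
   input var $50.;
   /* Lookup list: the only key words that should be kept */
   array t {6} $ 6 _temporary_ ("CUP_A","MUG_A","BOWL_A","CUP_B","MUG_B","BOWL_B");
   do i= 1 to dim(t);
      /* INDEX returns the position of the key word in var, or 0 if absent. */
      /* TRIM drops the blank padding of the $6 array elements so that      */
      /* 5-character keys also match when they are not at the end of var.   */
      if index(var,trim(t[i]))>0 then do;
         var=t[i];   /* replace the long value with the key word */
         leave;      /* stop at the first match */
      end;
   end;
   drop i;
cards;
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
1545265237356BOWL_B_EI_1_15_
1545265237356_EI_1_15_BOWL_B
1545265237356_EI_1_15_BOWL_B
;
run;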

If the values in var may not all be upper case and you want lower- or mixed-case values to be changed as well, use

 

      if index(upcase(var),trim(t[i]))>0 then do;

Otherwise, lower- or mixed-case values such as bowl_a will remain in the long form.

 


19 REPLIES
Cynthia_sas
SAS Super FREQ
Hi:
There are functions, but you also have to use them with some kind of lookup technique. For your purposes, I'd probably use an array.
Cynthia
novinosrin
Tourmaline | Level 20
data have;
input var $50.;
cards;
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
;

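data want;
set have;
c=count(var,'_')-1;   /* ordinal of the underscore just before the last two "words" */
i=0;
/* step from one underscore to the next until the c-th one is reached */
do pos = findc (var, '_') by 0 while (pos) ;
i+1;
if i=c then leave;
pos = findc (var, '_', pos + 1) ;
end ;
want=substr(var,pos+1);   /* everything after that underscore */
keep var want;
run;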

 

@changxuosu Play with the above and see if it works. I still want to confirm whether COUNTC or COUNT is the better choice here, along with the FIND group of functions. Let me have a coffee and review again.

FreelanceReinh
Jade | Level 19

Or use the SCAN function to extract the last two "words":

data want;
set have;
length want $6;
want=scan(var,-2,'_')||'_'||scan(var,-1,'_');
run;
novinosrin
Tourmaline | Level 20

@FreelanceReinh Do you know the difference between genius and ordinary? You may not, because the former is a synonym for who you are. Your presence of mind made that distinction. How I wish I could think like that. Simple and effective. Easy to handle and maintain. Kudos!

 

One day I will become as good as you (hmm, I hope).

PS: Can't believe I didn't realize that the requirement is just the concatenation of the last two words. Jeez! I will have to quit my evening pints and start eating veggies.

changxuosu
Quartz | Level 8
I studied your code and it worked fantastically! Thank you, novinosrin. Hope you enjoyed your coffee 🙂
novinosrin
Tourmaline | Level 20

Using REVERSE, FINDC and SUBSTR:

 


data have;
input var $50.;
cards;
1545253657356_EI_1_15_CUP_A
1545253657356_EI_1_15_MUG_A
1545265347356_EI_1_15_BOWL_A
1545T432657356_EI_1_15_CUP_B
15452657356_EI_1_15_MUG_B
1545265237356_EI_1_15_BOWL_B
;

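data want;
set have;
t=strip(reverse(var));   /* reversed value with the padding blanks removed */
/* search from position 3 to skip the first underscore (assumes a one-character last word), */
/* then cut just before the second underscore and reverse back */
want=reverse(substr(t,1,findc(t,'_',3)-1));
drop t;
run;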
Cynthia_sas
SAS Super FREQ
Hi:
I have a couple of questions:
1) Is the string you want ALWAYS at the end of the variable?
2) What if the end of the variable had PLATE_A or CAT_A -- would you still want those to be extracted or do you only want the 6 strings you listed?
3) Are there ALWAYS underscores and ONLY underscores in the string?
4) You said the numbers at the beginning of the string were "nonsense", but are you sure you will NEVER need them?

Cynthia
novinosrin
Tourmaline | Level 20

Hi @changxuosu, I think @Cynthia_sas's questions are for you to answer, although they seem to be addressed to me. 🙂

Cynthia_sas
SAS Super FREQ
Yes, those questions were for the original poster. Sorry, I just clicked so that my reply would be at the end of the thread, not in the middle. It seems to me that there are points that really need to be clarified before writing a program. For example, what if CUP_A can be in the middle of the string? Then the REVERSE technique won't work. What if the value in the string is cup_a instead of CUP_A? What if there aren't any underscores in the string? These are unknowns that need to be answered before I would attempt a solution. Of all the solutions posted here, the one closest to what I had in mind is the ARRAY solution, because it lets you provide a specific list of desired strings to search for with the INDEX function.
Cynthia
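A minimal sketch of how the open questions above could be checked against the data before choosing an approach. It assumes the data set have with the character variable var from the earlier posts; the names check, n_underscores, last_two and known_key are hypothetical and used only for illustration.

```sas
/* Hypothetical diagnostic step: profile the values before extracting anything */
data check;
   set have;
   /* How many underscores does each value contain? */
   n_underscores = countc(var, '_');
   /* Last two underscore-delimited words; CATX strips the blank padding */
   last_two = catx('_', scan(var, -2, '_'), scan(var, -1, '_'));
   /* 1 if the value ends in one of the six listed key words (any case), else 0 */
   known_key = (upcase(last_two) in
               ('CUP_A','MUG_A','BOWL_A','CUP_B','MUG_B','BOWL_B'));
run;

proc freq data=check;
   tables n_underscores known_key last_two / missing;
run;
```

Rows with known_key = 0 are the ones where an approach based only on the end of the string would return something other than the six listed key words.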
changxuosu
Quartz | Level 8
These are great questions, thank you Cynthia!
changxuosu
Quartz | Level 8
It's a smart loop! Thank you so much! It worked like magic!
kiranv_
Rhodochrosite | Level 12

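One more way is to use a regex:

 

^(.+_)? optionally matches everything from the beginning of the string up to and including an underscore. Because .+ is greedy, it extends as far as the rest of the pattern allows. This is the first capture group.

 

([^\_]+_[^\_]+)$ matches two underscore-free words joined by a single underscore at the end of the string (anything_anything). This is the second capture group.

$2 replaces the whole match with the contents of the second capture group.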

data want;
set have;
var_new = prxchange('s/^(.+_)?([^\_]+_[^\_]+)$/$2/', -1, trim(var));
run;
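
If only the six listed key words should ever be extracted, wherever they sit in the string and in any case, a pattern that spells them out may be a closer fit. This is a sketch under assumptions, not part of the post above: it reuses the data set have and the variable var, while the names re and key are hypothetical.

```sas
data want;
   set have;
   length key $6;
   retain re;
   /* Compile the pattern once; the i modifier makes the match case-insensitive */
   if _n_ = 1 then re = prxparse('/(CUP|MUG|BOWL)_(A|B)/i');
   /* Capture buffer 0 of PRXPOSN holds the entire match; key stays blank if nothing matches */
   if prxmatch(re, var) then key = upcase(prxposn(re, 0, var));
   drop re;
run;
```

Values that contain none of the listed key words are left with a blank key instead of being replaced by their last two words.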

 

changxuosu
Quartz | Level 8
Wow, this PRXCHANGE works like magic. It's my first time seeing it. I will learn it. Thanks a lot!
