Hi,

I have two variables whose values are strings,
and I want to remove the substring from the larger string.
For example:

A=adc sgk ghhj
B= sgk

I want to create a new variable C, so that C=adc ghhj.
Can anyone give a solution for that?


Thanks in advance,
Pabitra
Accepted Solutions
David_Duling
SAS Employee
Base SAS code is always fun. What if you don't know the length of the strings, but you do know that a space is the separator and that you want the first and third chunks of A? Use the SCAN function.

length c $32 ;
c= scan(a,1,' ') !! ' ' !! scan(a,3,' ');


Now, what if you don't know how many chunks there are in A, but you know you want the last one? Use the REVERSE function.

length c $32 ;
c= scan(a,1,' ') !! ' ' !! reverse(scan(reverse(a),1,' '));
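
As an alternative to the REVERSE trick, SCAN accepts a negative count, which makes it read words from right to left. The following is only a minimal sketch: the sample value is the one from the question, and using CATX to join the pieces (instead of !!) is my own choice, not part of the original reply.

```sas
data _null_;
  length a c $32;
  a = 'adc sgk ghhj';              /* sample value taken from the question */
  /* scan(a, 1, ' ')  returns the first word of a                      */
  /* scan(a, -1, ' ') returns the last word, scanning right to left    */
  /* CATX strips blanks from each piece and joins them with one blank  */
  c = catx(' ', scan(a, 1, ' '), scan(a, -1, ' '));
  put c=;
run;
```

With the sample value this should write c=adc ghhj to the log.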


When using EM 5.3 or 6.1, you can enter the expression directly into the SCORE code editor in the CODE node or the TRANSFORM Node. You do not need to enter DATA, SET, or RUN statements. The code will be inserted into the diagram's score code.

What if you want to use the variable C in a model but you want to reject A and B? You can do this in your CODE node as well. The CODE node has editors for TRAIN, SCORE, and REPORT code. Enter this code into the TRAIN code.

%EM_METACHANGE(name=A, role=rejected) ;
%EM_METACHANGE(name=B, role=rejected) ;


The new variable C will have the role of input by default. If you want to set that explicitly, you can do this:

%EM_METACHANGE(name=C, role=input, level=nominal) ;

You do NOT want to drop A and B from the data since that will cause the score code to fail when the expression that creates C is evaluated after a temporary data set has been created.

In general, EM does not drop variables from the data set, and does not change the values of existing variables.

cheers.

11 REPLIES
sbb
Lapis Lazuli | Level 10
Explore the use of SAS function TRANWRD to assign a new SAS variable.

Scott Barry
SBBWorks, Inc.
WayneThompson
SAS Employee
Hi Pabitra,

One way to accomplish this is to use the Base SAS SUBSTR function in an EM SAS Code node.

SUBSTR returns a portion of a variable's value based on a starting position
and a number of characters.

cpart1 = SUBSTR(a,1,3);
cpart2 = SUBSTR(a,9,4);

You could run the function twice as above and then use || to concatenate the two extracted strings back into a single variable, c = adc ghhj.

c = trim(cpart1) || ' ' || left(cpart2);
drop cpart1 cpart2;
Others may have an easier approach.
deleted_user
Not applicable
You may also be interested in regular expressions.
I do not know how to use them in SAS specifically,
but they are a very powerful way to detect a string within a string, test it, and manipulate it.
deleted_user
Not applicable
Thanks for the answer.
But I need a general solution. I mean, I want to apply it to two columns whose values are strings.

Can someone give me a suggestion?

Thanks in advance,

pabitra
sbb
Lapis Lazuli | Level 10
Did you look at the documentation and consider using the TRANWRD function in a DATA step, as suggested? The function can be used with SAS variables as well as constants. Here's a Google advanced search argument to help you find related information on the SAS support website, http://support.sas.com/:

tranwrd function site:sas.com


Scott Barry
SBBWorks, Inc.
David_Duling
SAS Employee
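OK, how about this: you can use the string that should be removed as the delimiter in the SCAN function. It works with or without surrounding spaces, but note that SCAN treats every character of the delimiter argument as a separate delimiter, so this is only safe when those characters do not occur elsewhere in A.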

data _null_ ;
a= 'asdf yyy sdfs' ;
b= 'yyy' ;
c= scan(a,1,b) !! ' ' !! scan(a,2,b) ;
put c ;
run ;
sbb
Lapis Lazuli | Level 10
Honestly, is there some aversion with using a function for its defined purpose, or am I missing something with this elongated thread?

Scott Barry
SBBWorks, Inc.
David_Duling
SAS Employee
For simply removing one string from another, that is the way to go.

TRANWRD

http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000215027.htm

rc= tranwrd(a,b,' ');
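
To apply this to two columns of a data set, as asked earlier in the thread, something along these lines may work. It is only a sketch: the data set names HAVE and WANT, the variable lengths, and the second sample row are made up for illustration, and wrapping TRANWRD in COMPBL and STRIP is my own addition to tidy up the leftover blanks.

```sas
/* Hypothetical input: one row per pair of strings */
data have;
  length a $32 b $8;
  a = 'adc sgk ghhj';      b = 'sgk'; output;
  a = 'one two three two'; b = 'two'; output;
run;

data want;
  set have;
  length c $32;
  /* TRIM(b) drops the padding blanks so only the text itself is matched */
  /* TRANWRD replaces every occurrence of that text with a single blank  */
  /* COMPBL collapses runs of blanks; STRIP removes blanks at the edges  */
  c = strip(compbl(tranwrd(a, trim(b), ' ')));
  put a= b= c=;
run;
```

For the two sample rows, c should come out as adc ghhj and one three. Keep in mind that TRANWRD matches substrings, so a value of B that occurs inside a longer word of A is removed from that word too.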
Peter_C
Rhodochrosite | Level 12
new_string= compress(tranwrd(original_string, trim(unwanted_string),'01'x ), '01'x );

thus ensuring all occurrences of what is in "unwanted_string" are removed, without leaving a blank for each occurrence.
The trim() function is used to ensure those cases not fully using the width of the variable named unwanted_string will match only on the non-blank (left-hand) part.
The compress() function will remove the substitutions for the unwanted_string.
DLing
Obsidian | Level 7
data _null_;
a='abc def kkk xyz kkk qwe';
b='kkk';
c=prxchange('s/'||b||'//', -1, a);
put a= b= c=;
run;

will get you this: a=abc def kkk xyz kkk qwe b=kkk c=abc def  xyz  qwe

Note that it doesn't deal with the resulting double blanks, and it will also replace portions of a word containing kkk. With suitable changes to the regular expression, you can deal with all of those situations.
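
One possible adjustment, offered as a sketch rather than a definitive pattern: anchor the match with \b word boundaries and let \s* consume the blanks that follow the removed word. The extra word kkkz in the sample value and the variable lengths are assumptions added for illustration.

```sas
data _null_;
  length a c $40 b $8;
  a = 'abc def kkk xyz kkk qwe kkkz';  /* kkkz added to show the word-boundary effect */
  b = 'kkk';
  /* CATS strips blanks from each piece, so padding in b stays out of the pattern */
  /* \b ... \b matches b only as a whole word; \s* also removes the blanks after it */
  c = prxchange(cats('s/\b', b, '\b\s*//'), -1, a);
  put a= b= c=;
run;
```

This should give c=abc def xyz qwe kkkz, leaving kkkz untouched. Any regular expression metacharacters in B (for example . or *) are interpreted as part of the pattern, so this form suits plain alphanumeric values.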
