Solved: Re: Spliting a string into variable

BharathBandi · Posted 04-14-2015 11:12 PM

Hi ,

I want to split the below string in to different variables and when there are two consecutive characters then I want to insert 1. I tried putting characters and numbers in different variables but that's not giving what I want. Please help me

String

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

Output:

var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14

28 , 16 O 1 B 4 N 7 L 8 U 4 L

28 , 12 O 8 B 20 O

46 , 12 C

30 , 14 C 1 B 12 C 4 N 2 C

Thank You

art297 · Posted 04-15-2015 01:27 AM

There is probably an easier way to do this, but it's late and I'm tired:

data have;

informat string $50.;

input String;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

set have;

string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

array var(14) $;

i=1;

do while(scan(string,i,'_') ne '');

var(i)=scan(string,i,'_');

i+1;

end;

run;

View solution in original post

Jagadishkatam · Posted 04-14-2015 11:54 PM

data have;

input string:$100.;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want;

set have;

var1=substr(string,1,2);

var2=substr(string,3,1);

var3=substr(string,4,2);

array vas(*) $ var4-var14;

do i = 1 to dim(vas);

vas(i)=substr(compress(string),i+5,1);

end;

run;

Thanks,

Jag

Thanks,
Jag

art297 · Posted 04-15-2015 01:27 AM

There is probably an easier way to do this, but it's late and I'm tired:

data have;

informat string $50.;

input String;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

set have;

string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

array var(14) $;

i=1;

do while(scan(string,i,'_') ne '');

var(i)=scan(string,i,'_');

i+1;

end;

run;

BharathBandi · Posted 04-15-2015 02:11 AM

Thank you your code worked correct for me, but if there is no number in-front of character then I need 1.

example:After O theres no number, so in this case I need 1 then B.

For the first string : 28,16OB4N7L8U4L

output:

var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14

28 , 16 O 1 B 4 N 7 L 8 U 4 L

Hope I'm clear.

Thank you

RW9 · Posted 04-15-2015 05:02 AM

Something like:

data have;

informat string $50.;

input String;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data want (keep=old_string final_string);

set have;

length id1-id3 old_string final_string $20;

old_string=string;

array var{40} $20.;

array id{3} $20.;

/* Address the first parts */

id1=substr(string,1,2);

id2=substr(string,3,1);

id3=substr(string,4,2);

string=substr(string,6);

/* Now we have the data left to process loop through each character */

tmp_count=1;

do i=1 to 40 by 2;

var{i}=substr(string,tmp_count,1);

tmp_count=tmp_count+1;

end;

/* Additional step to insert 1's where necessary */

do i=1 to 40 by 2;

if anyalpha(var{i}) and anyalpha(var{i+2}) then var{i+1}="1";

end;

/* Create final complete string */

final_string=cats(of id{*})||cats(of var{*});

run;

art297 · Posted 04-15-2015 08:52 AM

Are you sure you ran my code? I think that is exactly what the code did!

BharathBandi · Posted 04-15-2015 05:32 PM

Thank You Arthur, your code worked fine.

But I just realized that when string has no number after comma it is not giving value

example :

35,NB15O4L4NB7O

Desired Output:

var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16

35 , 1 N 1 B 15 O 4 L 4 N 1 B 7 O

Thank you

Ksharp · Posted 04-15-2015 10:06 AM



data have;
input string $40.;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
data temp;
 set have;
 string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);
 pid=prxparse('/\d+|\W+|[a-z]/i');
 n+1;start=1;stop=length(string);
 call prxnext(pid,start,stop,string,position,length);
 do while(position gt 0);
  found=substr(string,position,length);output;
   call prxnext(pid,start,stop,string,position,length);
 end;
 drop pid start stop position length string ;
run;
proc transpose data=temp out=want(drop=_name_);
 by n;
 var found;
run;

Xia Keshan

art297 · Posted 04-15-2015 10:47 AM

Xia,

FWIW, your code appears to do the same thing that my code does, but requires slightly more cpu and real time than the code I had proposed.

The test I ran to compare the two was:

data have (drop=i);

input string $40.;

do i=1to 100000;

output;

end;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data temp;

set have;

string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);

pid=prxparse('/\d+|\W+|[a-z]/i');

n+1;start=1;stop=length(string);

call prxnext(pid,start,stop,string,position,length);

do while(position gt 0);

found=substr(string,position,length);output;

call prxnext(pid,start,stop,string,position,length);

end;

drop pid start stop position length string ;

run;

proc transpose data=temp out=want_xia(drop=_name_ n);

by n;

var found;

run;

data want_art (drop=string i);

set have;

string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

array var(14) $;

i=1;

do while(scan(string,i,'_') ne '');

var(i)=scan(string,i,'_');

i+1;

end;

run;

Ksharp · Posted 04-16-2015 07:47 AM

Yeah. You are right ,Arthur.T . I believe that is because you pre-define the number of variables . How do you know that would have to be fourteen ?

Best

Xia Keshan

art297 · Posted 04-16-2015 09:27 AM

Xia: Agreed! I hard coded it based on my interpretation of OP's problem statement.

FriedEgg · Posted 04-15-2015 11:28 AM

data foo;

length foobar $ 32;

input foobar;

array v[14] $ 2;

v1 = scan(foobar, 1, ',');

v2 = ',';

bar = scan(foobar, 2, ',');

bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

bar = prxchange('s/((?:[a-z]+)|(?:[0-9]+))(?!\s)/\1_/io', -1, bar);

do i=3 to dim(v) by 1 while(1);

v=scan(bar, i-2, '_');

if missing(v) then leave;

end;

keep v:;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

art297 · Posted 04-15-2015 12:23 PM

Matt: Not more efficient! Took the same time as my code and will only work, as is, if the first two fields always contain a number followed by a comma, and none of the numbers require more than 2 characters.

Obviously those would be easy to change, but no improvement in performance.

FriedEgg · Posted 04-15-2015 02:47 PM

Okay Art, I will change it slightly to make it more efficient by breaking apart second prx into two, and why not, lets write it in ds2 also:

2083 data x.have (drop=i);

2084 input string $40.;

2085 do i=1to 1e6;

2086 output;

2087 end;

2088 cards;

NOTE: The data set X.HAVE has 4000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

real time 0.14 seconds

cpu time 0.14 seconds

2093 ;

2094 run;

2095

2096 data want_fe;

2097 set x.have;

2098 array v[14] $ 2;

2099 v1 = scan(string, 1, ',');

2100 v2 = ',';

2101 bar = scan(string, 2, ',');

2102 bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

2103 bar = prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, bar);

2104 bar = prxchange('s/([0-9]+)/\1_/o', -1, bar);

2105 do i=3 to dim(v) by 1 while(1);

2106 v=scan(bar, i-2, '_');

2107 if missing(v) then leave;

2108 end;

2109 keep v:;

2110 run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_FE has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

real time 15.04 seconds

cpu time 15.06 seconds

2111

2112 data want_art (drop=string i);

2113 set x.have;

2114 string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

2115 string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

2116 string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

2117 string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

2118 string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

2119 array var(14) $;

NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses

following this name are treated as array references and not function references.

2120 i=1;

2121 do while(scan(string,i,'_') ne '');

2122 var(i)=scan(string,i,'_');

2123 i+1;

2124 end;

2125 run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_ART has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

real time 20.46 seconds

cpu time 20.39 seconds

2126

2127 proc ds2;

2128

2129 thread break_string / overwrite=yes;

2130 vararray nchar(2) v[14];

2131 keep v:;

2132 method run();

2133 declare nchar(32) foo;

2134 declare int i;

2135 set x.have;

2136 v[1]=scan(string, 1, ',');

2137 v[2]=',';

2138 foo=scan(string, 2, ',');

2139 foo=prxchange('s/([a-z])([a-z])/\11\2/io', -1, foo);

2140 foo=prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, foo);

2141 foo=prxchange('s/([0-9]+)/\1_/o', -1, foo);

2142 do i=3 to dim(v);

2143 v=scan(foo, i-2, '_');

2144 if missing(v) then leave;

2145 end;

2146 end;

2147 endthread;

2148 run;

NOTE: Created thread break_string in data set work.break_string.

NOTE: Execution succeeded. No rows affected.

2149

2150 data want_feds2 (overwrite=yes);

2151 declare thread break_string bsthread;

2152 method run();

2153 set from bsthread threads=4;

2154 end;

2155 enddata;

2156 run;

NOTE: BASE driver, creation of a NCHAR column has been requested, the table encoding is set to

UTF-8.

NOTE: Execution succeeded. 4000000 rows affected.

2157

2158 quit;

NOTE: PROCEDURE DS2 used (Total process time):

real time 5.11 seconds

cpu time 20.57 seconds

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away