Hi ,
I want to split the below string in to different variables and when there are two consecutive characters then I want to insert 1. I tried putting characters and numbers in different variables but that's not giving what I want. Please help me
String
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
Output:
var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14
28 , 16 O 1 B 4 N 7 L 8 U 4 L
28 , 12 O 8 B 20 O
46 , 12 C
30 , 14 C 1 B 12 C 4 N 2 C
Thank You
There is probably an easier way to do this, but it's late and I'm tired:
data have;
informat string $50.;
input String;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
data want (drop=string i);
set have;
string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);
array var(14) $;
i=1;
do while(scan(string,i,'_') ne '');
var(i)=scan(string,i,'_');
i+1;
end;
run;
data have;
input string:$100.;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
data want;
set have;
var1=substr(string,1,2);
var2=substr(string,3,1);
var3=substr(string,4,2);
array vas(*) $ var4-var14;
do i = 1 to dim(vas);
vas(i)=substr(compress(string),i+5,1);
end;
run;
Thanks,
Jag
There is probably an easier way to do this, but it's late and I'm tired:
data have;
informat string $50.;
input String;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
data want (drop=string i);
set have;
string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);
array var(14) $;
i=1;
do while(scan(string,i,'_') ne '');
var(i)=scan(string,i,'_');
i+1;
end;
run;
Thank you your code worked correct for me, but if there is no number in-front of character then I need 1.
example:After O theres no number, so in this case I need 1 then B.
For the first string : 28,16OB4N7L8U4L
output:
var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14
28 , 16 O 1 B 4 N 7 L 8 U 4 L
Hope I'm clear.
Thank you
Something like:
data have;
informat string $50.;
input String;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
data want (keep=old_string final_string);
set have;
length id1-id3 old_string final_string $20;
old_string=string;
array var{40} $20.;
array id{3} $20.;
/* Address the first parts */
id1=substr(string,1,2);
id2=substr(string,3,1);
id3=substr(string,4,2);
string=substr(string,6);
/* Now we have the data left to process loop through each character */
tmp_count=1;
do i=1 to 40 by 2;
var{i}=substr(string,tmp_count,1);
tmp_count=tmp_count+1;
end;
/* Additional step to insert 1's where necessary */
do i=1 to 40 by 2;
if anyalpha(var{i}) and anyalpha(var{i+2}) then var{i+1}="1";
end;
/* Create final complete string */
final_string=cats(of id{*})||cats(of var{*});
run;
Are you sure you ran my code? I think that is exactly what the code did!
Thank You Arthur, your code worked fine.
But I just realized that when string has no number after comma it is not giving value
example :
35,NB15O4L4NB7O
Desired Output:
var1 Var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 var16
35 , 1 N 1 B 15 O 4 L 4 N 1 B 7 O
Thank you
data have; input string $40.; cards; 28,16OB4N7L8U4L 28,12O8B20O 46,12C 30,14CB12C4N2C ; run; data temp; set have; string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string); pid=prxparse('/\d+|\W+|[a-z]/i'); n+1;start=1;stop=length(string); call prxnext(pid,start,stop,string,position,length); do while(position gt 0); found=substr(string,position,length);output; call prxnext(pid,start,stop,string,position,length); end; drop pid start stop position length string ; run; proc transpose data=temp out=want(drop=_name_); by n; var found; run;
Xia Keshan
Xia,
FWIW, your code appears to do the same thing that my code does, but requires slightly more cpu and real time than the code I had proposed.
The test I ran to compare the two was:
data have (drop=i);
input string $40.;
do i=1to 100000;
output;
end;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
data temp;
set have;
string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);
pid=prxparse('/\d+|\W+|[a-z]/i');
n+1;start=1;stop=length(string);
call prxnext(pid,start,stop,string,position,length);
do while(position gt 0);
found=substr(string,position,length);output;
call prxnext(pid,start,stop,string,position,length);
end;
drop pid start stop position length string ;
run;
proc transpose data=temp out=want_xia(drop=_name_ n);
by n;
var found;
run;
data want_art (drop=string i);
set have;
string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);
string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);
string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);
array var(14) $;
i=1;
do while(scan(string,i,'_') ne '');
var(i)=scan(string,i,'_');
i+1;
end;
run;
Yeah. You are right ,Arthur.T . I believe that is because you pre-define the number of variables . How do you know that would have to be fourteen ?
Best
Xia Keshan
Xia: Agreed! I hard coded it based on my interpretation of OP's problem statement.
data foo;
length foobar $ 32;
input foobar;
array v[14] $ 2;
v1 = scan(foobar, 1, ',');
v2 = ',';
bar = scan(foobar, 2, ',');
bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);
bar = prxchange('s/((?:[a-z]+)|(?:[0-9]+))(?!\s)/\1_/io', -1, bar);
do i=3 to dim(v) by 1 while(1);
v=scan(bar, i-2, '_');
if missing(v) then leave;
end;
keep v:;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
Matt: Not more efficient! Took the same time as my code and will only work, as is, if the first two fields always contain a number followed by a comma, and none of the numbers require more than 2 characters.
Obviously those would be easy to change, but no improvement in performance.
Okay Art, I will change it slightly to make it more efficient by breaking apart second prx into two, and why not, lets write it in ds2 also:
2083 data x.have (drop=i);
2084 input string $40.;
2085 do i=1to 1e6;
2086 output;
2087 end;
2088 cards;
NOTE: The data set X.HAVE has 4000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.14 seconds
cpu time 0.14 seconds
2093 ;
2094 run;
2095
2096 data want_fe;
2097 set x.have;
2098 array v[14] $ 2;
2099 v1 = scan(string, 1, ',');
2100 v2 = ',';
2101 bar = scan(string, 2, ',');
2102 bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);
2103 bar = prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, bar);
2104 bar = prxchange('s/([0-9]+)/\1_/o', -1, bar);
2105 do i=3 to dim(v) by 1 while(1);
2106 v=scan(bar, i-2, '_');
2107 if missing(v) then leave;
2108 end;
2109 keep v:;
2110 run;
NOTE: There were 4000000 observations read from the data set X.HAVE.
NOTE: The data set WORK.WANT_FE has 4000000 observations and 14 variables.
NOTE: DATA statement used (Total process time):
real time 15.04 seconds
cpu time 15.06 seconds
2111
2112 data want_art (drop=string i);
2113 set x.have;
2114 string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);
2115 string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);
2116 string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);
2117 string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);
2118 string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);
2119 array var(14) $;
NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses
following this name are treated as array references and not function references.
2120 i=1;
2121 do while(scan(string,i,'_') ne '');
2122 var(i)=scan(string,i,'_');
2123 i+1;
2124 end;
2125 run;
NOTE: There were 4000000 observations read from the data set X.HAVE.
NOTE: The data set WORK.WANT_ART has 4000000 observations and 14 variables.
NOTE: DATA statement used (Total process time):
real time 20.46 seconds
cpu time 20.39 seconds
2126
2127 proc ds2;
2128
2129 thread break_string / overwrite=yes;
2130 vararray nchar(2) v[14];
2131 keep v:;
2132 method run();
2133 declare nchar(32) foo;
2134 declare int i;
2135 set x.have;
2136 v[1]=scan(string, 1, ',');
2137 v[2]=',';
2138 foo=scan(string, 2, ',');
2139 foo=prxchange('s/([a-z])([a-z])/\11\2/io', -1, foo);
2140 foo=prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, foo);
2141 foo=prxchange('s/([0-9]+)/\1_/o', -1, foo);
2142 do i=3 to dim(v);
2143 v=scan(foo, i-2, '_');
2144 if missing(v) then leave;
2145 end;
2146 end;
2147 endthread;
2148 run;
NOTE: Created thread break_string in data set work.break_string.
NOTE: Execution succeeded. No rows affected.
2149
2150 data want_feds2 (overwrite=yes);
2151 declare thread break_string bsthread;
2152 method run();
2153 set from bsthread threads=4;
2154 end;
2155 enddata;
2156 run;
NOTE: BASE driver, creation of a NCHAR column has been requested, the table encoding is set to
UTF-8.
NOTE: Execution succeeded. 4000000 rows affected.
2157
2158 quit;
NOTE: PROCEDURE DS2 used (Total process time):
real time 5.11 seconds
cpu time 20.57 seconds
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.