BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BharathBandi
Calcite | Level 5

Hi ,

     I want to split the below string in to different variables and when there are two consecutive characters then I want to insert 1. I tried putting characters and numbers in different variables but that's not giving what I want. Please help me

          String

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

Output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14

28      ,               16      O          1          B          4          N          7          L          8          U          4              L

28     ,               12       O          8          B          20        O

46     ,               12       C

30     ,               14       C          1          B          12        C          4           N         2          C

Thank You

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There is probably an easier way to do this, but it's late and I'm tired:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

View solution in original post

13 REPLIES 13
Jagadishkatam
Amethyst | Level 16

data have;

input string:$100.;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want;

set have;

var1=substr(string,1,2);

var2=substr(string,3,1);

var3=substr(string,4,2);

array vas(*) $ var4-var14;

do i = 1 to dim(vas);

vas(i)=substr(compress(string),i+5,1);

end;

run;

Thanks,

Jag

Thanks,
Jag
art297
Opal | Level 21

There is probably an easier way to do this, but it's late and I'm tired:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

BharathBandi
Calcite | Level 5

Thank you your code worked correct for me, but if there is no number in-front of character then I need 1.

example:After O theres no number, so in this case I need 1 then B.

For the first string : 28,16OB4N7L8U4L

output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14

28      ,               16      O        1          B          4          N          7          L          8          U          4              L

Hope I'm clear.

Thank you

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data want (keep=old_string final_string);

  set have;

  length id1-id3 old_string final_string $20;

  old_string=string;

  array var{40} $20.;

  array id{3} $20.;

  /* Address the first parts */

  id1=substr(string,1,2);

  id2=substr(string,3,1);

  id3=substr(string,4,2);

  string=substr(string,6);

  /* Now we have the data left to process loop through each character */

  tmp_count=1;

  do i=1 to 40 by 2;

    var{i}=substr(string,tmp_count,1);

    tmp_count=tmp_count+1;

  end;

 

  /* Additional step to insert 1's where necessary */

  do i=1 to 40 by 2;

    if anyalpha(var{i}) and anyalpha(var{i+2}) then var{i+1}="1";

  end;

  /* Create final complete string */

  final_string=cats(of id{*})||cats(of var{*});

run;

art297
Opal | Level 21

Are you sure you ran my code? I think that is exactly what the code did!

BharathBandi
Calcite | Level 5

Thank You Arthur, your code worked fine.

But I just realized that when string has no number after comma it is not giving value


example :

35,NB15O4L4NB7O


Desired Output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14     var15     var16

35      ,               1      N          1          B          15          O         4          L          4         N        1             B               7            O

Thank you

Ksharp
Super User


data have;
input string $40.;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
data temp;
 set have;
 string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);
 pid=prxparse('/\d+|\W+|[a-z]/i');
 n+1;start=1;stop=length(string);
 call prxnext(pid,start,stop,string,position,length);
 do while(position gt 0);
  found=substr(string,position,length);output;
   call prxnext(pid,start,stop,string,position,length);
 end;
 drop pid start stop position length string ;
run;
proc transpose data=temp out=want(drop=_name_);
 by n;
 var found;
run;

Xia Keshan

art297
Opal | Level 21

Xia,

FWIW, your code appears to do the same thing that my code does, but requires slightly more cpu and real time than the code I had proposed.

The test I ran to compare the two was:

data have (drop=i);

  input string $40.;

  do i=1to 100000;

    output;

  end;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data temp;

set have;

string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);

pid=prxparse('/\d+|\W+|[a-z]/i');

n+1;start=1;stop=length(string);

call prxnext(pid,start,stop,string,position,length);

do while(position gt 0);

  found=substr(string,position,length);output;

   call prxnext(pid,start,stop,string,position,length);

end;

drop pid start stop position length string ;

run;

proc transpose data=temp out=want_xia(drop=_name_ n);

by n;

var found;

run;

data want_art (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

Ksharp
Super User

Yeah. You are right ,Arthur.T  . I believe that is because you pre-define the number of variables . How do you know that would have to be fourteen ?

Best

Xia Keshan

art297
Opal | Level 21

Xia: Agreed! I hard coded it based on my interpretation of OP's problem statement.

FriedEgg
SAS Employee

data foo;

length foobar $ 32;

input foobar;

array v[14] $ 2;

v1 = scan(foobar, 1, ',');

v2 = ',';

bar = scan(foobar, 2, ',');

bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

bar = prxchange('s/((?:[a-z]+)|(?:[0-9]+))(?!\s)/\1_/io', -1, bar);

do i=3 to dim(v) by 1 while(1);

   v=scan(bar, i-2, '_');

   if missing(v) then leave;

   end;

keep v:;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

art297
Opal | Level 21

Matt: Not more efficient! Took the same time as my code and will only work, as is, if the first two fields always contain a number followed by a comma, and none of the numbers require more than 2 characters.

Obviously those would be easy to change, but no improvement in performance.

FriedEgg
SAS Employee

Okay Art, I will change it slightly to make it more efficient by breaking apart second prx into two, and why not, lets write it in ds2 also:

2083  data x.have (drop=i);

2084    input string $40.;

2085    do i=1to 1e6;

2086      output;

2087    end;

2088    cards;

NOTE: The data set X.HAVE has 4000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           0.14 seconds

      cpu time            0.14 seconds

2093  ;

2094  run;

2095

2096  data want_fe;

2097     set x.have;

2098     array v[14] $ 2;

2099  v1 = scan(string, 1, ',');

2100  v2 = ',';

2101  bar = scan(string, 2, ',');

2102  bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

2103  bar = prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, bar);

2104  bar = prxchange('s/([0-9]+)/\1_/o', -1, bar);

2105  do i=3 to dim(v) by 1 while(1);

2106     v=scan(bar, i-2, '_');

2107     if missing(v) then leave;

2108     end;

2109  keep v:;

2110  run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_FE has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

      real time           15.04 seconds

      cpu time            15.06 seconds

2111

2112  data want_art (drop=string i);

2113    set x.have;

2114    string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

2115    string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

2116    string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

2117    string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

2118    string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

2119    array var(14) $;

NOTE: The array var has the same name as a SAS-supplied or user-defined function.  Parentheses

      following this name are treated as array references and not function references.

2120    i=1;

2121    do while(scan(string,i,'_') ne '');

2122      var(i)=scan(string,i,'_');

2123      i+1;

2124    end;

2125  run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_ART has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

      real time           20.46 seconds

      cpu time            20.39 seconds

2126

2127  proc ds2;

2128

2129  thread break_string / overwrite=yes;

2130     vararray nchar(2) v[14];

2131     keep v:;

2132     method run();

2133        declare nchar(32) foo;

2134        declare int i;

2135        set x.have;

2136        v[1]=scan(string, 1, ',');

2137        v[2]=',';

2138        foo=scan(string, 2, ',');

2139        foo=prxchange('s/([a-z])([a-z])/\11\2/io', -1, foo);

2140        foo=prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, foo);

2141        foo=prxchange('s/([0-9]+)/\1_/o', -1, foo);

2142        do i=3 to dim(v);

2143           v=scan(foo, i-2, '_');

2144           if missing(v) then leave;

2145           end;

2146        end;

2147     endthread;

2148     run;

NOTE: Created thread break_string in data set work.break_string.

NOTE: Execution succeeded. No rows affected.

2149

2150  data want_feds2 (overwrite=yes);

2151     declare thread break_string bsthread;

2152     method run();

2153        set from bsthread threads=4;

2154        end;

2155     enddata;

2156     run;

NOTE: BASE driver, creation of a NCHAR column has been requested, the table encoding is set to

      UTF-8.

NOTE: Execution succeeded. 4000000 rows affected.

2157

2158  quit;

NOTE: PROCEDURE DS2 used (Total process time):

      real time           5.11 seconds

      cpu time            20.57 seconds

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4439 views
  • 11 likes
  • 6 in conversation