Help using Base SAS procedures

Spliting a string into variable

Accepted Solution Solved
Reply
Contributor
Posts: 34
Accepted Solution

Spliting a string into variable

Hi ,

     I want to split the below string in to different variables and when there are two consecutive characters then I want to insert 1. I tried putting characters and numbers in different variables but that's not giving what I want. Please help me

          String

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

Output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14

28      ,               16      O          1          B          4          N          7          L          8          U          4              L

28     ,               12       O          8          B          20        O

46     ,               12       C

30     ,               14       C          1          B          12        C          4           N         2          C

Thank You


Accepted Solutions
Solution
‎04-15-2015 01:27 AM
PROC Star
Posts: 7,486

Re: Spliting a string into variable

Posted in reply to BharathBandi

There is probably an easier way to do this, but it's late and I'm tired:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,137

Re: Spliting a string into variable

Posted in reply to BharathBandi

data have;

input string:$100.;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want;

set have;

var1=substr(string,1,2);

var2=substr(string,3,1);

var3=substr(string,4,2);

array vas(*) $ var4-var14;

do i = 1 to dim(vas);

vas(i)=substr(compress(string),i+5,1);

end;

run;

Thanks,

Jag

Thanks,
Jag
Solution
‎04-15-2015 01:27 AM
PROC Star
Posts: 7,486

Re: Spliting a string into variable

Posted in reply to BharathBandi

There is probably an easier way to do this, but it's late and I'm tired:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

data want (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

Contributor
Posts: 34

Re: Spliting a string into variable

Thank you your code worked correct for me, but if there is no number in-front of character then I need 1.

example:After O theres no number, so in this case I need 1 then B.

For the first string : 28,16OB4N7L8U4L

output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14

28      ,               16      O        1          B          4          N          7          L          8          U          4              L

Hope I'm clear.

Thank you

Super User
Super User
Posts: 7,977

Re: Spliting a string into variable

Posted in reply to BharathBandi

Something like:

data have;

  informat string $50.;

  input String;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data want (keep=old_string final_string);

  set have;

  length id1-id3 old_string final_string $20;

  old_string=string;

  array var{40} $20.;

  array id{3} $20.;

  /* Address the first parts */

  id1=substr(string,1,2);

  id2=substr(string,3,1);

  id3=substr(string,4,2);

  string=substr(string,6);

  /* Now we have the data left to process loop through each character */

  tmp_count=1;

  do i=1 to 40 by 2;

    var{i}=substr(string,tmp_count,1);

    tmp_count=tmp_count+1;

  end;

 

  /* Additional step to insert 1's where necessary */

  do i=1 to 40 by 2;

    if anyalpha(var{i}) and anyalpha(var{i+2}) then var{i+1}="1";

  end;

  /* Create final complete string */

  final_string=cats(of id{*})||cats(of var{*});

run;

PROC Star
Posts: 7,486

Re: Spliting a string into variable

Posted in reply to BharathBandi

Are you sure you ran my code? I think that is exactly what the code did!

Contributor
Posts: 34

Re: Spliting a string into variable

Thank You Arthur, your code worked fine.

But I just realized that when string has no number after comma it is not giving value


example :

35,NB15O4L4NB7O


Desired Output:

var1     Var2     var3     var4     var5      var6      var7     var8     var9     var10     var11     var12     var13     var14     var15     var16

35      ,               1      N          1          B          15          O         4          L          4         N        1             B               7            O

Thank you

Super User
Posts: 10,041

Re: Spliting a string into variable

Posted in reply to BharathBandi


data have;
input string $40.;
cards;
28,16OB4N7L8U4L
28,12O8B20O
46,12C
30,14CB12C4N2C
;
run;
data temp;
 set have;
 string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);
 pid=prxparse('/\d+|\W+|[a-z]/i');
 n+1;start=1;stop=length(string);
 call prxnext(pid,start,stop,string,position,length);
 do while(position gt 0);
  found=substr(string,position,length);output;
   call prxnext(pid,start,stop,string,position,length);
 end;
 drop pid start stop position length string ;
run;
proc transpose data=temp out=want(drop=_name_);
 by n;
 var found;
run;

Xia Keshan

PROC Star
Posts: 7,486

Re: Spliting a string into variable

Xia,

FWIW, your code appears to do the same thing that my code does, but requires slightly more cpu and real time than the code I had proposed.

The test I ran to compare the two was:

data have (drop=i);

  input string $40.;

  do i=1to 100000;

    output;

  end;

  cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

data temp;

set have;

string=prxchange('s/([A-Z])([A-Z])/\11\2/i',-1,string);

pid=prxparse('/\d+|\W+|[a-z]/i');

n+1;start=1;stop=length(string);

call prxnext(pid,start,stop,string,position,length);

do while(position gt 0);

  found=substr(string,position,length);output;

   call prxnext(pid,start,stop,string,position,length);

end;

drop pid start stop position length string ;

run;

proc transpose data=temp out=want_xia(drop=_name_ n);

by n;

var found;

run;

data want_art (drop=string i);

  set have;

  string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

  string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

  string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

  array var(14) $;

  i=1;

  do while(scan(string,i,'_') ne '');

    var(i)=scan(string,i,'_');

    i+1;

  end;

run;

Super User
Posts: 10,041

Re: Spliting a string into variable

Yeah. You are right ,Arthur.T  . I believe that is because you pre-define the number of variables . How do you know that would have to be fourteen ?

Best

Xia Keshan

PROC Star
Posts: 7,486

Re: Spliting a string into variable

Xia: Agreed! I hard coded it based on my interpretation of OP's problem statement.

Trusted Advisor
Posts: 1,301

Re: Spliting a string into variable

Posted in reply to BharathBandi

data foo;

length foobar $ 32;

input foobar;

array v[14] $ 2;

v1 = scan(foobar, 1, ',');

v2 = ',';

bar = scan(foobar, 2, ',');

bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

bar = prxchange('s/((?:[a-z]+)|(?:[0-9]+))(?!\s)/\1_/io', -1, bar);

do i=3 to dim(v) by 1 while(1);

   v=scan(bar, i-2, '_');

   if missing(v) then leave;

   end;

keep v:;

cards;

28,16OB4N7L8U4L

28,12O8B20O

46,12C

30,14CB12C4N2C

;

run;

PROC Star
Posts: 7,486

Re: Spliting a string into variable

Matt: Not more efficient! Took the same time as my code and will only work, as is, if the first two fields always contain a number followed by a comma, and none of the numbers require more than 2 characters.

Obviously those would be easy to change, but no improvement in performance.

Trusted Advisor
Posts: 1,301

Re: Spliting a string into variable

Okay Art, I will change it slightly to make it more efficient by breaking apart second prx into two, and why not, lets write it in ds2 also:

2083  data x.have (drop=i);

2084    input string $40.;

2085    do i=1to 1e6;

2086      output;

2087    end;

2088    cards;

NOTE: The data set X.HAVE has 4000000 observations and 1 variables.

NOTE: DATA statement used (Total process time):

      real time           0.14 seconds

      cpu time            0.14 seconds

2093  ;

2094  run;

2095

2096  data want_fe;

2097     set x.have;

2098     array v[14] $ 2;

2099  v1 = scan(string, 1, ',');

2100  v2 = ',';

2101  bar = scan(string, 2, ',');

2102  bar = prxchange('s/([a-z])([a-z])/\11\2/io', -1, bar);

2103  bar = prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, bar);

2104  bar = prxchange('s/([0-9]+)/\1_/o', -1, bar);

2105  do i=3 to dim(v) by 1 while(1);

2106     v=scan(bar, i-2, '_');

2107     if missing(v) then leave;

2108     end;

2109  keep v:;

2110  run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_FE has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

      real time           15.04 seconds

      cpu time            15.06 seconds

2111

2112  data want_art (drop=string i);

2113    set x.have;

2114    string=prxchange('s/([0-9])([,A-Za-z])/$1_$2/', -1, STRING);

2115    string=prxchange('s/(,)([0-9A-Za-z])/$1_$2/', -1, STRING);

2116    string=prxchange('s/([0-9A-Za-z])(,)/$1_$2/', -1, STRING);

2117    string=prxchange('s/([A-Za-z])([0-9])/$1_$2/', -1, STRING);

2118    string=prxchange('s/([A-Za-z])([A-Za-z])/$1_1_$2/', -1, STRING);

2119    array var(14) $;

NOTE: The array var has the same name as a SAS-supplied or user-defined function.  Parentheses

      following this name are treated as array references and not function references.

2120    i=1;

2121    do while(scan(string,i,'_') ne '');

2122      var(i)=scan(string,i,'_');

2123      i+1;

2124    end;

2125  run;

NOTE: There were 4000000 observations read from the data set X.HAVE.

NOTE: The data set WORK.WANT_ART has 4000000 observations and 14 variables.

NOTE: DATA statement used (Total process time):

      real time           20.46 seconds

      cpu time            20.39 seconds

2126

2127  proc ds2;

2128

2129  thread break_string / overwrite=yes;

2130     vararray nchar(2) v[14];

2131     keep v:;

2132     method run();

2133        declare nchar(32) foo;

2134        declare int i;

2135        set x.have;

2136        v[1]=scan(string, 1, ',');

2137        v[2]=',';

2138        foo=scan(string, 2, ',');

2139        foo=prxchange('s/([a-z])([a-z])/\11\2/io', -1, foo);

2140        foo=prxchange('s/([a-z]+)(?!\s)/\1_/io', -1, foo);

2141        foo=prxchange('s/([0-9]+)/\1_/o', -1, foo);

2142        do i=3 to dim(v);

2143           v=scan(foo, i-2, '_');

2144           if missing(v) then leave;

2145           end;

2146        end;

2147     endthread;

2148     run;

NOTE: Created thread break_string in data set work.break_string.

NOTE: Execution succeeded. No rows affected.

2149

2150  data want_feds2 (overwrite=yes);

2151     declare thread break_string bsthread;

2152     method run();

2153        set from bsthread threads=4;

2154        end;

2155     enddata;

2156     run;

NOTE: BASE driver, creation of a NCHAR column has been requested, the table encoding is set to

      UTF-8.

NOTE: Execution succeeded. 4000000 rows affected.

2157

2158  quit;

NOTE: PROCEDURE DS2 used (Total process time):

      real time           5.11 seconds

      cpu time            20.57 seconds

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 591 views
  • 11 likes
  • 6 in conversation