Help using Base SAS procedures

Splitting a string

Reply
Occasional Contributor
Posts: 10

Splitting a string

How to split a string at less than 200 character in order to derive SDTM variable like DVTERMxx? My variable value has a length of 650 character so my assumption is it ends up in 4 variables. I am not sure. Any help will be appreciated.

Super User
Posts: 10,508

Re: Splitting a string

An example of what you have and what the output should be would help. Differe types of string values may be manipulated easiest with different approaches.

Occasional Contributor
Posts: 10

Re: Splitting a string

Here is an example:

Var1= I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.

I want to split this var at 200 character without chopping any word but get close to 200 and make the first 200 as my first var like xxTERM and the rest of the string will go to the next variable xxTERM01 , xxTERM02 and so on. I hope this make sense?

Respected Advisor
Posts: 3,124

Re: Splitting a string

data have;

var1="I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.";

run;

proc sql;

  select cats(ceil(max(length(var1))/200)) into :dim from have;

  quit;

data want;

   set have;

   array split(&dim) $200. xxTERM1-xxTERM&dim;j=1;

   do i=1 to &dim WHILE (not missing(scan(var1,j)));

    len=0;

     do j=j by 1 while (not missing(scan(var1,j)));

       if length(scan(var1,j))+len>200 then leave;

split(i)=catx(' ',split(i),scan(var1,j));

len=length(split(i));

       end;

   end;

   keep xx:;

   run;


Haikuo

Super Contributor
Posts: 259

Re: Splitting a string

If you don't want to keep var1 in the output dataset then try this. The proc sql setting the variable dim by Haikuo needs further testing. As long as your text contains short words, using ceil will most likely provide a number large enough to create the required number of substring-variables.


proc sql noprint;

  select ceil(max(length(var1)) / 200)

            into :dim separated by '' /* strips leading blanks */

            from have

      ;

quit;

data want;

      set have;

    

      length

            Part01-Part&dim $ 200

            i 8

      ;

      array parts Part:;

    i = 1;

      do while (length(Var1) > 200);

            blankPos = find(var1, ' ', -200);

            parts = substr(Var1, 1, blankPos -1);

            i = i + 1;

            Var1 = substr(Var1, blankPos);

      end;

      if length(var1) > 0 then do;

            parts = var1;

      end;

      drop var1 i blankPos /*Buffer*/;

run;

Super User
Posts: 9,682

Re: Splitting a string

Or transpose it into vertical.

data have;
length var1 $ 4000 ;
var1='I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.';
output;
var1='I am going to market to get some groceries.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.I will not return till the evening. If there is anyway possible for you to cook the dinner.';
output;
run;
data temp(keep=id x);
 set have;
 id=_n_;
 do i=1 to countw(var1,' ');
  x=scan(var1,i,' ');
  output;
 end;
run;
data temp;
 set temp;
 length xx $ 400;
 retain xx;
 retain n 1;
 xx=catx(' ',xx,x);
 if  id ne lag(id) then do;n=1;     xx=x;end;
 if length(xx) gt 200  then do;n+1;xx=x;end;
 run;
 data x;
  set temp;
  by n notsorted;
  if last.n;
run;
 
 proc transpose data=x out=want prefix=temp;
 by id;
 id n;
 var xx;
run;

Ksharp

Super Contributor
Posts: 1,040

Re: Splitting a string

Hi,

I have a couple of questions in the code.

data temp;

set temp;

length xx $ 400;       <<<<<----length of var1 was 40000 and why are we giving a length of only 400 to xx??????

retain xx;

retain n 1; <<<<<<<<<<<<<<<<----------------------------does this mean n variable to have a length of 1 while it retains???

xx=catx(' ',xx,x);

if  id ne lag(id) then do;n=1;     xx=x;end;

if length(xx) gt 200  then do;n+1;xx=x;end;   /*i also have difficulty in understanding what these 2 lines of code does???*/

run;

Please help

Thanks

Super User
Posts: 9,682

Re: Splitting a string

"

length xx $ 400; <<<<<----length of var1 was 40000 and why are we giving a length of only 400 to xx??????

"

It is used to hold a variable which has $200. you need . I make it as $400. is to make sure greater than $200, and we can judge if its length is greater than $200 or not. (if length(xx) gt 200 )

"retain n 1; <<<<<<<<<<<<<<<<----------------------------does this mean n variable to have a length of 1 while it retains???"

no, it means n is numeric variable and its initiative value is 1 .

"

if  id ne lag(id) then do;n=1;     xx=x;end;"

at the beginning of every group, I initiative xx as x (xx has the same value with x).and reinstate group id (variable n) as 1.



"if length(xx) gt 200  then do;n+1;xx=x;end;   /*i also have difficulty in understanding what these 2 lines of code does???*/"

I make several groups (variable n), each group 's length ( variable xx 's length) is not greater than 200 which you need to split.

Actually here xx variable is what we need, then proc transpose to make it what we finally want.

Ksharp

Message was edited by: xia keshan

Super Contributor
Posts: 1,040

Re: Splitting a string

Hi,

I ran through the code again.

I was wondering about the following statement:

if  id ne lag(id) then do;n=1;     xx=x;end;"

There are 151 words in the first statement

there are 94 words in the second statement

lag(id) is the same for the first 151 records except for the 1st record so n  should not equls 1.But to my surprise I find n=1 after running this code

Could some one explain to me please...Same with the other 94 records

Thanks

Super User
Posts: 9,682

Re: Splitting a string

Actually it equals   if first.id then .............

when it is the first obs of group id (first.id) then assign the first word of 151 words to variable xx and n=1;

at this time, if length(xx) gt 200  will evaluate the first word 's length ,it should not be great than 200 so n will not +1 .

Ksharp

Super User
Super User
Posts: 6,502

Re: Splitting a string

If you just want to chop into 200 character pieces then just do it.

Let's say you have variable that is 650.  So that will take 4 pieces, 1-200, 201-400, 401-600 and 601 on.

data want ;

  set have ;

  array _x $200 DVTERM01 - DVTERM04  ;

  do i=1 to 650 by 200 ;

    _x(i) = substr(dvterm,i);

end;

run;

Regular Contributor
Posts: 180

Re: Splitting a string

I like better the solution given by andreas_Ids than the first one presented by Ksharp. It's more clear, more simple and did not lose the dots that var1 contais to separate the sentences.

PROC Star
Posts: 7,363

Re: Splitting a string

@CTorres: But that solution will fail if a sufficiently long word exists in the last split that would cause 5 parts rather than 4 even though the string length is 800.  Below is such a case where the code would fail, along with code that wouldn't fail given a string like that:

data have (drop=i);

  var1="I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast ofhaving the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to the groceries.I am going to market to get some groceries. I will not return till the evening. If there is anyway possible for you to cook the dinner. I am sure the grocery stores may be busy due to the forecast of having the storm this weekend.I hate going to those open allnight grocery stores.";

  do i=1 to 1000;output;end;

run;

data need (keep=_line _partnum part);

  set have;

  length part $200;

  _line+1;

  _partnum=1;

  j=1;

  do WHILE (not missing(scan(var1,j,' ')));

    if length(catx(' ',part,scan(var1,j,' '))) gt 200 or

     missing(scan(var1,j+1,' ')) then do;

      output;

      _partnum+1;

      part=scan(var1,j,' ');

    end;

    else part=catx(' ',part,scan(var1,j,' '));

    j+1;

  end;

run;

proc transpose data=need out=want (drop=_Smiley Happy prefix=part;

  var part;

  id _partnum;

  by _line;

run;

Super User
Posts: 9,682

Re: Splitting a string

BTW. my code will still keep the dots variable contains . Smiley Wink

Super Contributor
Posts: 297

Re: Splitting a string

I find the following quite useful.

%LET DIM  = ;

%LET DSID = %SYSFUNC(OPEN(HAVE,I));

%LET DIM  = %SYSFUNC(CEIL(%EVAL(%SYSFUNC(VARLEN(&DSID,%SYSFUNC(VARNUM(&DSID,STDM))))/200)));

%LET RC   = %SYSFUNC(CLOSE(&DSID));

DATA WANT ;

  SET HAVE;

  ARRAY SPLITIT $200 DVTERM1-DVTERM&DIM. _CHARACTER_;

  DO I = 1 TO &DIM.;

    SPLITIT{I} = SUBSTR(STDM,(I-1)*200+1);

  END;

RUN;

%PUT &DIM.;

Ask a Question
Discussion stats
  • 15 replies
  • 3879 views
  • 2 likes
  • 11 in conversation