Solved: Re: splitting text which is more than 200char to multiple variables

ysantosh18 · Posted 02-16-2017 09:37 AM

Hi,

Dear member

Can you please let me know how to split content longer than 200 characters without using macros, I mean with a DATA step?

For example, if the given content is more than 500 characters long, the DATA step should split it into pieces of at most 200 characters each, without truncating words.

eg. var = '

But multiple notes over the past three weeks have undermined the shock last November, wiping out 86% of the money in circulation in a cash-driven economy.

The demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing millions of fake notes into the Indian economy from neighbouring Pakistan, Bangladesh and Nepal. Terrorists and governments hostile to India use the bogus cash to weaken the economy.

The latest attempts to restart the vicious cycle after the notes ban have set off alarm bells in the security establishment.

'

This should be split into var, var1 and var2: the content that does not fit in var should spill over into var1 and then var2.

Thanks

-santosh

Astounding · Posted 02-16-2017 09:46 AM

Here are some DATA step statements to accomplish this:

length var1 $ 500 var2 $ 200;

if length(var) > 200 then do;
   /* scan backward from position 201 for the last blank that keeps var within 200 characters */
   do i=201 to 1 by -1 until (var1 > ' ');
      if substr(var, i, 1)=' ' then do;
         var1 = left(substr(var, i+1));
         var = substr(var, 1, i);
      end;
   end;
end;

if length(var1) > 200 then do;
   /* same scan on the remainder, moving the overflow into var2 */
   do i=201 to 1 by -1 until (var2 > ' ');
      if substr(var1, i, 1)=' ' then do;
         var2 = left(substr(var1, i+1));
         var1 = substr(var1, 1, i);
      end;
   end;
end;

It's untested, but should be fine as is.
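
The backward scan for a blank can also be written once and repeated over an array, so the block does not have to be copied for each output variable. The following is only a minimal sketch under some assumptions: the input data set is called have (a hypothetical name), the text is in var, five pieces of 200 characters are enough, and the text is no longer than 1000 characters. The names part1-part5, rest, cut and k are illustrative and do not come from the post above. Like the statements above, it is untested.

```sas
data want;
   set have;                          /* 'have' is an assumed data set holding var */
   array part {5} $ 200 part1-part5;  /* assumed: at most 5 pieces of 200 characters */
   length rest $ 1000;                /* assumed upper bound on the text length */
   rest = left(var);
   do k = 1 to dim(part) while (rest ne ' ');
      if length(rest) <= 200 then do; /* the remainder fits: store it and finish */
         part{k} = rest;
         rest = ' ';
      end;
      else do;
         /* find the last blank at or before position 201, scanning backward */
         cut = 201;
         do while (cut > 1 and substr(rest, cut, 1) ne ' ');
            cut = cut - 1;
         end;
         if cut = 1 then cut = 201;   /* no blank found: hard split after 200 characters */
         part{k} = substr(rest, 1, cut - 1);
         rest = left(substr(rest, cut));
      end;
   end;
   drop k cut rest;
run;
```

Any text left over after the fifth piece is silently dropped in this sketch, so the array size would need to be chosen with the longest expected value in mind.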

ysantosh18 · Posted 02-16-2017 10:37 AM

thank you, let me check, will get back to you

thanks again
santosh

DanielSantos · Posted 02-16-2017 10:32 AM

Hi.

The following may work for you.

%let NUM_OF_STR=5;
%let CHR_PER_STR=40;
data _null_;
array V {*} $ &CHR_PER_STR V1-V&NUM_OF_STR;
S='the demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing';
_I=1; _J=1;
do until (scan(S,_I,' ') eq '');
   _W=scan(S,_I,' ');
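   /* move to the next variable when a blank plus the next word would not fit */
   if lengthn(V[_J]) > 0 and lengthn(V[_J])+1+lengthn(_W) > lengthc(V[_J]) then _J+1;
   V[_J]=catx(' ',V[_J],_W);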
   _I+1;
end;
run;

Nothing fancy: it will split the text word by word and distribute the words across the desired number of variables.

This will also remove consecutive blanks between words if any.

Daniel Santos @ www.cgd.pt

ysantosh18 · Posted 02-17-2017 06:03 AM

Thank you, looks like you are a genius in SAS, but I am a beginner, and here I
don't know the number of strings to split into. It should depend only on the length.

thanks
Santosh

DanielSantos · Posted 02-17-2017 06:40 AM

Oh, I'm surely not a genius, but thanks for the compliment.

And no problem with that. If you want to set only CHR_PER_STR and have NUM_OF_STR calculated dynamically, instead of:

%let NUM_OF_STR=5;
%let CHR_PER_STR=40;

use:

%let CHR_PER_STR=200;
data _null_;
     set HAVE;
     /* integer ceiling of length / chunk size, stored as a macro variable */
     call symputx('NUM_OF_STR',int((lengthc(S)+&CHR_PER_STR-1)/&CHR_PER_STR));
run;

This will set NUM_OF_STR according to S's size and the specified CHR_PER_STR.
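
To make the arithmetic concrete, here is a small sketch that evaluates the integer part of (length + CHR_PER_STR - 1) / CHR_PER_STR for a few lengths. The variable names chr_per_str, len and n are only illustrative, and the lengths are made-up test values.

```sas
data _null_;
   chr_per_str = 200;                 /* assumed chunk size, as in the example above */
   do len = 1, 200, 201, 600;         /* made-up text lengths */
      /* integer part of the ceiling-division expression */
      n = int((len + chr_per_str - 1) / chr_per_str);
      put len= n=;
   end;
run;
```

This should write n=1 for lengths 1 and 200, n=2 for 201 and n=3 for 600. Because words are never cut, the word-by-word fill can leave unused space at the end of each variable, so in some cases one more variable than this count may be needed.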

Daniel Santos @ www.cgd.pt

art297 · Posted 02-16-2017 10:47 AM

The two suggestions offered may work (I don't have time to test them) and, if so, you can disregard this one. SAS notes has published a macro that splits lines of text (see: http://support.sas.com/kb/30/649.html ). The following is a slight bastardization of that macro that meets your current need. You should note that your example text line is longer than 500 characters.

data have;
  informat var $char600.;
  infile 'c:\art\testdata.txt' lrecl=600;
  input var &;
run;

data want (keep=var:);
  array var(3) $200.;
  length textin $600;
  set have (rename=var=_var);
  textin=_var;
  i=1;
  if lengthn( textin ) <= 200 then var(i)=textin;
  else do while( lengthn( textin ) > 200 ) ;
    var(i) = reverse( substr( textin, 1, 200 )) ;
    ndx = index( var(i), ' ' ) ;
    if ndx then do ;
      var(i) = reverse( substr( var(i), ndx + 1 )) ;
      i+1;
      textin = substr( textin, 200 - ndx + 1 ) ;
    end ;
    else do;
      var(i) = substr(textin,1,200);
      i+1;
      textin = substr(textin,200+1);
    end;
    if lengthn( textin ) le 200 then var(i)=textin;
  end ;
run;

HTH,

Art, CEO, AnalystFinder.com

Ksharp · Posted 02-16-2017 10:30 PM

Search this community first. A couple of similar questions have already been answered.

options noquotelenmax;
data have;
id=1;
 var = '
But multiple notes over the past three weeks have undermined the shock last November, wiping out 86% of the money in circulation in a cash-driven economy.
The demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing millions of fake notes into the Indian economy from neighbouring Pakistan, Bangladesh and Nepal. Terrorists and governments hostile to India use the bogus cash to weaken the economy.
The latest attempts to restart the vicious cycle after the notes ban have set off alarm bells in the security establishment.
';
run;
data temp;
 set have;
 do i=1 to countw(var,' ');
  word=scan(var,i,' ');output;
 end;
 drop i var;
run;
data temp;
 set temp;
 by id;
 if first.id then sum=0;
 sum+(length(word)+1);
 if sum gt 200 then do;group+1;sum=length(word);end;
 drop sum ; 
run;
data x;
 length x $ 200;
 do until(last.group);
  set temp;
  by id group;
  x=catx(' ',x,word);
 end;
run;
proc transpose data=x out=want;
by id ;
var x;
run;
proc print;run;
