DATA Step, Macro, Functions and more

splitting text which is more than 200char to multiple variables

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

splitting text which is more than 200char to multiple variables

Hi,

Dear member

 

Can you please let me know How to split a content more than 200 chr. using no macros, i mean with a datasetp.

For eg. if a content given is more than 500chars. in a datastep should split according to it's size(200chars.) and without truncating  words.

eg. var = '

But multiple notes over the past three weeks have undermined the shock last November, wiping out 86% of the money in circulation in a cash-driven economy.

The demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing millions of fake notes into the Indian economy from neighbouring Pakistan, Bangladesh and Nepal. Terrorists and governments hostile to India use the bogus cash to weaken the economy.

The latest attempts to restart the vicious cycle after the notes ban have set off alarm bells in the security establishment.

'

to var, var1 and var2, excess content should split from var to var1 and var2.

 

Thanks

-santosh


Accepted Solutions
Solution
‎02-23-2017 09:52 AM
Super User
Posts: 5,500

Re: splitting text which is more than 200char to multiple variables

Posted in reply to ysantosh18

Here are some DATA step statements to accomplish this:

 

length var1 $ 500  var2 $ 200;

if length(var) >= 200 then do;

   do i=201 to 1 by -1 until (var1 > ' ');

      if substr(var, i, 1)=' ' then do;

         var1 = left(substr(var, i+1));

         var = substr(var, 1, i);

      end;

   end;

end;

if length(var1) >= 200 then do;

   do i=201 to 1 by -1 until (var2 > ' ');

      if substr(var1, i, 1)=' ' then do;

         var2 = left(substr(var1, i+1));

         var1 = substr(var1, 1, i);

      end;

   end;

end;

 

It's untested, but should be fine as is. 

View solution in original post


All Replies
Solution
‎02-23-2017 09:52 AM
Super User
Posts: 5,500

Re: splitting text which is more than 200char to multiple variables

Posted in reply to ysantosh18

Here are some DATA step statements to accomplish this:

 

length var1 $ 500  var2 $ 200;

if length(var) >= 200 then do;

   do i=201 to 1 by -1 until (var1 > ' ');

      if substr(var, i, 1)=' ' then do;

         var1 = left(substr(var, i+1));

         var = substr(var, 1, i);

      end;

   end;

end;

if length(var1) >= 200 then do;

   do i=201 to 1 by -1 until (var2 > ' ');

      if substr(var1, i, 1)=' ' then do;

         var2 = left(substr(var1, i+1));

         var1 = substr(var1, 1, i);

      end;

   end;

end;

 

It's untested, but should be fine as is. 

Occasional Contributor
Posts: 14

Re: splitting text which is more than 200char to multiple variables

Posted in reply to Astounding
thank you, let me check, will get back to you

thanks again
santosh

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Super Contributor
Posts: 474

Re: splitting text which is more than 200char to multiple variables

[ Edited ]
Posted in reply to ysantosh18

Hi.

 

The following may work for you.

 

%let NUM_OF_STR=5;
%let CHR_PER_STR=40;
data _null_;
array V {*} $ &CHR_PER_STR V1-V&NUM_OF_STR;
S='the demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing';
_I=1; _J=1;
do until (scan(S,_I,' ') eq '');
_W=scan(S,_I,' ');
if lengthn(V[_J])+lengthn(_W) > lengthc(V[_J]) then _J+1;
V[_J]=strip(V[_J])!!' '!!_W;
_I+1;
end;
run;

 

Nothing fancy, it will split the text word by word and ditribute it by the number of desired variables.

 

This will also remove consecutive blanks between words if any.

 

Daniel Santos @ www.cgd.pt

Occasional Contributor
Posts: 14

Re: splitting text which is more than 200char to multiple variables

Posted in reply to DanielSantos
thank you, looks like you are genius in SAS, but am a beginner, and here i
don't know the number of strings to split, It should depend on only length

thanks
Santosh
Super Contributor
Posts: 474

Re: splitting text which is more than 200char to multiple variables

[ Edited ]
Posted in reply to ysantosh18

Oh, I'm surely not a genius, but thanks for the compliment.

 

And no problem with that. If you want to set only CHR_PER_STR and have NUM_OF_STR calculated dynamically, instead of:

 

%let NUM_OF_STR=5;
%let CHR_PER_STR=40;

 

use:

 

%let CHR_PER_STR=200;
data _null_;
     set HAVE;
     call symputn('NUM_OF_STR',(lengthc(S)+&CHR_PER_STR-1)/&CHR_PER_STR);
run;

 

This will set NUM_OF_STR according to S's size and the specified CHR_PER_STR.

 

Daniel Santos @ www.cgd.pt

 

PROC Star
Posts: 7,469

Re: splitting text which is more than 200char to multiple variables

Posted in reply to ysantosh18

The two suggestions offered may work (I don't have time to test them) and, if so, you can disregard this one. SAS notes has published a macro that splits lines of text (see: http://support.sas.com/kb/30/649.html ). The following is a slight bastardization of that macro that meets your current need. You should note that your example text line is longer than 500 characters.

 

data have;
  informat var $char600.;
  infile 'c:\art\testdata.txt' lrecl=600;
  input var &;
;

data want (keep=var:);
  array var(3) $200.;
  length textin $600;
  set have (rename=var=_var);
  textin=_var;
  i=1;
  if lengthn( textin ) <= 200 then var(i)=textin;
  else do while( lengthn( textin ) > 200 ) ;
    var(i) = reverse( substr( textin, 1, 200 )) ;
    ndx = index( var(i), ' ' ) ;
    if ndx then do ;
      var(i) = reverse( substr( var(i), ndx + 1 )) ;
      i+1;
      textin = substr( textin, 200 - ndx + 1 ) ;
    end ;
    else do;
      var(i) = substr(textin,1,200);
      i+1;
      textin = substr(textin,200+1);
    end;
    if lengthn( textin ) le 200 then var(i)=textin;
  end ;
run;

HTH,

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,023

Re: splitting text which is more than 200char to multiple variables

[ Edited ]
Posted in reply to ysantosh18

Search it firstly at this community. There are already a couple of questions have been answered.

 

options noquotelenmax;
data have;
id=1;
 var = '
But multiple notes over the past three weeks have undermined the shock last November, wiping out 86% of the money in circulation in a cash-driven economy.
The demonetisation exercise has been called a watershed for a country saddled with counterfeiters pushing millions of fake notes into the Indian economy from neighbouring Pakistan, Bangladesh and Nepal. Terrorists and governments hostile to India use the bogus cash to weaken the economy.
The latest attempts to restart the vicious cycle after the notes ban have set off alarm bells in the security establishment.
';
run;
data temp;
 set have;
 do i=1 to countw(var,' ');
  word=scan(var,i,' ');output;
 end;
 drop i var;
run;
data temp;
 set temp;
 by id;
 if first.id then sum=0;
 sum+(length(word)+1);
 if sum gt 200 then do;group+1;sum=length(word);end;
 drop sum ; 
run;
data x;
 length x $ 200;
 do until(last.group);
  set temp;
  by id group;
  x=catx(' ',x,word);
 end;
run;
proc transpose data=x out=want;
by id ;
var x;
run;
proc print;run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 265 views
  • 6 likes
  • 5 in conversation