Address File Creation - Split string into multiple columns with a max length per column


Hi Guys, 

 

I have a file with a long address string called "Address". I have to create a file with a max of 3 address columns, let's call them address 1-2-3. Each column has a max length of 36 characters and I don't want to break up part of a word to move over to the next line.

 

I need something that breaks the string into columns of the desired length while keeping words whole. In other words, a word should never be split in half to start a new line; if adding a word would push a column past 36 characters, that word should move to the next column, and so forth.

 

For example, if this were the address line:

 

This is an example of a really long address string which needs to be split into columns of 36 characters

 

Into something like:

 

Address1: This is an example of a really long (35 characters)
Address2: address string which needs to be (32 characters)
Address3: split into columns of 36 characters (35 characters)

 

Any suggestions would be much appreciated. I looked at some of the samples posted but they all seem to revolve around delimited long strings.

 

Thanks!


Accepted Solutions
Solution
02-22-2017 10:02 AM

Re: Address File Creation - Split string into multiple columns with a max length per column

Note that this code assumes the string is no longer than 3 * 36:

data want;
  s="This is an example of a really long address string which needs to be split into columns of 36 characters";
  array cols{3} $36;
  c=1;
  do i=1 to countw(s," ");
    if lengthn(catx(" ",cols{c},scan(s,i," "))) <= 36 then cols{c}=catx(" ",cols{c},scan(s,i," "));
    else do;
      c=c+1;
      cols{c}=scan(s,i," ");
    end;
  end;
run;
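
The assumption noted above (no more than 3 * 36 characters) matters because a longer string would push the column counter past the end of the array. A minimal sketch of one way to guard against that is shown below. The data set HAVE, its sample rows, the variable names address1-address3 and word, and the OVERFLOW flag are all assumptions made for illustration and are not part of the original post. A single word longer than 36 characters is still not handled here; it would be truncated when stored.

```sas
/* hypothetical sample input; in practice HAVE is your own data set */
data have;
  length address $200;
  address = "This is an example of a really long address string which needs to be split into columns of 36 characters";
  output;
  address = "A much longer value than the three columns can hold, so that the tail of this text has to be reported as overflow instead of causing an array subscript error";
  output;
run;

data want;
  set have;
  array cols{3} $36 address1-address3;
  length word $200;
  overflow = 0;                      /* becomes 1 when the text does not fit into 3 x 36 */
  c = 1;
  do i = 1 to countw(address, ' ');
    word = scan(address, i, ' ');
    /* the word still fits into the current column */
    if lengthn(catx(' ', cols{c}, word)) <= 36 then cols{c} = catx(' ', cols{c}, word);
    /* otherwise start the next column, if there is one */
    else if c < dim(cols) then do;
      c = c + 1;
      cols{c} = word;
    end;
    /* no column left: flag the row and stop instead of indexing past the array */
    else do;
      overflow = 1;
      leave;
    end;
  end;
  drop i c word;
run;
```

Rows with overflow = 1 can then be reviewed by hand or routed to a separate file, which is usually preferable to silently losing the tail of an address.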


All Replies

Re: Address File Creation - Split string into multiple columns with a max length per column

Thank you very much for the quick reply! This worked perfectly!

Re: Address File Creation - Split string into multiple columns with a max length per column

A slightly different (and maybe less elegant) take from me:

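data want;
set have;
length address1-address3 $36;
array adr {3} address1-address3;
index = 1;
/* a blank is the only delimiter here; the default delimiter list of scan() and
   countw() also contains punctuation such as , . - / which would be dropped */
/* like the code above, this assumes the address fits into 3 * 36 characters */
do count = 1 to countw(address,' ');
  /* only move on when the current column already holds text and the word does not fit */
  if adr{index} ne ' ' and length(adr{index}) + 1 + length(scan(address,count,' ')) > 36
  then do;
    index = index + 1;
    adr{index} = scan(address,count,' ');
  end;
  else adr{index} = catx(' ',adr{index},scan(address,count,' '));
end;
drop count index;
run;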
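
One point worth checking with real addresses: when scan() and countw() are called without a delimiter argument, they use a default delimiter list that, on ASCII systems, contains punctuation such as commas, periods, hyphens and slashes in addition to the blank. The sketch below compares both call styles on a made-up address; the sample value and the variable names are purely illustrative.

```sas
data _null_;
  length address $60 w_default w_blank $20;
  address = "12-B St. John's Rd., Apt. 4";   /* made-up sample value */
  n_default = countw(address);               /* default delimiter list */
  n_blank   = countw(address, ' ');          /* blank as the only delimiter */
  w_default = scan(address, 1);              /* first word, default delimiters */
  w_blank   = scan(address, 1, ' ');         /* first word, blank only */
  put n_default= n_blank= w_default= w_blank=;
run;
```

If the two counts in the log differ for your data, punctuation is being treated as a word boundary and will not appear in the rebuilt columns, so passing ' ' explicitly is the safer choice for address text.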

Didn't want to discard it just because @RW9 beat me to it ;)


Re: Address File Creation - Split string into multiple columns with a max length per column

Posted in reply to KurtBremser
Thank you very much Kurt, this also worked beautifully! Again, thanks for your help!

Re: Address File Creation - Split string into multiple columns with a max length per column

If you use the 'old text editor' (the one with a line-number prefix area), a line command will do this. Take a line such as

000001  This is a very long string more than 36 characters. Split to 36 strings. More useless text.

and type TF36 over the line number in the prefix area:

TF3601 This is a very long string more than 36 characters. Split to 36 strings. More useless text.

/* T0100520 Hits #24 Optimum splits for long text strings, datastep linear regression and file attributes */

HAVE

All randomized subjects who receive at least one dose of study drug will be
considered evaluable for safety. All adverse events will be included in the analysis of safety after
randomization and prior to a subject entering the followup phase. The definition of the followup
phase (above) describes those adverse events that will be collected and analyzed during the followup phase.
The Full Analysis Set and the Safety Subset
will each include all randomised
subjects who receive at least one dose of study
drug.
All subjects who have
signed informed consent before invasive, protocol specified procedures
(including study specific blood draws for laboratory testing and study chemotherapy) and have
received at least one dose of study  drug will be included in the safety evaluable set.  These subjects
will be analyzed according to the treatment they actually received. Summaries of safety data for the
treatment period will be provided on this safety evaluable set.


WANT

Obs                      STR

 1   All randomized subjects who receive at
 2   least one dose of study drug will be
 3   considered evaluable for safety. All
 4   adverse events will be included in the
 5   analysis of safety after randomization
 6   and prior to a subject entering the
 7   followup phase. The definition of the
 8   followup phase (above) describes those
 9   adverse events that will be collected
10   and analyzed during the followup phase.
11   The Full Analysis Set and the Safety
12   Subset will each include all randomised
13   subjects who receive at least one dose
14   of study drug. All subjects who have
15   signed informed consent before invasive,
16   protocol specified procedures (including
17   study specific blood draws for
18   laboratory testing and study
19   chemotherapy) and have received at least
20   one dose of study drug will be included
21   in the safety evaluable set. These
22   subjects will be analyzed according
23   to the treatment they actually received.
24   Summaries of safety data for the
25   treatment period will be provided
26   on this safety evaluable set.

WORKING CODE
  proc template;
    ...
    flow=on;
    width=40;   * this is where we set length;
    just=l;

FULL SOLUTION

/* T00388X TECHNIQUE FOR WRAPPING A LONG STRING, I.E. A 32,000 CHAR STRING, INTO MULTIPLE 40 BYTE STRINGS WITH NICE SPLITS */

options noquotelenmax;
data rpt;
  length lyn $32000;
  lyn=compbl('
        All randomized subjects who receive at least one dose of study drug will be
        considered evaluable for safety. All adverse events will be included in the analysis of safety after
        randomization and prior to a subject entering the followup phase. The definition of the followup
        phase (above) describes those adverse events that will be collected and analyzed during the followup phase.
        The Full Analysis Set and the Safety Subset
        will each include all randomised
        subjects who receive at least one dose of study
        drug.
        All subjects who have
        signed informed consent before invasive, protocol specified procedures
        (including study specific blood draws for laboratory testing and study chemotherapy) and have
        received at least one dose of study  drug will be included in the safety evaluable set.  These subjects
        will be analyzed according to the treatment they actually received. Summaries of safety data for the
        treatment period will be provided on this safety evaluable set.');
output;
run;

libname odslib v9 "%sysfunc(pathname(work))";
ods path odslib.templates sashelp.tmplmst work.templates(update);
/* put the template in work.templates */
proc template;
    define table rolchr;
    classlevels=on;
    order_data=on;
    col_space_max=1;
    col_space_min=1;
    define column rol;
    generic=on;
    blank_dups=on;
    flow=on;
    width=40;   * this is where we set length;
    just=l;
    header=' ';
    end;
    end;
run;

options nodate nonumber ps=5000 ls=140;
title;footnote;
ods listing file="d:/txt/rolchr.txt";
data _null_;
    retain cnt -1;
    set rpt end=dne;
    file print ods=(template='rolchr' columns=(rol=lyn (generic=on)));
    put _ods_;
run;
quit;
ods path close;
ods listing close;
ods listing;

data str;
 infile "d:/txt/rolchr.txt";
 input str $40.;
run;
proc print data=str;
run;
ods path reset;
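
For comparison, the same kind of wrapping can also be sketched in a plain DATA step, without ODS or an external file. This is only an illustration that reuses the RPT data set and the LYN variable from the code above; the output data set name STR_GREEDY and the helper variables are assumptions. It fills each 40-byte row greedily, so the line breaks may fall in different places than the flow=on output, and a single word longer than 40 characters would be truncated.

```sas
data str_greedy;
  set rpt;                               /* rpt.lyn: the long string built above */
  length str word $40;
  str = ' ';
  do i = 1 to countw(lyn, ' ');
    word = scan(lyn, i, ' ');
    /* first word of a new row */
    if lengthn(str) = 0 then str = word;
    /* the word fits behind the text already collected for this row */
    else if lengthn(str) + 1 + lengthn(word) <= 40 then str = catx(' ', str, word);
    /* the row is full: write it out and start the next one with this word */
    else do;
      output;
      str = word;
    end;
  end;
  if lengthn(str) > 0 then output;       /* write the last, partly filled row */
  keep str;
run;

proc print data=str_greedy;
run;
```

The trade-off is that the template approach leaves the choice of break points to ODS, while the loop gives direct control over the rule used to end a row.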

 

Result of the TF36 line command on the editor line shown earlier:

000001 This is a very long string more than
000002 36 characters. Split to 36 strings.
000003 More useless text.

 
