BookmarkSubscribeRSS Feed
learner_sas
Quartz | Level 8

Hello, I have a very long varying character field named comment. I want break that field into 7 fields.

and I use

comment_one = substr(comment,1,240);

    comment_extraone = substr(comment,241,480);

    comment_extratwo = substr(comment,481,720);

   comment_extrathree = substr(comment,721,960);

   comment_extrafour = substr(comment,961,1200);

   comment_extrafive=substr(comment,1201,1440);

   comment_extrasix = substr(comment,1441,1680);

but extracted substring looked like "I went to football game ..............then I reached ho"

since the length of comment_one is 240 it showed "ho" as last word but it is actually "home".

I am wondering if there is a way to fix this. Index function does not work since each comment

is of different length.

Thanks

5 REPLIES 5
Fugue
Quartz | Level 8

Is there a reason why each extracted comment has to be a specified number of characters? Unless your original comment field is neatly separated by number of characters (e.g. the first sub-comment always ends at 240, the second always ends at 480, etc), the results are going to be less than desireable -- as it seems you've discovered.

If you are trying to break a large comment field into smaller fields, it might make more sense to break it up based on natural sentence endings (e.g. period, question mark, exclamation mark).

esjackso
Quartz | Level 8

So would it be fair to say you want to split the comment by punctuation? Your could do something like the following:

data want;

  length comment_1 - comment_7 $ 100;

  set have;

  comment_1 = scan(comment,1,".","RP");

  comment_2 = scan(comment,2,".","RP");

  comment_3 = scan(comment,3,".","RP");

  comment_4 = scan(comment,4,".","RP");

  comment_5 = scan(comment,5,".","RP");

  comment_6 = scan(comment,6,".","RP");

  comment_7 = scan(comment,7,".","RP");

run;

Code could be condensed with an arrary and do loop but I left it alone as to not confuse matters. The R and P are optional modifiers that trim leading and trailing spaces from the result (R) and use all punctuation as delimiters (P).

EJ

learner_sas
Quartz | Level 8

I am transferring this extracted sub strings into Crystal cell and it has 256 as maximum character length. I am trying to fit as much as possible.

Astounding
PROC Star

This reminds me of the good old days before PROC REPORT performed all the text-wrapping for you.  Here's an approach.

array extra {6} $240 comment_extra1 - comment_extra6;

dum = comment;

drop dum;

_i_=0;

if dum > ' ' then do until (dum=' ');
   do _k_=241 to 1 by -1;

      if substr(dum, _k_, 1) = ' ' then do;

         _i_ + 1;

         extra{_i_} = substr(dum, 1, _k_-1);

         dum = left(substr(dum, k));

      end;

   end;

end;

It's up to you to make sure that 6 variables is enough.  Sometimes an individual word can be quite lengthy.  In one medical dictionary that I examined, several words were each over 40 characters long.

Good luck.

TomKari
Onyx | Level 15

Here's another approach. It hasn't been exhaustively tested.

Tom

data have;
input;
LongDataRecord = _infile_;
cards;
Long text string 1
Long text string 2
Long text string 3
Long text string 4
run;

data want(keep=ShortDataRecord);
length ShortDataRecord $256;
retain prxid;

if _n_=1 then
  prxid=prxparse('/\s+\S*$/o');
set have;
length TestPart $257;

do while(^missing(LongDataRecord));
  TestPart = substr(LongDataRecord, 1, 257);
  FoundPosn = prxmatch(prxid, TestPart);

  if FoundPosn > 0 then
   do;
    LastChar = FoundPosn - 1;
    FirstChar = FoundPosn + 1;
    ShortDataRecord = substr(LongDataRecord, 1, LastChar);
    LongDataRecord = left(substrn(LongDataRecord, FirstChar));
    output;
   end;
  else
   do;
    ShortDataRecord = substr(LongDataRecord, 1, 256);
    LongDataRecord = left(substrn(257, 1));
    output;
   end;
end;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 944 views
  • 0 likes
  • 5 in conversation