Hello, I have a very long varying character field named comment. I want break that field into 7 fields.
and I use
comment_one = substr(comment,1,240);
comment_extraone = substr(comment,241,480);
comment_extratwo = substr(comment,481,720);
comment_extrathree = substr(comment,721,960);
comment_extrafour = substr(comment,961,1200);
comment_extrafive=substr(comment,1201,1440);
comment_extrasix = substr(comment,1441,1680);
but extracted substring looked like "I went to football game ..............then I reached ho"
since the length of comment_one is 240 it showed "ho" as last word but it is actually "home".
I am wondering if there is a way to fix this. Index function does not work since each comment
is of different length.
Thanks
Is there a reason why each extracted comment has to be a specified number of characters? Unless your original comment field is neatly separated by number of characters (e.g. the first sub-comment always ends at 240, the second always ends at 480, etc), the results are going to be less than desireable -- as it seems you've discovered.
If you are trying to break a large comment field into smaller fields, it might make more sense to break it up based on natural sentence endings (e.g. period, question mark, exclamation mark).
So would it be fair to say you want to split the comment by punctuation? Your could do something like the following:
data want;
length comment_1 - comment_7 $ 100;
set have;
comment_1 = scan(comment,1,".","RP");
comment_2 = scan(comment,2,".","RP");
comment_3 = scan(comment,3,".","RP");
comment_4 = scan(comment,4,".","RP");
comment_5 = scan(comment,5,".","RP");
comment_6 = scan(comment,6,".","RP");
comment_7 = scan(comment,7,".","RP");
run;
Code could be condensed with an arrary and do loop but I left it alone as to not confuse matters. The R and P are optional modifiers that trim leading and trailing spaces from the result (R) and use all punctuation as delimiters (P).
EJ
I am transferring this extracted sub strings into Crystal cell and it has 256 as maximum character length. I am trying to fit as much as possible.
This reminds me of the good old days before PROC REPORT performed all the text-wrapping for you. Here's an approach.
array extra {6} $240 comment_extra1 - comment_extra6;
dum = comment;
drop dum;
_i_=0;
if dum > ' ' then do until (dum=' ');
do _k_=241 to 1 by -1;
if substr(dum, _k_, 1) = ' ' then do;
_i_ + 1;
extra{_i_} = substr(dum, 1, _k_-1);
dum = left(substr(dum, k));
end;
end;
end;
It's up to you to make sure that 6 variables is enough. Sometimes an individual word can be quite lengthy. In one medical dictionary that I examined, several words were each over 40 characters long.
Good luck.
Here's another approach. It hasn't been exhaustively tested.
Tom
data have;
input;
LongDataRecord = _infile_;
cards;
Long text string 1
Long text string 2
Long text string 3
Long text string 4
run;
data want(keep=ShortDataRecord);
length ShortDataRecord $256;
retain prxid;
if _n_=1 then
prxid=prxparse('/\s+\S*$/o');
set have;
length TestPart $257;
do while(^missing(LongDataRecord));
TestPart = substr(LongDataRecord, 1, 257);
FoundPosn = prxmatch(prxid, TestPart);
if FoundPosn > 0 then
do;
LastChar = FoundPosn - 1;
FirstChar = FoundPosn + 1;
ShortDataRecord = substr(LongDataRecord, 1, LastChar);
LongDataRecord = left(substrn(LongDataRecord, FirstChar));
output;
end;
else
do;
ShortDataRecord = substr(LongDataRecord, 1, 256);
LongDataRecord = left(substrn(257, 1));
output;
end;
end;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.