BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jenim514
Pyrite | Level 9
Hello, 
I have this code and it works ok, but I have text  with words that are separated by space AND semicolons so that no column is greater than 200.  So, if either is space or semicolon  encountered approaching 200, I would like ok to split while keeping the integrity of the data (i.e.  keeping spaces and semicolons in the data in all columns)
 
Here is an example of text string:
BUPLEURUM SPP. ROOT;CINNAMOMUM CASSIA TWIG;FOSSILIA OSSIS MASTODI;OYSTER SHELL;PANAX GINSENG ROOT;PINELLIA TERNATA TUBER;PORIA COCOS SCLEROTIUM;RHEUM SPP. ROOT WITH RHIZOME;SCUTELLARIA BAICALENSIS
 
data tmp (drop= i word); set db.cm (keep=subject cmtrt_product recordid);
length nystr $200;
 
cmtrt_product = compbl(cmtrt_product);
do i = 1 to countw(cmtrt_product,' ');
word = scan(cmtrt_product,i,' ');
if length(nystr) + length(word) + 1 > 200 then do;
output;
nystr = word;
end;
else nystr = catx(' ',nystr,scan(cmtrt_product,i,' '));
end;
if nystr ne '' then output;
run;
 
proc sort data=tmp; by subject recordid;
proc transpose data=tmp out=want (drop=_name_); 
by subject recordid ;
var nystr;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Instead of building NYSTR word by word, why not examine a copy of the source string at character 200 and work backwards until a blank or semi-colon character is found.  Copy character 1 through the located character to NYSTR.  Then move the rest of the copy-of-source-string to start in column 1, and repeat:

 

data tmp (drop=_:);
  set db.cm (keep=subject cmtrt_product recordid);
  _string=cmtrt_product;

  length nystr $200;
  do while (_string^=' ');
    do _col=200 by -1 until (char(_string,_col) in (' ',';'));
    end;
    nystr=substr(_string,1,_col);
    output;
    _string=substr(_string,_col+1);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
ballardw
Super User

You really should show us what you expect for the output.

 

I would actually expect the use of the explicit delimiters you talk about in the SCAN and COUNTW functions. You only show using space as a delimiter. SCAN and COUNTW can use multiple delimiters such as this to use space and semicolon

 

do i = 1 to countw(cmtrt_product,' ;');

I might suspect that it would make more sense to just use the semicolon though so that the concepts are together in the output instead of split between multiple observations.

mkeintz
PROC Star

Instead of building NYSTR word by word, why not examine a copy of the source string at character 200 and work backwards until a blank or semi-colon character is found.  Copy character 1 through the located character to NYSTR.  Then move the rest of the copy-of-source-string to start in column 1, and repeat:

 

data tmp (drop=_:);
  set db.cm (keep=subject cmtrt_product recordid);
  _string=cmtrt_product;

  length nystr $200;
  do while (_string^=' ');
    do _col=200 by -1 until (char(_string,_col) in (' ',';'));
    end;
    nystr=substr(_string,1,_col);
    output;
    _string=substr(_string,_col+1);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 617 views
  • 0 likes
  • 3 in conversation