BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Kh
Lapis Lazuli | Level 10

Hi, 

I need your help in resolving following problem:

In my monthly report there is COMMENT variable that might have max allowable number of characters. I need to parse it out into multiple COMMENT vars that contain 200 characters. Every time i could have different number of characters. Eg: if the length of COMMENT is 300 then COMMENT1 and COMMENT2 need to be created, else if if the length of COMMENT is 500 then COMMENT1, COMMENT2 and COMMENT3 need to be created.. etc. 

I was trying to conditionally create new vars using <if then> + substr, but somehow it did not work.

data report;

set rawdata;

varlength= length (comment);

if varlength >200 then  comment1= substr(comment, 1, 200);

if varlength >400 then  comment2= substr(comment, 201, 200);

etc...

run; 

 

But I need to put this into loop I think.. 

 

Please advise on this and share your experience! 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Athenkosi
Obsidian | Level 7

Try this.

 

proc sql;
	select ceil(max(length(model))/200) into :num trimmed
	from rawdata;
quit;

data report;
	set rawdata;
	array COMS{*} $200 COMMMENT1-COMMMENT&num;
	
	do i = 1 to ceil(max(length(comment))/200);
		COMS{i} = substr(comment,200*i - 199,200);
	end;
run;

View solution in original post

12 REPLIES 12
ballardw
Super User

You don't get variables "conditionally" in a SAS data set. Every observation will have ALL the variables.

 

One question: why? What will you do with that plethora of variables?

Your current approach would cut words in the middle so the content apparently is not of actual concern.

 

Do you know the maximum length that will ever occur in your "comment"? If not you may have to consider comment to be 32K long which is going to be a LOT of variables.

A_Kh
Lapis Lazuli | Level 10

This is an issue, there is not set max length. But my expectation is not that long, up to 1000 characters for now. 

ballardw
Super User

@A_Kh wrote:

This is an issue, there is not set max length. But my expectation is not that long, up to 1000 characters for now. 


I strongly suggest that you contact the provider of the files you are using to see what the maximum length can ever be. If you get a solution that works for 1000 characters and then the next file as 2000 characters you end up having to change code and data structure. And again when another file has 3000 characters.

SAS currently has a limit of 32,767 characters in a single variable. You need to know if the text can exceed that because then you have a much more complex issue. And if you get to 32k characters you would have 164 variables needed.

 

You still have not provided a reason for splitting up the text.

You can use Proc ODSTEXT to display long blocks of text. Below I create a set with three dummy comments consisting of repeating the same word 100 times and then display the values with Proc ODSTEXT.

data junk;
   input word $;
   length x $ 1000;
   x=repeat(word,100);
datalines;
abc
john
mary
;
run;

proc odstext data=junk;
 p x;
run;

This proc will break the display at word boundaries. The ODS destination would determine length for the paragraph.

In this procedure P is a paragraph start, then you can have literal text or an expression involving variable(s). Other options to control appearance are available.

 

A_Kh
Lapis Lazuli | Level 10

Hi again, 
Thank you so much for taking the time to solve my problem, I greatly appreciate it. 

The idea is I get the COMMENT from a database that out of my control. No way to limit its length. But the expectation is around up to 1000 character per report. It could be slightly more than 1000 but not too much, like 2k. 

Per my client's data standards COMMENT length is set $200 which causes data truncation if raw data COMMENT contains more than 200 characters. in order to avoid data truncation I will need to create extra comment fields such as COMMENT1, COMMENT2 etc with the same length $200 to include all available data. As we don't now the exact max number, I was suggesting to conditionally create extra COMMENTS based on raw data LENGTH. This way my program would take care of all data and create as many new VARS as needed every time I run the report. 

 

So far I could figure out a macro code to conditionally create new VARS to cover up to 1000 characters.  I could extend it's coverage up to 2k as well, but my codes get very long and ugly :(. 

My idea was developing kind of LOOP that contains less code and can create enough vars when needed. 
Below is my temporary solution for your review. And please give your opinion about this code and how to improve it.

Thank you!

**********************************************************************************************************;

*Testing a macro code to conditionally create new variable;

 

data test;
      length comment $ 1000;
     comment= "Major adverse cardiovascular events (MACE, or major adverse cardiac events) is a composite endpoint frequently used in cardiovascular research,[1][2] comparable to the composite endpoint all-cause mortality. Despite widespread use of the term in clinical trials, the definitions of MACE can differ, which makes comparison of similar studies difficult.[3] The so-calledclassical 3-point MACE is defined as a composite of nonfatal stroke, nonfatal myocardial infarction, and cardiovascular death.[4][5] But another study defines MACE as CVD events, admission for HF (Heart Failure), ischemic cardiovascular [CV] events, cardiac death, or MACE.[6] Yet another study defined MACE as CV death, hospitalization for HF, or myocardial infarction (MI)";
run;

 

data one;
      set test;
      varlgth= length(comment);
     call symputx("varlth", varlgth);
run;

 

%macro split;
        data two;
           set one;
         %if &varlth gt 200 %then %do;
              length COM1 $ 200;
              COM1= substr(comment, 1, 200);
         %end;
        %else %do;
                length COM1 $ 200;
                COM1= comment;
        %end;

        %if &varlth gt 400 %then %do;
                length COM2 $ 200;
               COM2= substr(comment, 201, 200);
       %end;
       %else %if &varlth <= 400 and &varlth > 200 %then %do;
                 length COM2 $ 200;
                 COM2=substr(comment, 201);
        %end;

        %if &varlth gt 600 %then %do;
                length COM3 $ 200;
                COM3= substr(comment, 401, 200);
         %end;
         %else %if &varlth <= 600 and &varlth > 400 %then %do;
               length COM3 $ 200;
               COM3=substr(comment, 401);
         %end;

         %if &varlth gt 800 %then %do;
                   length COM4 $ 200;
                   COM4= substr(comment, 601, 200);
         %end;
         %else %if &varlth <= 800 and &varlth > 600 %then %do;
                   length COM4 $ 200;
                   COM4=substr(comment, 601);
          %end;

          %if &varlth gt 1000 %then %do;
                  length COM5 $ 200;
                  COM5= substr(comment, 801, 200);
           %end;
           %else %if &varlth <= 1000 and &varlth > 800 %then %do;
                  length COM5 $ 200;
                 COM5=substr(comment, 801);
            %end;

      run;
%mend split;

%split;

**********************************************************************************************************;

Astounding
PROC Star

Perhaps you should take a look at PROC REPORT and its WRAP option.  It can automatically take a long string, split it up for you (without cutting a word in the middle) and wrap the text when it prints.  If you need this for some other purpose other than creating a report, it's best to let us know.

A_Kh
Lapis Lazuli | Level 10

Good to know this technique! Do you know if this affects Variable length since it is splitting the string by whole words? And I think we can save the output as a dataset? I did not use proc report too much.. 

Astounding
PROC Star

As far as I know, the splitting is temporary, for reporting purposes only.  The original data set remains in its original form.

 

I've never played with creating an output data set from PROC REPORT.

 

If you really must split the variable (and I am not yet convinced that you do), there are plenty of DATA step techniques that will do so without splitting a word in the middle.

Athenkosi
Obsidian | Level 7

Try this.

 

proc sql;
	select ceil(max(length(model))/200) into :num trimmed
	from rawdata;
quit;

data report;
	set rawdata;
	array COMS{*} $200 COMMMENT1-COMMMENT&num;
	
	do i = 1 to ceil(max(length(comment))/200);
		COMS{i} = substr(comment,200*i - 199,200);
	end;
run;
A_Kh
Lapis Lazuli | Level 10

Hi @Athenkosi , 

 

That's exactly what I was looking for.. Thank you so much! It worked! Below I'm enclosing the tested program. 

A_Kh
Lapis Lazuli | Level 10

Hi again, 

 

I initially accepted your code as a solution. It worked perfectly for the case when varying number of COMMENTS is used. 

Now I have had a change to this data standards and number of comments set to 10. I applied the code to the new logic, it is correctly splitting the string into multiple comments, but generating error log for substr function as following:

 

.............long string above.....................................................i=2 _ERROR_=1 _N_=8
NOTE: Invalid third argument to function SUBSTR at line 906 column 29.
NOTE: Invalid third argument to function SUBSTR at line 911 column 29.
NOTE: Invalid third argument to function SUBSTR at line 916 column 29.
NOTE: Invalid third argument to function SUBSTR at line 921 column 29.
NOTE: Invalid third argument to function SUBSTR at line 926 column 29.
NOTE: Invalid third argument to function SUBSTR at line 931 column 29.

 

Would you please advise how to get rid of this error log? 

 

Thank you again, and I really appreciate your help!

Athenkosi
Obsidian | Level 7

Hello,

I'll help, may you please clarify when you say the number of comments has been set to 10 what are you referring to?

A_Kh
Lapis Lazuli | Level 10

Hi again, 

 

Initially number of COMMENT variables was indefinite and loop code supposed to create the appropriate number of COMMENT variables based on data length. Now there are only 10 COMMENT fields, instead of indefinite number. That's what I was referring to. 

 

Anyway, this problem has been solved. I got rid of error NOTE in the log USING SUBSTRN function instead of SUBSTR. 

 

Thank you for your time! 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1153 views
  • 6 likes
  • 4 in conversation