BookmarkSubscribeRSS Feed
pdhokriya
Pyrite | Level 9

If dataset has larger than 200 characters then 200 cutting off the setting at whole words with the first 200 characters going into text, the second set of 200 characters going into text1 and so fourth. Continue until all characters have been processed. 

 

Query :

 

data have;
str = "Subject 1013 at v6 26AUG2020 was not reconsented with Main ICF (c ) v4.0 14AUG2020, IRB approval 19AUG2020, xxxxx Approval 20AUG2020. Subject 1013 was reconsented at V9 18NOV2020 with Main ICF (c ) v4.0 14AUG2020, IRB approval 19AUG2020, xxxxx Approval 20AUG2020.";output;
str= " a b c" ; output;
run;

 

So the output would be : 

 

pdhokriya_1-1626267621470.png

 

text text1 text2
Subject 1013 at v6 26AUG2020 was not reconsented with Main ICF (c ) v4.0 14AUG2020, IRB approval 19AUG2020, xxxxx Approval 20AUG2020. Subject 1013 was reconsented at V9 18NOV2020 with Main ICF (c ) v4.0 14AUG2020, IRB approval 19AUG2020, Lilly Approval 20AUG2020.  
a b c    
6 REPLIES 6
Kurt_Bremser
Super User

I recommend to not limit yourself by a fixed number of text pieces, use a vertical layout instead:

data want;
length text $200;
set have;
do until (lengthn(str) = 0);
  text = substr(str,1,200);
  call scan(text,-1,pos,len," ");
  if pos + len lt 200 then pos = pos + len;
  str = substr(str,pos);
  text = substr(text,1,pos);
  output;
end;
keep text;
run;
pdhokriya
Pyrite | Level 9

Observation 2 should be in another variable ( text1).

Reeza
Super User

You need to predefine the variables ahead of time so you that means before you start your step you need to know the maximum number of variables you'll need for each string. Instead of that, a more dynamic method is to do it in a long format, so each observation to each line and then transpose it afterwards to a wide format if necessary. It's usually more efficient to work in the long format generally but it takes some training and experience to get there.

 


@pdhokriya wrote:

Observation 2 should be in another variable ( text1).


Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

Kurt_Bremser
Super User

@pdhokriya wrote:

Observation 2 should be in another variable ( text1).


VERY STRONGLY advise against such coding techniques. They only make your further data handling complicated and serve no purpose in about 99.95 % of cases, especially as observed here on the communities. Cleanse your brain of the poisons of Excel thinking.

Please give me a valid reason why you want a limited set of text variables that you might run out of if str becomes longer than expected.

Tom
Super User Tom
Super User

It is not hard to adapt @Kurt_Bremser solution to generating multiple variables instead.

Calculate the number of variables you will need by dividing the max length of your source variable by 200 and adding at least one more to handle the inefficiency caused by not splitting words in the middle.  So for example if STR is length 800 you might use 5 as the number of new TEXT variables to create.

data want;
  set have;
  array text [5] $200;
  do nwords=1 to dim(text) until (lengthn(str) = 0);
    text[nwords] = str;
    call scan(text[nwords],-1,pos,len," ");
    if pos + len lt 200 then pos = pos + len;
    str = substr(str,pos);
    text[nwords] = substr(text[nwords],1,pos);
  end;
run;

If you end up with some observations where STR is not empty then re-run with a larger array.

ballardw
Super User

Can you describe what you will do with the resulting data, and why 200 characters as a magic number is desired?

If you have an arbitrary 200 character split then you are extremely likely to have words split in the middle with bits in two separate variables. Which means that finding values or actual use of the reduced variables seems hardly worth it.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4257 views
  • 3 likes
  • 5 in conversation