DATA Step, Macro, Functions and more

How to read variables containing characters longer than 200 and get the output without truncation of

Super Contributor
Posts: 270

Dear,

 

In my data, the variable 'description' contains values longer than 200 characters. I am not getting the desired output for all observations. With the following code I get the output I need for the first observation but not for the second. If I change the number in the FINDC function from 194 to 193, I get the output I need for the second observation but not for the first. Please help. Thank you very much for the support.

 

description

Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on xxxxxxxxx.

 

The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx reviewed the xx sample collection per protocol with the site staff.

 

 

Output needed:

1.Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on
2.The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx

 

 

Output I am getting:

1.Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on
2.The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx reviewe

 

Code:

data one;

set two;
if length(description) < 200 then TERM=DESCRIPTION;
Else if length(description) > 200 then TERM = substr(description,1,findc(description,' ',194));

run;

Respected Advisor
Posts: 3,887

Re: How to read variables containing characters longer than 200 and get the output without truncatio

I'm not sure I really understand what you're after, so let me restate your problem. Please confirm or correct.

 

1. You have a source character variable that can hold a very long string of words.

2. You want to split this long string into substrings of at most 200 characters.

3. You want each substring to end with a complete, non-truncated word from the source string.

4. You want each substring to start with a new sentence (a sentence starts after punctuation such as a period, or at the very beginning of the source string).

 

Can you please confirm or correct my understanding?

Super Contributor
Posts: 270

Re: How to read variables containing characters longer than 200 and get the output without truncatio

You are right. Thank you for your help. I am doing QC of a data set and writing a program whose result I will check with PROC COMPARE. The output I need is the output in the vendor data set that has to be checked.

I wrote the code by looking at the first observation in the vendor data set. The code reproduced the vendor output exactly for some observations (e.g. the first), but not for others (e.g. the second).

 

Super Contributor
Posts: 270

Re: How to read variables containing characters longer than 200 and get the output without truncatio

Thank you all for helping me. The code just posted worked. Thank you.

Respected Advisor
Posts: 4,641

Re: How to read variables containing characters longer than 200 and get the output without truncatio

You could use pattern matching:

 

data want;
length term $200;
set have;
term = prxChange("s/(.{1,200})\s.*/\1/o", 1, description);
run;
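
For anyone reading the pattern: .{1,200} greedily takes up to 200 characters and then backs off until the next character is whitespace (\s), the trailing .* swallows the rest of the value, and \1 keeps only the captured part. As a rough non-regex sketch of the same idea, FINDC with a negative start position searches from right to left. The data set and variable names are those used in this thread, except cut, which is a made-up helper name. The sketch assumes blanks are the only word separators.

```sas
data want;
    length term $200;
    set have;
    if length(description) <= 200 then term = description;
    else do;
        /* negative start: search leftward from position 201 for a blank */
        cut = findc(description, ' ', -201);
        /* cut - 1 drops the blank itself; with no blank at all, cut hard at 200 */
        if cut > 1 then term = substr(description, 1, cut - 1);
        else term = substr(description, 1, 200);
    end;
    drop cut;
run;
```

Searching backward from position 201 avoids having to guess a forward start position such as 193 or 194, which only works when a blank happens to fall in the right place.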
PG
Super Contributor
Posts: 270

Re: How to read variables containing characters longer than 200 and get the output without truncatio

Thanks for the code. It worked. I just have one issue. 

 

 

For this value the code did not work. Do you have any suggestions?

 

"Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified."

 

 

Output produced by the code:

"Wrong XXX version 1.0 signed during screening.
The EC has not yet been notified."

 

Output expected:

 

"Wrong XXX version 1.0 signed during screening.
XXX Version x.0 was approved by the CEC on xx Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version 2.0 was forwarded to the"

 

 

Why did it skip the middle part of the value? This is the only value where the code did not work. Thank you.

 

 

 

 

Respected Advisor
Posts: 4,641

Re: How to read variables containing characters longer than 200 and get the output without truncatio

You must have embedded line breaks in the field. Try changing the pattern to:

"s/(.{1,200})\s.*/\1/om"

The m modifier switches on multi-line mode, which changes how ^ and $ treat line breaks.

If that does not help, try the s modifier, which makes . match newline characters as well:

"s/(.{1,200})\s.*/\1/os"
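
To see the effect of the s modifier in isolation, here is a small sketch on a made-up value. The variable names text, plain and dotall and the test string are assumptions for illustration only, not taken from the real data.

```sas
data _null_;
    length text $400 plain dotall $200;
    /* made-up value: a short first line, a line feed, then about 300 characters of words */
    text = cats('First line.', '0A'x, repeat('word ', 59));
    /* without s, . cannot cross the line feed, so the capture ends at the first line */
    plain  = prxChange("s/(.{1,200})\s.*/\1/o",  1, text);
    /* with s, . also matches the line feed, so the capture can run up to 200 characters */
    dotall = prxChange("s/(.{1,200})\s.*/\1/os", 1, text);
    len_plain  = length(plain);
    len_dotall = length(dotall);
    put len_plain= len_dotall=;
run;
```

Comparing the two lengths in the log shows whether a line break cut the capture short, which is what happened to the value with the skipped middle part.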

 

 

PG
Super Contributor
Posts: 270

Re: How to read variables containing characters longer than 200 and get the output without truncatio

Thank you. The 's' modifier worked.

If you don't mind, I have one more question, as I am moving on to the supplemental data set. How do I get the remaining part into a term2 variable? And if the value is longer than 400 characters, how do I get the rest into a third variable? Please help. Thank you very much.

Respected Advisor
Posts: 4,641

Re: How to read variables containing characters longer than 200 and get the output without truncatio

Could be done this way:

 

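data want2;
/* compile the pattern once; the sum statement retains prxId across observations */
/* the s modifier lets . match embedded line breaks, as in the pattern above */
if not prxId then prxId + prxParse("/.{1,200}\s/s");
set have;
array term{10} $200;  /* term1-term10; raise the dimension for longer text */
start = 1;
stop = length(description) + 1;  /* include one trailing blank so the last word can match \s */
call prxNext(prxId, start, stop, description, pos, len);
do i = 1 to dim(term) while(pos > 0);
    term{i} = substr(description, pos, len);  /* up to 200 characters ending at whitespace */
    call prxNext(prxId, start, stop, description, pos, len);  /* look for the next chunk */
    end;
drop prxId start stop pos len i;
run;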
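
Since this is for QC, a quick sanity check on the split may be useful. The sketch below glues term1-term10 back together and compares the result with description after normalising whitespace. The data set name check and the variable rebuilt are made-up names, and the sketch assumes want2 was created by the step above with a 10-element array.

```sas
data check;
    set want2;
    length rebuilt $2200;
    /* glue the pieces back together with single blanks */
    rebuilt = catx(' ', of term1-term10);
    /* turn CR/LF into blanks and squeeze repeated blanks before comparing */
    if strip(compbl(translate(description, '  ', '0D0A'x)))
       ne strip(compbl(translate(rebuilt, '  ', '0D0A'x))) then
        put 'Text lost or altered in observation ' _n_;
run;
```

A message in the log would suggest that a value needed more than 10 pieces, or that it contains a run of more than 200 characters without whitespace.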
PG
Super User
Posts: 9,662

Re: How to read variables containing characters longer than 200 and get the output without truncatio

If I understand what you mean, this should do it:

 

options noquotelenmax;
data have;
x= "Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified.";output;
x= "Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified.";output;
run;
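/* one row per word, with the word's length plus one separating blank */
data x;
set have;
length temp $ 200;
n+1;
 do i=1 to countw(x,' ');
  temp=scan(x,i,' ');len=length(temp)+1;output;
 end;
run;
/* running length within each original value; start a new group once it passes 200 */
data temp;
 set x;
 by n;
 retain sum;
 if first.n then sum=0;
 sum+len;
 if sum gt 200 then do;group+1;sum=len;end;
run;
/* join the words of each group back into one string */
data temp1;
length want $ 200;
do until(last.group);
 set temp;
 by n group;
 want=catx(' ',want,temp);
end;
run;
/* one row per original value, one column per chunk */
proc transpose data=temp1 out=want;
by n;
var want;
run;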
 