knveraraju91
Barite | Level 11

Dear all,

In my data, the variable 'description' contains values longer than 200 characters. I am not getting the desired output for all observations. With the following code I get the output I need for the first observation but not for the second. If I change the number in the FINDC function from 194 to 193, I get the output I need for the second observation but not for the first. Please help. Thank you very much for the support.

 

description

Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on xxxxxxxxx.

 

The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx reviewed the xx sample collection per protocol with the site staff.

 

 

Output needed:

1.Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on
2.The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx

 

 

Output I am getting:

1.Subject xxxxxxxx date of first dose is >2 days from the Randomization date of xxxxxxxxx instead of being entered into xxx as randomized after xxx completed with first IP dose the following day on
2.The Week 1 sample for xxxxxxxx was taken 14 1/2 hours after dosing (dose at x:00 and sample taken at xx:30) due to the subject's work schedule. This was documented in the source chart. The xxx reviewe

 

Code:

data one;

set two;
if length(description) < 200 then TERM=DESCRIPTION;
Else if length(description) > 200 then TERM = substr(description,1,findc(description,' ',194));

run;

9 REPLIES
Patrick
Opal | Level 21

Not sure that I really understand what you're after so let me re-formulate your problem and then please confirm or correct.

 

1. You have a source character variable that may hold a very long string (words).

2. You want to split this very long string into sub-strings of at most 200 characters.

3. You want each sub-string to end with a full, non-truncated word from your source string.

4. You want each sub-string to start with a new sentence (a sentence starts after punctuation such as a dot, or at the very beginning of your source string).

 

Can you please confirm or correct my understanding?

knveraraju91
Barite | Level 11

You are right. Thank you for your help. I am doing QC of a data set, and I am writing a program whose result I will check against the vendor data set with PROC COMPARE. The output needed is the output in the data that needs to be checked.

I wrote the code by looking at the first observation in the vendor data set. The code reproduced the vendor output exactly for some observations (e.g. the first). It did not produce exactly the output needed for others (e.g. the second).

 

knveraraju91
Barite | Level 11

Thank you all for helping me. The code just posted worked. Thank you.

PGStats
Opal | Level 21

You could use pattern matching:

 

data want;
length term $200;
set have;
term = prxChange("s/(.{1,200})\s.*/\1/o", 1, description);
run;
PG
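For comparison, here is a sketch that stays with FINDC instead of a regular expression. It assumes the goal is to cut at the last blank that still leaves at most 200 characters. The data set names one and two come from the question; the variable cut is made up for this example. A negative start position makes FINDC search from right to left, whereas a start of 194 searches forward and can land on a blank beyond position 200, in which case TERM is simply truncated at its own length.

```sas
data one;
    length term $200;
    set two;
    if length(description) <= 200 then term = description;
    else do;
        /* Negative start: scan leftward from position 201 for a blank.
           A blank at position 201 means the first 200 characters fit. */
        cut = findc(description, ' ', -201);
        if cut > 1 then term = substr(description, 1, cut - 1);
        /* Assumption: with no blank at all, fall back to a hard cut. */
        else term = substr(description, 1, 200);
    end;
    drop cut;
run;
```

Unlike the code in the question, this also covers a description of exactly 200 characters.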
knveraraju91
Barite | Level 11

Thanks for the code. It worked. I just have one issue.

For this value the code did not work. Do you have any suggestions?

 

"Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified."

 

 

Output produced by the code:

"Wrong XXX version 1.0 signed during screening.
The EC has not yet been notified."

 

output expected:

 

"Wrong XXX version 1.0 signed during screening.
XXX Version x.0 was approved by the CEC on xx Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version 2.0 was forwarded to the"

 

 

Why did it skip the middle part of the value? This is the only value where the code did not work. Thank you.

PGStats
Opal | Level 21

You must have embedded line breaks in the field. Try making the pattern:

"s/(.{1,200})\s.*/\1/om"

The m modifier makes ^ and $ match at each embedded line break; it does not change what . matches.

Or else, try the s modifier, which makes . match newline characters:

"s/(.{1,200})\s.*/\1/os"

 

 

PG
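A small sketch of what the s modifier changes, using a made-up two-line value rather than the real data. The limit is shortened to 20 characters to keep the example readable, and the variable names plain and dotall are hypothetical. Without s, the . cannot cross the line feed ('0A'x), so the capture should stop at the end of the first line; with s, it should run on into the second line up to the last blank that fits.

```sas
data _null_;
    length description $300 plain dotall $200;
    /* '0A'x is a line feed, standing in for an embedded line break */
    description = 'First line.' || '0A'x || 'Second line with more words.';
    /* without s: . stops at the line feed */
    plain  = prxchange('s/(.{1,20})\s.*/\1/o',  1, description);
    /* with s: . also matches the line feed */
    dotall = prxchange('s/(.{1,20})\s.*/\1/os', 1, description);
    len_plain  = length(plain);
    len_dotall = length(dotall);
    put len_plain= len_dotall=;
run;
```

Comparing the two lengths in the log is an easy way to see whether a given value contains line breaks that affect the cut.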
knveraraju91
Barite | Level 11

Thank you. 's' modifier worked.

If you don't mind, I have one more question, as I am moving on to the supplemental data set. How do I get the remaining part into a term2 variable? If the value is longer than 400 characters, how do I get the rest into a third variable? Please help. Thank you very much.

PGStats
Opal | Level 21

Could be done this way:

 

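data want2;
/* compile the pattern once; the sum statement retains prxId across rows */
/* s modifier added so that . also matches embedded line breaks */
if not prxId then prxId + prxParse("/.{1,200}\s/s");
set have;
array term{10} $200; /* term1-term10, at most 200 characters each */
start = 1;
stop = length(description) + 1;
/* each call returns the position and length of the next chunk */
call prxNext(prxId, start, stop, description, pos, len);
do i = 1 to dim(term) while(pos > 0);
    term{i} = substr(description, pos, len);
    call prxNext(prxId, start, stop, description, pos, len);
    end;
drop prxId start stop pos len i;
run;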
PG
Ksharp
Super User

If I understand what you mean, something like this:

 

options noquotelenmax;
data have;
x= "Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified.";output;
x= "Wrong XXX version 1.0 signed during screening.
XXX Version 2.0 was approved by the XXX on 22 Dec 2015. Due to Christmas season and sick leave of the CRA the updated Version x.0 was forwarded to the site after her return starting 11 Jan 2016 so that Version 2.0 was not available at the site for the Screening visit on xx Jan 2016. The site used the previous Version 1.0 to consent the Patient. Subject signed the updated Version 2.0 during the next visit on 13 Jan 2016.
The EC has not yet been notified.";output;
run;
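/* Step 1: one row per word; len is the word length plus one blank */
data x;
set have;
length temp $ 200;
n+1;
 do i=1 to countw(x,' ');
  temp=scan(x,i,' ');len=length(temp)+1;output;
 end;
run;
/* Step 2: start a new group when the running length passes 200 */
data temp;
 set x;
 by n;
 retain sum;
 if first.n then sum=0;
 sum+len;
 if sum gt 200 then do;group+1;sum=len;end;
run;
/* Step 3: join the words of each group back into one string */
data temp1;
length want $ 200;
do until(last.group);
 set temp;
 by n group;
 want=catx(' ',want,temp);
end;
run;
/* Step 4: one row per original observation, one column per chunk */
proc transpose data=temp1 out=want;
by n;
var want;
run;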
 
