data one; input term $ 628.; datalines; Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. ; data two; set one; n+1; do i=1 to countw(term,''); temp=scan(term,i,' ');len=length(temp)+1;output; end; run;
data three; set two; by n; retain sum; if first.n then sum=0; if last.n then len=len-1; sum+len; if sum gt 200 then do;group+1;sum=len;end; run;
data four; length dterm $ 200; do until(last.group); set dv2; by n group; dterm=catx(' ',dterm,temp); end; run;
Not sure what you're trying to do. First, when I ran your initial data step, no data was read. It worked a lot better with:
data one; informat term $628.; input term &; datalines; Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. ;
Art, CEO, AnalystFinder.com
Can you show where you attempted: "I am not able to delete with code "if term='' then delete". "
Since that does not appear in your code.
You are also referencing a data set, DV2, in data four with no description of the contents to us. And since data four also does not reference any of data one, two or three it is real hard to figure out why it was included in your post.
Running your code, with data one modified as per @art297 so there is something in it, I do not see blank observations in any of one, two or three.
Is the whole purpose of this excersize to break a long string into strings of 200 characters or less breaking at a word boundary?
Dear,
The term variable value "Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening."
is in my data under a variable name term with length 628 from excel file. I ran the code below. I suppose to get only one obs for the value with my code as the length of the value is under 200.
But I am getting two obs with one all the characters of the value with length=200 and another with blank value and length=2.
I am trying to remove the blank value . But I could not with "if dterm='' then delete". Thank you for your time
data two; set one; n+1; do i=1 to countw(term,''); temp=scan(term,i,' ');len=length(temp)+1;output; end; run; data three; set two; by n; retain sum; if first.n then sum=0; if last.n then len=len-1; sum+len; if sum gt 200 then do;group+1;sum=len;end; run; data four; length dterm $ 200; do until(last.group); set three; by n group; dterm=catx(' ',dterm,temp); end;
if dterm='' then delete; run;
Where is the blank value you are trying to remove? Show us by stating what comes immediately before and after it.
Art, CEO, AnalystFinder.com
Thank you very much for your support. I am attaching a screen shot of my actual data for the values.
The data is read from an excel file.
The variable length (term) is 628. But when I used the code length=length(term). The value is 202 for this obs.
I ran the code in previous post which gave two obs with one containg all the characters of the value with length=199(length=length(dterm)). Length of dvterm set at 200.
The other with blank value with length =2(length=length(dterm)).
Still not sure what you are trying to either keep or delete. If you just want to find the values you marked in your doc, then the following might help:
data one (keep=term val:); informat term $628.; input term &; loc=anydigit(term); value1=scan(substr(term,loc),1,', .'); len=length(value1); temp=substr(term,loc+len); loc=indexc(temp,'<>-0123456789'); value2=scan(substr(temp,loc),1,', .'); len=length(value2); temp=substr(temp,loc+len); loc=indexc(temp,'<>-0123456789'); value3=scan(substr(temp,loc),1,', .'); len=length(value3); temp=substr(temp,loc+len); loc=indexc(temp,'<>-0123456789'); value4=scan(substr(temp,loc),1,', .'); if value4 in ('<','>') then value4= catt(value4,scan(substr(temp,loc),2,', .')); temp=substr(temp,loc+len); loc=indexc(temp,'<>-0123456789'); value5=scan(substr(temp,loc),1,', .'); if value5 in ('<','>') then value5= catt(value5,scan(substr(temp,loc),2,', .')); datalines; Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. ;
Art, CEO, AnalystFinder.com
Thanks for the help. I am attaching an excel sheet. I ran the followibg code. I got two OBS for varaible DTERM I need to remove the blank obs. Thank you
proc import datafile="C:\\Book1.xlsx" DBMS=xlsx out=DV; run; data two; set dv; n+1; do i=1 to countw(term,''); temp=scan(term,i,' ');len=length(temp)+1;output; end; run; data three; set two; by n; retain sum; if first.n then sum=0; if last.n then len=len-1; sum+len; if sum gt 200 then do;group+1;sum=len;end; run; data four; length dterm $ 200; do until(last.group); set three; by n group; dterm=catx(' ',dterm,temp); end; if dterm='' then delete; one=length(dterm); run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.