DATA Step, Macro, Functions and more

How to remove blank value obs that occurred while splitting the source OBS value into multiple obs

Reply
Super Contributor
Posts: 272

How to remove blank value obs that occurred while splitting the source OBS value into multiple obs

[ Edited ]
Dear, I am having issue with the following obs in data one. The actual data is data one from an excel file. The code worked for all values including this. But it also created a blank obs for this value which I am not able to delete with code "if term='' then delete". By running the code below i am getting two OBS in data four. One contains all the characters of the value with length=200 and second with blank value with length 2. I am not able to delete this blank obs. Please help how to delete this. Thank you

data one; input term $ 628.; datalines; Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. ; data two; set one; n+1; do i=1 to countw(term,''); temp=scan(term,i,' ');len=length(temp)+1;output; end; run;
data three; set two; by n; retain sum; if first.n then sum=0; if last.n then len=len-1; sum+len; if sum gt 200 then do;group+1;sum=len;end; run;
data four; length dterm $ 200; do until(last.group); set dv2; by n group; dterm=catx(' ',dterm,temp); end; run;
PROC Star
Posts: 7,480

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

[ Edited ]
Posted in reply to knveraraju91

Not sure what you're trying to do. First, when I ran your initial data step, no data was read. It worked a lot better with:

 

data one;
  informat term $628.;
input term &;
datalines;
Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. 
;

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 11,343

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Posted in reply to knveraraju91

Can you show where you attempted: "I am not able to delete with code "if term='' then delete". "

 

Since that does not appear in your code.

 

You are also referencing a data set, DV2, in data four with no description of the contents to us. And since data four also does not reference any of data one, two or three it is real hard to figure out why it was included in your post.

 

Running your code, with data one modified as per @art297 so there is something in it, I do not see blank observations in any of one, two or three.

 

Is the whole purpose of this excersize to break a long string into strings of 200 characters or less breaking at a word boundary?

Super Contributor
Posts: 272

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Dear,

The  term variable value "Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening."

is in my data under a variable name term with length 628 from excel file.  I ran the code  below. I suppose to get only one obs for the value with my code as the length of the value is under 200. 

But I am getting two obs with one  all the characters of the value with length=200 and another with blank value and length=2.

 

I am trying to remove the blank value . But I could not with "if dterm='' then delete". Thank you for your time

 

 

 

data two;
set one;
	n+1;
 do i=1 to countw(term,'');
  temp=scan(term,i,' ');len=length(temp)+1;output;
 end;
	run;

	data three;
 set two;
 by n;
 retain sum;
 if first.n then sum=0;
 if last.n then len=len-1;
 sum+len;
 if sum gt 200 then do;group+1;sum=len;end;
run;

data four;
length dterm $ 200;
do until(last.group);
 set three;
 by n group;
 dterm=catx(' ',dterm,temp);
end;
if dterm='' then delete; run;
PROC Star
Posts: 7,480

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Posted in reply to knveraraju91

Where is the blank value you are trying to remove? Show us by stating what comes immediately before and after it.

 

Art, CEO, AnalystFinder.com

 

Super Contributor
Posts: 272

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Thank you very much for your support. I am attaching a screen shot of my actual data for the values. 

 

The data is read from an excel file. 

The variable length (term) is 628. But when I used the code length=length(term). The value is 202 for this obs.

 

I ran the code in previous post which gave two obs with one containg all the characters of the value with length=199(length=length(dterm)). Length of dvterm set at 200.

The other with blank value with length =2(length=length(dterm)). 

PROC Star
Posts: 7,480

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Posted in reply to knveraraju91

Still not sure what you are trying to either keep or delete. If you just want to find the values you marked in your doc, then the following might help:

 

data one (keep=term val:);
  informat term $628.;
  input term &;
  loc=anydigit(term);
  value1=scan(substr(term,loc),1,', .');
  len=length(value1);
  temp=substr(term,loc+len);
  loc=indexc(temp,'<>-0123456789');
  value2=scan(substr(temp,loc),1,', .');
  len=length(value2);
  temp=substr(temp,loc+len);
  loc=indexc(temp,'<>-0123456789');
  value3=scan(substr(temp,loc),1,', .');
  len=length(value3);
  temp=substr(temp,loc+len);
  loc=indexc(temp,'<>-0123456789');
  value4=scan(substr(temp,loc),1,', .');
  if value4 in ('<','>') then value4=
   catt(value4,scan(substr(temp,loc),2,', .'));
  temp=substr(temp,loc+len);
  loc=indexc(temp,'<>-0123456789');
  value5=scan(substr(temp,loc),1,', .');
  if value5 in ('<','>') then value5=
   catt(value5,scan(substr(temp,loc),2,', .'));
datalines;
Subject 00000000 baseline labs were collected on Day -0, 00AAA0000, instead of on Day -0 per protocol. The baseline labs should not have been drawn as the baseline visit was < 00 days from screening. 
;

Art, CEO, AnalystFinder.com

 

Super Contributor
Posts: 272

Re: How to remove blank value obs that occurred while splitting the source OBS value into multiple o

Thanks for the help. I am attaching an excel sheet. I ran the followibg code. I got two OBS for varaible DTERM I need to remove the blank obs. Thank you

proc import datafile="C:\\Book1.xlsx" DBMS=xlsx out=DV;
                
run;
data two;
set dv;
	n+1;
 do i=1 to countw(term,'');
  temp=scan(term,i,' ');len=length(temp)+1;output;
 end;
	run;

	data three;
 set two;
 by n;
 retain sum;
 if first.n then sum=0;
 if last.n then len=len-1;
 sum+len;
 if sum gt 200 then do;group+1;sum=len;end;
run;

data four;
length dterm $ 200;
do until(last.group);
 set three;
 by n group;
 dterm=catx(' ',dterm,temp);
end;
if dterm='' then delete;
one=length(dterm);
run;
Ask a Question
Discussion stats
  • 7 replies
  • 165 views
  • 0 likes
  • 3 in conversation