DATA Step, Macro, Functions and more

Deleting rows with consecutive missing values

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Deleting rows with consecutive missing values

[ Edited ]

HI, In the dataset below, The dataset has values Level1, Level2, Level3, Level4 and Blank for the variables named MONTH.  I would like to keep only rows with one or more values missing but missing has to be consecutive. i.e. Delete ID 3,6,13,14,21 and keep the rest.

 

In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23.

Here is the sample of my dataset.

IDStoreMonth 1Month 2Month 3Month 4Month 5Month 6Month 7Month 8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
3BLevel1Level1  Level1 Level1Level1
4CLevel1Level1Level2Level2Level2   
5BLevel1       
6BLevel1 Level1Level1Level3Level3Level1Level1
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
13DLevel1 Level1     
14BLevel1Level1Level1 Level3Level3Level3Level3
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
21ALevel1Level1 Level1Level1Level1Level1Level1
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      

Accepted Solutions
Solution
‎01-17-2018 12:33 PM
PROC Star
Posts: 1,584

Re: Deleting rows with consecutive missing values

[ Edited ]

what source dataset are you testing against. 

I tested again for you, here is my log

 


3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 

 

This is my output:

IDStoreMonth1Month2Month3Month4Month5Month6Month7Month8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
4CLevel1Level1Level2Level2Level2   
5BLevel1       
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      

View solution in original post


All Replies
PROC Star
Posts: 1,215

Re: Deleting rows with consecutive missing values

Just to understand your logic, why is ID=2 not deleted? It has a consecutive missing value for Month 5?

Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

If variables MONTH has a value and then has a consecutive missing values then I want to keep it so ID 2 is a keep and so is 1,4,5 and so on.

PROC Star
Posts: 1,215

Re: Deleting rows with consecutive missing values

Ok. I think you are looking for something like this

 

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want(drop=i);
   set have;
   array months{*} Month1-Month8;
   do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;
Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

HI, Thanks so much.

I think this is working but the problem I found is that in my original dataset I have 947 Obs.

When I run your code below, I got 842 Obs which is fine.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;

 

BUT then I run following code, to check the deleted obs, but I get 107 Obs. 107 + 842 does not total 947 obs.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then Output;
   end;
run;

PROC Star
Posts: 1,215

Re: Deleting rows with consecutive missing values

[ Edited ]

@aled, that is because the two are not additive. The contition

 

if (months[i] = '' and months[i+1] ne '')

can be true more than once for the same observation. However, you can delete an observation only once (Your first data step), while you can output an observation more than once (Your second data step).

 

You can check this by looking at the data created by your second data step. I suspect than you have multiple lines with the same ID.

Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

The solution works for the first part of my original question. Solution by novisnorin also worked for the first part of my question. Thanks a lot.

PROC Star
Posts: 503

Re: Deleting rows with consecutive missing values

something like this 

data abc;
infile datalines missover dlm ='09'x;
input 
ID	1-2 Store $ 3-4	Month1 $	Month2 $	Month3	$ Month4 $	Month5 $ Month6	$ Month7 $ Month8 $;
datalines;
1	A	Level1	Level1	Level1	Level1	 	 	 	 
2	A	Level1	Level1	 	 	 	 	 	 
3	B	Level1	Level1	 	 	Level1	 	Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2	 	 	 
5	B	Level1	 	 	 	 	 	 	 
6	B	Level1	 	Level1	Level1	Level3	Level3	Level1	Level1
7	A	Level1	Level1	 	 	 	 	 	 
8	 	Level1	Level1	Level1	Level1	Level1	 	 	 
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4	 
10	D	Level1	Level1	Level1	 	 	 	 	 
11	E	Level1	 	 	 	 	 	 	 
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
13	D	Level1	 	Level1	 	 	 	 	 
14	B	Level1	Level1	Level1	 	Level3	Level3	Level3	Level3
15	A	Level1	Level1	Level1	Level1	Level1	 	 	 
16	D	Level1	Level1	 	 	 	 	 	 
17	B	Level1	Level1	Level1	 	 	 	 	 
18	E	Level1	Level1	Level1	 	 	 	 	 
19	F	Level1	Level1	Level1	Level1	Level1	 	 	 
20	D	Level1	Level1	Level1	Level1	Level1	 	 	 
21	A	Level1	Level1	 	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2	 
24	E	Level1	Level1	 	
; 	


data abss;
length newval $ 100;
set abc;
tot= 0;
array varsnew (8) $ newmonth1-newmonth8;
array vars (8) $ month1-month8;
do i = 1 to 8;
if vars{i} = '' then varsnew{i} = '-';
else varsnew{i} =vars{i};
if vars{i} not in ('Level1') then tot=tot+1;
else tot=tot+0;
end;
newval= cats(of newmonth1-newmonth8);
if index(newval,'--' )> 0 or tot >1 then delete;
drop i newmonth1-newmonth8 newval tot;
run;
PROC Star
Posts: 1,584

Re: Deleting rows with consecutive missing values

[ Edited ]

some mathematical thinking to make it easy:

 

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want;
set have;
array t(*) month:;
k=whichc(' ', of t(*));
if k>1 then do;
do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t k;
run;
Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

Posted in reply to novinosrin

Hi, got the following warning message when running the code:

 

data want;
set have;
array t(*) month:;

WARNING: Defining an array with zero elements
k=whichc(' ', of t(*));
if k>1 then do; do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t;
run;

NOTE: Numeric values have been converted to character values at the places given by:

(Line)Smiley SadColumn).

420:18

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WANT may be incomplete. When this step was stopped there

were 0 observations and 11 variables.

WARNING: Data set WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

Solution
‎01-17-2018 12:33 PM
PROC Star
Posts: 1,584

Re: Deleting rows with consecutive missing values

[ Edited ]

what source dataset are you testing against. 

I tested again for you, here is my log

 


3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 

 

This is my output:

IDStoreMonth1Month2Month3Month4Month5Month6Month7Month8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
4CLevel1Level1Level2Level2Level2   
5BLevel1       
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      
Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

Posted in reply to novinosrin

It worked for the first part of my original question. Thanks a lot.

PROC Star
Posts: 503

Re: Deleting rows with consecutive missing values

did you try my answer, I think it works for 2nd part of question too

Contributor
Posts: 23

Re: Deleting rows with consecutive missing values

I don't know why my reply to your post is not being visible here. did you get my reply. I did it 3 times already.

PROC Star
Posts: 1,584

Re: Deleting rows with consecutive missing values

Ok sorry, let me help you with the 2nd part ina bit. I didn't see it. too much coffee caffeine effect Smiley Sad

 

Your wrote:

"In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23."

can you explain more clearly as there many records with level1 and other levels? please

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 27 replies
  • 276 views
  • 1 like
  • 4 in conversation