Solved: Re: Deleting rows with consecutive missing values

d0816 · Posted 01-17-2018 10:11 AM

HI, In the dataset below, The dataset has values Level1, Level2, Level3, Level4 and Blank for the variables named MONTH. I would like to keep only rows with one or more values missing but missing has to be consecutive. i.e. Delete ID 3,6,13,14,21 and keep the rest.

In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23.

Here is the sample of my dataset.

ID	Store	Month 1	Month 2	Month 3	Month 4	Month 5	Month 6	Month 7	Month 8
1	A	Level1	Level1	Level1	Level1
2	A	Level1	Level1
3	B	Level1	Level1			Level1		Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2
5	B	Level1
6	B	Level1		Level1	Level1	Level3	Level3	Level1	Level1
7	A	Level1	Level1
8	A	Level1	Level1	Level1	Level1	Level1
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4
10	D	Level1	Level1	Level1
11	E	Level1
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1
13	D	Level1		Level1
14	B	Level1	Level1	Level1		Level3	Level3	Level3	Level3
15	A	Level1	Level1	Level1	Level1	Level1
16	D	Level1	Level1
17	B	Level1	Level1	Level1
18	E	Level1	Level1	Level1
19	F	Level1	Level1	Level1	Level1	Level1
20	D	Level1	Level1	Level1	Level1	Level1
21	A	Level1	Level1		Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2
24	E	Level1	Level1

novinosrin · Posted 01-17-2018 12:13 PM

what source dataset are you testing against.

I tested again for you, here is my log

3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

This is my output:

ID	Store	Month1	Month2	Month3	Month4	Month5	Month6	Month7	Month8
1	A	Level1	Level1	Level1	Level1
2	A	Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2
5	B	Level1
7	A	Level1	Level1
8	A	Level1	Level1	Level1	Level1	Level1
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4
10	D	Level1	Level1	Level1
11	E	Level1
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1
15	A	Level1	Level1	Level1	Level1	Level1
16	D	Level1	Level1
17	B	Level1	Level1	Level1
18	E	Level1	Level1	Level1
19	F	Level1	Level1	Level1	Level1	Level1
20	D	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2
24	E	Level1	Level1

View solution in original post

PeterClemmensen · Posted 01-17-2018 10:24 AM

Just to understand your logic, why is ID=2 not deleted? It has a consecutive missing value for Month 5?

The DATA to DATA Step Macro
Blog: SASnrd

d0816 · Posted 01-17-2018 10:28 AM

If variables MONTH has a value and then has a consecutive missing values then I want to keep it so ID 2 is a keep and so is 1,4,5 and so on.

PeterClemmensen · Posted 01-17-2018 10:47 AM

Ok. I think you are looking for something like this

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want(drop=i);
   set have;
   array months{*} Month1-Month8;
   do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

d0816 · Posted 01-17-2018 11:20 AM

HI, Thanks so much.

I think this is working but the problem I found is that in my original dataset I have 947 Obs.

When I run your code below, I got 842 Obs which is fine.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;

BUT then I run following code, to check the deleted obs, but I get 107 Obs. 107 + 842 does not total 947 obs.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then Output;
   end;
run;

PeterClemmensen · Posted 01-17-2018 12:18 PM

@d0816, that is because the two are not additive. The contition

if (months[i] = '' and months[i+1] ne '')

can be true more than once for the same observation. However, you can delete an observation only once (Your first data step), while you can output an observation more than once (Your second data step).

You can check this by looking at the data created by your second data step. I suspect than you have multiple lines with the same ID.

The DATA to DATA Step Macro
Blog: SASnrd

d0816 · Posted 01-17-2018 12:38 PM

The solution works for the first part of my original question. Solution by novisnorin also worked for the first part of my question. Thanks a lot.

kiranv_ · Posted 01-17-2018 11:20 AM

something like this

data abc;
infile datalines missover dlm ='09'x;
input 
ID	1-2 Store $ 3-4	Month1 $	Month2 $	Month3	$ Month4 $	Month5 $ Month6	$ Month7 $ Month8 $;
datalines;
1	A	Level1	Level1	Level1	Level1	 	 	 	 
2	A	Level1	Level1	 	 	 	 	 	 
3	B	Level1	Level1	 	 	Level1	 	Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2	 	 	 
5	B	Level1	 	 	 	 	 	 	 
6	B	Level1	 	Level1	Level1	Level3	Level3	Level1	Level1
7	A	Level1	Level1	 	 	 	 	 	 
8	 	Level1	Level1	Level1	Level1	Level1	 	 	 
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4	 
10	D	Level1	Level1	Level1	 	 	 	 	 
11	E	Level1	 	 	 	 	 	 	 
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
13	D	Level1	 	Level1	 	 	 	 	 
14	B	Level1	Level1	Level1	 	Level3	Level3	Level3	Level3
15	A	Level1	Level1	Level1	Level1	Level1	 	 	 
16	D	Level1	Level1	 	 	 	 	 	 
17	B	Level1	Level1	Level1	 	 	 	 	 
18	E	Level1	Level1	Level1	 	 	 	 	 
19	F	Level1	Level1	Level1	Level1	Level1	 	 	 
20	D	Level1	Level1	Level1	Level1	Level1	 	 	 
21	A	Level1	Level1	 	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2	 
24	E	Level1	Level1	 	
; 	


data abss;
length newval $ 100;
set abc;
tot= 0;
array varsnew (8) $ newmonth1-newmonth8;
array vars (8) $ month1-month8;
do i = 1 to 8;
if vars{i} = '' then varsnew{i} = '-';
else varsnew{i} =vars{i};
if vars{i} not in ('Level1') then tot=tot+1;
else tot=tot+0;
end;
newval= cats(of newmonth1-newmonth8);
if index(newval,'--' )> 0 or tot >1 then delete;
drop i newmonth1-newmonth8 newval tot;
run;

novinosrin · Posted 01-17-2018 11:39 AM

some mathematical thinking to make it easy:

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want;
set have;
array t(*) month:;
k=whichc(' ', of t(*));
if k>1 then do;
do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t k;
run;

d0816 · Posted 01-17-2018 12:09 PM

Hi, got the following warning message when running the code:

data want;
set have;
array t(*) month:;

WARNING: Defining an array with zero elements
k=whichc(' ', of t(*));
if k>1 then do; do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t;
run;

NOTE: Numeric values have been converted to character values at the places given by:

(Line):(Column).

420:18

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WANT may be incomplete. When this step was stopped there

were 0 observations and 11 variables.

WARNING: Data set WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

novinosrin · Posted 01-17-2018 12:13 PM

what source dataset are you testing against.

I tested again for you, here is my log

3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

This is my output:

ID	Store	Month1	Month2	Month3	Month4	Month5	Month6	Month7	Month8
1	A	Level1	Level1	Level1	Level1
2	A	Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2
5	B	Level1
7	A	Level1	Level1
8	A	Level1	Level1	Level1	Level1	Level1
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4
10	D	Level1	Level1	Level1
11	E	Level1
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1
15	A	Level1	Level1	Level1	Level1	Level1
16	D	Level1	Level1
17	B	Level1	Level1	Level1
18	E	Level1	Level1	Level1
19	F	Level1	Level1	Level1	Level1	Level1
20	D	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2
24	E	Level1	Level1

d0816 · Posted 01-17-2018 12:33 PM

It worked for the first part of my original question. Thanks a lot.

kiranv_ · Posted 01-17-2018 12:39 PM

did you try my answer, I think it works for 2nd part of question too

d0816 · Posted 01-17-2018 01:02 PM

I don't know why my reply to your post is not being visible here. did you get my reply. I did it 3 times already.

novinosrin · Posted 01-17-2018 12:41 PM

Ok sorry, let me help you with the 2nd part ina bit. I didn't see it. too much coffee caffeine effect 😞

Your wrote:

"In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23."

can you explain more clearly as there many records with level1 and other levels? please

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away