Solved: Re: Deleting rows with consecutive missing values - Page 2

d0816 · Posted 01-17-2018 12:48 PM

Can you please see above reply to kiranv?

novinosrin · Posted 01-17-2018 01:01 PM

@d0816 Where is the reply to Kiranv?

Anyway, if you say keep only those with "value as Level1 and consecutive missing values",. shouldn't 4 be deleted too?

4

C

Level1

Level2

9

D

Level1

Level4

23

D

Level1

Level2

Please clarify

d0816 · Posted 01-17-2018 01:00 PM

Month1	Month2	Month3	Month4	Month5	Month6	Month7	Month8
Level1	Level1	Level1	Level1	Level1	Level1	Level1
Level1	Level1	Level1	Level1	Level1	Level1
Level1	Level1	Level1	Level1	Level1
Level1	Level1	Level1	Level1
Level1	Level1	Level1
Level1	Level1
Level1

HI, for the 2nd part I want dataset like this.

novinosrin · Posted 01-17-2018 01:02 PM

Ok got it. Thank you. Be right back in a bit

d0816 · Posted 01-17-2018 01:09 PM

Also Would like the other half of the dataset too.

novinosrin · Posted 01-17-2018 01:34 PM

data want;
set have;
array t(*) month:;
_k=whichc(' ', of t(*));
if _k>1 then do;
do _n_=dim(t) by -1 to _k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
do _n_=dim(t) by -1 to 1;
if not missing(t(_n_)) and t(_n_) ne 'Level1' then do; __t=1;leave;end;
end;
if _t or __t then delete;
drop _:;
run;

I will try to see if I can make it really slick. Hope that helps

d0816 · Posted 01-17-2018 01:47 PM

Gave me only 3 variables. For eg. like below. whereas I would like to keep all MONTH variables.

ID

Store

k

1

A

5

2

A

3

3

B

2

4

A

3

5

D

6

6

C

4

7

D

2

8

A

8

9

A

6

10

C

3

11

K

4

12

B

4

13

E

6

14

E

6

15

D

8

16

D

2

novinosrin · Posted 01-17-2018 01:53 PM

My log and result:

4365 data want;
4366 set have;
4367 array t(*) month:;
4368 _k=whichc(' ', of t(*));
4369 if _k>1 then do;
4370 do _n_=dim(t) by -1 to _k+1;
4371 if not missing(t(_n_)) then do; _t=1;leave;end;
4372 end;
4373 end;
4374 do _n_=dim(t) by -1 to 1;
4375 if not missing(t(_n_)) and t(_n_) ne 'Level1' then do; __t=1;leave;end;
4376 end;
4377 if _t or __t then delete;
4378 drop _:;
4379 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 16 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

My output:

ID	Store	Month1	Month2	Month3	Month4	Month5	Month6	Month7	Month8
1	A	Level1	Level1	Level1	Level1
2	A	Level1	Level1
5	B	Level1
7	A	Level1	Level1
8	A	Level1	Level1	Level1	Level1	Level1
10	D	Level1	Level1	Level1
11	E	Level1
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1
15	A	Level1	Level1	Level1	Level1	Level1
16	D	Level1	Level1
17	B	Level1	Level1	Level1
18	E	Level1	Level1	Level1
19	F	Level1	Level1	Level1	Level1	Level1
20	D	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1
24	E	Level1	Level1

d0816 · Posted 01-18-2018 01:07 PM

Hi,

I am getting the same results with only 3 variables.

My dataset have 11 variables actually, with variable k added from the previous code for the part 1. Could that be the reason why this code is not working for my full dataset?

novinosrin · Posted 01-18-2018 01:17 PM

I tested again, use this HAVE and and check the WANT. If this works for your sample, replicate carefully for your original source

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want;
set have;
array t(*) month:;
_k=whichc(' ', of t(*));
if _k>1 then do;
do _n_=dim(t) by -1 to _k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
do _n_=dim(t) by -1 to 1;
if not missing(t(_n_)) and t(_n_) ne 'Level1' then do; __t=1;leave;end;
end;
if _t or __t then delete;
drop _:;
run;

d0816 · Posted 01-18-2018 02:52 PM

I tried with the sample dataset like you said for both part 1 and part 2. This is definitely working for the sample dataset that I provided to you. But somehow is not working for the full dataset. In the sample dataset, No. of variables remain the same which is 10 for both Part1 and part 2 code. But for my full dataset, Part 1 code returns me 11 variables (11th variable is k) even after carefully making sure I am using the right dataset.

novinosrin · Posted 01-18-2018 07:06 PM

i use underscore(_k) as a prefix to flag variables to drop them later as you may have noticed in the code. Please make sure there are absolutely no typos at all.

d0816 · Posted 01-22-2018 12:56 PM

HI, The code worked for the part 2 as well. Thank you.

Thank you all who replied.

Registration is open

SAS Training: Just a Click Away