BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
d0816
Quartz | Level 8

HI, In the dataset below, The dataset has values Level1, Level2, Level3, Level4 and Blank for the variables named MONTH.  I would like to keep only rows with one or more values missing but missing has to be consecutive. i.e. Delete ID 3,6,13,14,21 and keep the rest.

 

In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23.

Here is the sample of my dataset.

IDStoreMonth 1Month 2Month 3Month 4Month 5Month 6Month 7Month 8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
3BLevel1Level1  Level1 Level1Level1
4CLevel1Level1Level2Level2Level2   
5BLevel1       
6BLevel1 Level1Level1Level3Level3Level1Level1
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
13DLevel1 Level1     
14BLevel1Level1Level1 Level3Level3Level3Level3
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
21ALevel1Level1 Level1Level1Level1Level1Level1
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

what source dataset are you testing against. 

I tested again for you, here is my log

 


3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 

 

This is my output:

IDStoreMonth1Month2Month3Month4Month5Month6Month7Month8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
4CLevel1Level1Level2Level2Level2   
5BLevel1       
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      

View solution in original post

27 REPLIES 27
PeterClemmensen
Tourmaline | Level 20

Just to understand your logic, why is ID=2 not deleted? It has a consecutive missing value for Month 5?

d0816
Quartz | Level 8

If variables MONTH has a value and then has a consecutive missing values then I want to keep it so ID 2 is a keep and so is 1,4,5 and so on.

PeterClemmensen
Tourmaline | Level 20

Ok. I think you are looking for something like this

 

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want(drop=i);
   set have;
   array months{*} Month1-Month8;
   do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;
d0816
Quartz | Level 8

HI, Thanks so much.

I think this is working but the problem I found is that in my original dataset I have 947 Obs.

When I run your code below, I got 842 Obs which is fine.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then delete;
   end;
run;

 

BUT then I run following code, to check the deleted obs, but I get 107 Obs. 107 + 842 does not total 947 obs.

data want(drop=i);
   set have;
   array months{*} Month1-Month8; do i=1 to dim(months)-1;
      if (months[i] = '' and months[i+1] ne '') then Output;
   end;
run;

PeterClemmensen
Tourmaline | Level 20

@d0816, that is because the two are not additive. The contition

 

if (months[i] = '' and months[i+1] ne '')

can be true more than once for the same observation. However, you can delete an observation only once (Your first data step), while you can output an observation more than once (Your second data step).

 

You can check this by looking at the data created by your second data step. I suspect than you have multiple lines with the same ID.

d0816
Quartz | Level 8

The solution works for the first part of my original question. Solution by novisnorin also worked for the first part of my question. Thanks a lot.

kiranv_
Rhodochrosite | Level 12

something like this 

data abc;
infile datalines missover dlm ='09'x;
input 
ID	1-2 Store $ 3-4	Month1 $	Month2 $	Month3	$ Month4 $	Month5 $ Month6	$ Month7 $ Month8 $;
datalines;
1	A	Level1	Level1	Level1	Level1	 	 	 	 
2	A	Level1	Level1	 	 	 	 	 	 
3	B	Level1	Level1	 	 	Level1	 	Level1	Level1
4	C	Level1	Level1	Level2	Level2	Level2	 	 	 
5	B	Level1	 	 	 	 	 	 	 
6	B	Level1	 	Level1	Level1	Level3	Level3	Level1	Level1
7	A	Level1	Level1	 	 	 	 	 	 
8	 	Level1	Level1	Level1	Level1	Level1	 	 	 
9	D	Level1	Level4	Level4	Level4	Level4	Level4	Level4	 
10	D	Level1	Level1	Level1	 	 	 	 	 
11	E	Level1	 	 	 	 	 	 	 
12	E	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
13	D	Level1	 	Level1	 	 	 	 	 
14	B	Level1	Level1	Level1	 	Level3	Level3	Level3	Level3
15	A	Level1	Level1	Level1	Level1	Level1	 	 	 
16	D	Level1	Level1	 	 	 	 	 	 
17	B	Level1	Level1	Level1	 	 	 	 	 
18	E	Level1	Level1	Level1	 	 	 	 	 
19	F	Level1	Level1	Level1	Level1	Level1	 	 	 
20	D	Level1	Level1	Level1	Level1	Level1	 	 	 
21	A	Level1	Level1	 	Level1	Level1	Level1	Level1	Level1
22	A	Level1	Level1	Level1	Level1	Level1	Level1	Level1	 
23	D	Level1	Level1	Level1	Level1	Level1	Level1	Level2	 
24	E	Level1	Level1	 	
; 	


data abss;
length newval $ 100;
set abc;
tot= 0;
array varsnew (8) $ newmonth1-newmonth8;
array vars (8) $ month1-month8;
do i = 1 to 8;
if vars{i} = '' then varsnew{i} = '-';
else varsnew{i} =vars{i};
if vars{i} not in ('Level1') then tot=tot+1;
else tot=tot+0;
end;
newval= cats(of newmonth1-newmonth8);
if index(newval,'--' )> 0 or tot >1 then delete;
drop i newmonth1-newmonth8 newval tot;
run;
novinosrin
Tourmaline | Level 20

some mathematical thinking to make it easy:

 

data have;
input ID Store$ (Month1-Month8)($);
infile datalines dsd missover;
datalines;
1,A,Level1,Level1,Level1,Level1,,,,
2,A,Level1,Level1,,,,,,
3,B,Level1,Level1,,,Level1,,Level1,Level1
4,C,Level1,Level1,Level2,Level2,Level2,,,
5,B,Level1,,,,,,,
6,B,Level1,,Level1,Level1,Level3,Level3,Level1,Level1
7,A,Level1,Level1,,,,,,
8,A,Level1,Level1,Level1,Level1,Level1,,,
9,D,Level1,Level4,Level4,Level4,Level4,Level4,Level4,
10,D,Level1,Level1,Level1,,,,,
11,E,Level1,,,,,,,
12,E,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
13,D,Level1,,Level1,,,,,
14,B,Level1,Level1,Level1,,Level3,Level3,Level3,Level3
15,A,Level1,Level1,Level1,Level1,Level1,,,
16,D,Level1,Level1,,,,,,
17,B,Level1,Level1,Level1,,,,,
18,E,Level1,Level1,Level1,,,,,
19,F,Level1,Level1,Level1,Level1,Level1,,,
20,D,Level1,Level1,Level1,Level1,Level1,,,
21,A,Level1,Level1,,Level1,Level1,Level1,Level1,Level1
22,A,Level1,Level1,Level1,Level1,Level1,Level1,Level1,
23,D,Level1,Level1,Level1,Level1,Level1,Level1,Level2,
24,E,Level1,Level1,,,,,,
;

data want;
set have;
array t(*) month:;
k=whichc(' ', of t(*));
if k>1 then do;
do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t k;
run;
d0816
Quartz | Level 8

Hi, got the following warning message when running the code:

 

data want;
set have;
array t(*) month:;

WARNING: Defining an array with zero elements
k=whichc(' ', of t(*));
if k>1 then do; do _n_=dim(t) by -1 to k+1;
if not missing(t(_n_)) then do; _t=1;leave;end;
end;
end;
if _t then delete;
drop _t;
run;

NOTE: Numeric values have been converted to character values at the places given by:

(Line):(Column).

420:18

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WANT may be incomplete. When this step was stopped there

were 0 observations and 11 variables.

WARNING: Data set WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

novinosrin
Tourmaline | Level 20

what source dataset are you testing against. 

I tested again for you, here is my log

 


3952 data have;
3953 input ID Store$ (Month1-Month8)($);
3954 infile datalines dsd missover;
3955 datalines;

NOTE: The data set WORK.HAVE has 24 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


3980 ;
3981
3982 data want;
3983 set have;
3984 array t(*) month:;
3985 k=whichc(' ', of t(*));
3986 if k>1 then do;
3987 do _n_=dim(t) by -1 to k+1;
3988 if not missing(t(_n_)) then do; _t=1;leave;end;
3989 end;
3990 end;
3991 if _t then delete;
3992 drop _t;
3993 run;

NOTE: There were 24 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 19 observations and 11 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 

 

This is my output:

IDStoreMonth1Month2Month3Month4Month5Month6Month7Month8
1ALevel1Level1Level1Level1    
2ALevel1Level1      
4CLevel1Level1Level2Level2Level2   
5BLevel1       
7ALevel1Level1      
8ALevel1Level1Level1Level1Level1   
9DLevel1Level4Level4Level4Level4Level4Level4 
10DLevel1Level1Level1     
11ELevel1       
12ELevel1Level1Level1Level1Level1Level1Level1 
15ALevel1Level1Level1Level1Level1   
16DLevel1Level1      
17BLevel1Level1Level1     
18ELevel1Level1Level1     
19FLevel1Level1Level1Level1Level1   
20DLevel1Level1Level1Level1Level1   
22ALevel1Level1Level1Level1Level1Level1Level1 
23DLevel1Level1Level1Level1Level1Level1Level2 
24ELevel1Level1      
d0816
Quartz | Level 8

It worked for the first part of my original question. Thanks a lot.

kiranv_
Rhodochrosite | Level 12

did you try my answer, I think it works for 2nd part of question too

d0816
Quartz | Level 8

I don't know why my reply to your post is not being visible here. did you get my reply. I did it 3 times already.

novinosrin
Tourmaline | Level 20

Ok sorry, let me help you with the 2nd part ina bit. I didn't see it. too much coffee caffeine effect 😞

 

Your wrote:

"In the new dataset created I would like to keep only those with value as Level1 and consecutive missing values i.e. Delete ID 9 and 23."

can you explain more clearly as there many records with level1 and other levels? please

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1987 views
  • 1 like
  • 4 in conversation