Hi,
Have the following problem, I want to count the number of successive missing values in a row, only in specific lines, also I want to know when the missing values occur at the beginning or at the end of the sequence. Take as an example the following code:
data example;
input id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
In particular, I want to know the successive missing values in B,C,G and F.
Thank you
Here you go
data have;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
data want(keep=id miss_lead miss_trail);
set have;
miss_lead=0;
miss_trail=0;
array v {*} var1-var12;
do i=1 to dim(v);
if missing(v[i]) and miss_lead+1=i then miss_lead=i;
if missing(v[dim(v)-i+1]) and miss_trail+1=i then miss_trail=i;
end;
run;
Why B, C, G and F? Is that arbitrary?
I edited your data set a bit since it doesn't quite work as stated.
data example;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
run;
proc sort data=example; by id; run;
proc transpose data=example out=texample;
var var1-var12;
by id;
run;
data texample2;
set texample;
by id;
if first.id then misscount=1*(col1='');
else do;
if col1='' then misscount+1;
if col1 ne '' then misscount=0;
end;
run;
proc tabulate data=texample2;
class id;
var misscount;
tables id='', misscount*(max=''*f=6.0) / box='id';
run;
This provides the number of consecutively missing observations for each id.
id misscount A 5 B 5 C 4 D 1 E 5 F 2 G 5 H 5 I 1 J 2
Is this close to what you want?
@cminard
The result I want is:
id misscount
A 0
B 0
C 4
D 0
E 0
F 0
G 5
H 5
I 0
J 0
I want to count the initial missing values
and
id misscount
A 0
B 5
C 1
D 1
E 2
F 0
G 5
H 0
I 1
J 2
I want to count the last missing values.
Why is misscount = 1 for ID = "H"?
Ok. The whole thing can be done like this then.
data want(keep = id misscount);
set example;
array v var12 - var1;
misscount = 0;
do over v;
if not missing(v) then leave;
misscount + 1;
end;
run;
Result:
id misscount A 0 B 5 C 1 D 1 F 2 E 0 G 5 H 0 I 1 J 2
Here you go
data have;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
data want(keep=id miss_lead miss_trail);
set have;
miss_lead=0;
miss_trail=0;
array v {*} var1-var12;
do i=1 to dim(v);
if missing(v[i]) and miss_lead+1=i then miss_lead=i;
if missing(v[dim(v)-i+1]) and miss_trail+1=i then miss_trail=i;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.