Hi,
Have the following problem, I want to count the number of successive missing values in a row, only in specific lines, also I want to know when the missing values occur at the beginning or at the end of the sequence. Take as an example the following code:
data example;
input id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
In particular, I want to know the successive missing values in B,C,G and F.
Thank you
Here you go
data have;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
data want(keep=id miss_lead miss_trail);
set have;
miss_lead=0;
miss_trail=0;
array v {*} var1-var12;
do i=1 to dim(v);
if missing(v[i]) and miss_lead+1=i then miss_lead=i;
if missing(v[dim(v)-i+1]) and miss_trail+1=i then miss_trail=i;
end;
run;
Why B, C, G and F? Is that arbitrary?
I edited your data set a bit since it doesn't quite work as stated.
data example;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
run;
proc sort data=example; by id; run;
proc transpose data=example out=texample;
var var1-var12;
by id;
run;
data texample2;
set texample;
by id;
if first.id then misscount=1*(col1='');
else do;
if col1='' then misscount+1;
if col1 ne '' then misscount=0;
end;
run;
proc tabulate data=texample2;
class id;
var misscount;
tables id='', misscount*(max=''*f=6.0) / box='id';
run;
This provides the number of consecutively missing observations for each id.
id misscount A 5 B 5 C 4 D 1 E 5 F 2 G 5 H 5 I 1 J 2
Is this close to what you want?
@cminard
The result I want is:
id misscount
A 0
B 0
C 4
D 0
E 0
F 0
G 5
H 5
I 0
J 0
I want to count the initial missing values
and
id misscount
A 0
B 5
C 1
D 1
E 2
F 0
G 5
H 0
I 1
J 2
I want to count the last missing values.
Why is misscount = 1 for ID = "H"?
Ok. The whole thing can be done like this then.
data want(keep = id misscount);
set example;
array v var12 - var1;
misscount = 0;
do over v;
if not missing(v) then leave;
misscount + 1;
end;
run;
Result:
id misscount A 0 B 5 C 1 D 1 F 2 E 0 G 5 H 0 I 1 J 2
Here you go
data have;
input id $2. var1 $2. var2 $2. var3 $2. var4 $2. var5 $2. var6 $2. var7 $2. var8 $2. var9 $2. var10 $2. var11 $2. var12 $2.;
cards;
A 1 2 3 . . . . . 1 1 1 3
B 3 3 2 1 3 2 1 . . . . .
C . . . . 1 2 3 1 2 3 2 .
D 3 . 1 . 3 . 1 . 3 . 1 .
F 1 3 . . 1 3 . . 1 3 . .
E 3 2 1 . . . . . 1 1 1 3
G 3 3 2 1 3 2 1 . . . . .
H . . . . . 1 2 3 1 2 3 2
I 3 . 1 . 3 . 1 . 3 . 1 .
J A E . . A E . . A E . .
;
data want(keep=id miss_lead miss_trail);
set have;
miss_lead=0;
miss_trail=0;
array v {*} var1-var12;
do i=1 to dim(v);
if missing(v[i]) and miss_lead+1=i then miss_lead=i;
if missing(v[dim(v)-i+1]) and miss_trail+1=i then miss_trail=i;
end;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.