Hi Everyone,
My data has id and value.
For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.
If the first non-missing value is -1, I want to count how many -1 before 1 show up.
To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.
After a lot of work, I get the code run correctly but Separately for +1 and -1 and then combine the 2 counts.
The logic is simply counting and very straight forward as you can see.
My issue is how to combine the 2 parts in 1 do loop.
Can you please help?
Many thanks,
HHC
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;
/*Working on positive value*/
data want;
set have nobs=nobs;
drop i j_: out_:;
out_i=0;
pos_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
set have (keep = id value rename=(id=j_id value=j_value)) point=j;
if id^=j_id then leave;
if j_value=1 then pos_count=pos_count+1;
else
if j_value=-1 then leave;
END;
run;
/*Working on Negative value*/
data want;
set want nobs=nobs;
drop i j_: out_:;
out_i=0;
neg_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
set want (keep = id value rename=(id=j_id value=j_value)) point=j;
if id^=j_id then leave;
if j_value=-1 then neg_count=neg_count-1;
else
if j_value=1 then leave;
END;
run;
/*Combine*/
data want; set want;
final_count = pos_count+neg_count;
run;
Finally, I got it!!!
The 2 issues I fixed which is quite difficult to see is:
1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;
2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)
If you could, please help to make my code more efficient or suggest other solution.
Many many thanks.
HHC
data want;
set have nobs=nobs;
drop i j k l k_: l_: out_:;
i+1;
DO J=i;
out_k=0;
count=0;
Do k=j to j+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
IF id^=k_id then leave;
/*positive section----*/
ELSE
IF k_value=1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value>0 then count=count+1;
else
if l_value=-1 then leave;
End;
End;
/*negative section----*/
ELSE
IF k_value=-1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value=-1 then count=count-1;
else
if l_value=1 then leave;
End;
End;
end;
END;
run;
Finally, I got it!!!
The 2 issues I fixed which is quite difficult to see is:
1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;
2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)
If you could, please help to make my code more efficient or suggest other solution.
Many many thanks.
HHC
data want;
set have nobs=nobs;
drop i j k l k_: l_: out_:;
i+1;
DO J=i;
out_k=0;
count=0;
Do k=j to j+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
IF id^=k_id then leave;
/*positive section----*/
ELSE
IF k_value=1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value>0 then count=count+1;
else
if l_value=-1 then leave;
End;
End;
/*negative section----*/
ELSE
IF k_value=-1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value=-1 then count=count-1;
else
if l_value=1 then leave;
End;
End;
end;
END;
run;
I'm posting the same response I gave to your virtually identical topic. It does not require separate sets of codes for -1 vs +1:
data have;
input id value @@;
datalines;
1 1 1 . 1 1 1 . 1 1 1 . 1 . 1 .
1 . 1 . 1 . 1 -1 1 . 1 -1 1 . 1 .
1 1 1 1
2 -1 2 . 2 1 2 . 2 1
;
data want (drop=_: i);
merge have (firstobs=1)
have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;
array _x {*} id _id: ;
array _v {*} value _val: ;
count=0;
do i=1 to dim(_x) while(id=_x{i});
if _nonmiss1=. then _nonmiss1=_v{i};
else if _v{i}=. then continue;
else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
end;
run;
Hi mkeitz,
I run your code and for row 4 and 5, the count is 0. They suppose to be 1.
Since I will have multiple windows to check, such as 5, 50 or 60. I am not sure if this code is easy to change.
Thank you so much for helping.
HHC
Use the POINT= option to "lookahead".
You will need to know how many observations there are for each ID to make sure you don't look TOO far.
So given this data:
data have;
input id value @@;
datalines;
1 1 1 . 1 1 1 . 1 1 1 . 1 . 1 .
1 . 1 . 1 . 1 -1 1 . 1 -1 1 . 1 .
1 1 1 1
2 -1 2 . 2 1 2 . 2 1
;
With a window of 5 observations (counting the current non-missing value).
%let window=5;
data want;
length id value nfound firstobs nobs 8;
keep id -- nobs;
do nobs=1 by 1 until(last.id);
set have;
by id;
end;
do row=1 by 1 until(last.id);
set have;
by id;
obsnum+1;
if row=1 then firstobs=obsnum;
if not missing(value) then do;
nfound=1;
do p=obsnum+1 to min(obsnum+&window-1,firstobs+nobs-1) until(newvalue=-value);
set have(keep=value rename=(value=newvalue)) point=p;
nfound + (value=newvalue);
end;
output;
end;
end;
run;
You get these counts.
Obs id value nfound firstobs nobs 1 1 1 3 1 18 2 1 1 2 1 18 3 1 1 1 1 18 4 1 -1 2 1 18 5 1 -1 1 1 18 6 1 1 2 1 18 7 1 1 1 1 18 8 2 -1 1 19 5 9 2 1 2 19 5 10 2 1 1 19 5
Thank you as always, Tom.
HHC
Have not reviewed/replied SAS programming threads for fun in recent years. This one is an interesting fun topic to practice all basic DATA step statement elements.
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;
data ahuige(keep=id value count);
do obsnum=1 to last;
set have nobs=last;
count=sum(0,sign(value));
over=0;
do point=obsnum+1 to obsnum+5;
if point<=last and not over then
do;
set have(rename=(id=nowId value=nowValue)) point=point;
over=((id ne nowId) or sign(NowValue)*sign(count)=-1 );
if not over then count=sum(count,sign(NowValue)) ;
end;
end;
output;
end;stop;
run;
proc print;run;;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.