Hi Everyone,
My data has id and value.
For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.
If the first non-missing value is -1, I want to count how many -1 before 1 show up.
To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.
After a lot of work, I get the code run correctly but Separately for +1 and -1 and then combine the 2 counts.
The logic is simply counting and very straight forward as you can see.
My issue is how to combine the 2 parts in 1 do loop.
Can you please help?
Many thanks,
HHC
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;
/*Working on positive value*/
data want;
set have nobs=nobs;
drop i j_: out_:;
out_i=0;
pos_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
set have (keep = id value rename=(id=j_id value=j_value)) point=j;
if id^=j_id then leave;
if j_value=1 then pos_count=pos_count+1;
else
if j_value=-1 then leave;
END;
run;
/*Working on Negative value*/
data want;
set want nobs=nobs;
drop i j_: out_:;
out_i=0;
neg_count=0;
i+1;
DO j=i to i+5 until (j=nobs or out_j=1);
set want (keep = id value rename=(id=j_id value=j_value)) point=j;
if id^=j_id then leave;
if j_value=-1 then neg_count=neg_count-1;
else
if j_value=1 then leave;
END;
run;
/*Combine*/
data want; set want;
final_count = pos_count+neg_count;
run;
Finally, I got it!!!
The 2 issues I fixed which is quite difficult to see is:
1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;
2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)
If you could, please help to make my code more efficient or suggest other solution.
Many many thanks.
HHC
data want;
set have nobs=nobs;
drop i j k l k_: l_: out_:;
i+1;
DO J=i;
out_k=0;
count=0;
Do k=j to j+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
IF id^=k_id then leave;
/*positive section----*/
ELSE
IF k_value=1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value>0 then count=count+1;
else
if l_value=-1 then leave;
End;
End;
/*negative section----*/
ELSE
IF k_value=-1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value=-1 then count=count-1;
else
if l_value=1 then leave;
End;
End;
end;
END;
run;
Finally, I got it!!!
The 2 issues I fixed which is quite difficult to see is:
1- To get data for all rows, I need a quite "stupid" and "redundant" step: Do j=i;
2- Once value=1 is found, put an out statement so that the outer loop will not run again (to avoid the double/triple counts)
If you could, please help to make my code more efficient or suggest other solution.
Many many thanks.
HHC
data want;
set have nobs=nobs;
drop i j k l k_: l_: out_:;
i+1;
DO J=i;
out_k=0;
count=0;
Do k=j to j+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
IF id^=k_id then leave;
/*positive section----*/
ELSE
IF k_value=1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value>0 then count=count+1;
else
if l_value=-1 then leave;
End;
End;
/*negative section----*/
ELSE
IF k_value=-1 then do;
out_k=1;/*critical step: once k_value=1, it is the only iteration for k (no more run on k)*/
DO l=k to j+5 until (l=nobs);
set have (keep = id value rename=(id=l_id value=l_value)) point=l;
if id^=l_id then leave;
else
if l_value=-1 then count=count-1;
else
if l_value=1 then leave;
End;
End;
end;
END;
run;
I'm posting the same response I gave to your virtually identical topic. It does not require separate sets of codes for -1 vs +1:
data have;
input id value @@;
datalines;
1 1 1 . 1 1 1 . 1 1 1 . 1 . 1 .
1 . 1 . 1 . 1 -1 1 . 1 -1 1 . 1 .
1 1 1 1
2 -1 2 . 2 1 2 . 2 1
;
data want (drop=_: i);
merge have (firstobs=1)
have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;
array _x {*} id _id: ;
array _v {*} value _val: ;
count=0;
do i=1 to dim(_x) while(id=_x{i});
if _nonmiss1=. then _nonmiss1=_v{i};
else if _v{i}=. then continue;
else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
end;
run;
Hi mkeitz,
I run your code and for row 4 and 5, the count is 0. They suppose to be 1.
Since I will have multiple windows to check, such as 5, 50 or 60. I am not sure if this code is easy to change.
Thank you so much for helping.
HHC
Use the POINT= option to "lookahead".
You will need to know how many observations there are for each ID to make sure you don't look TOO far.
So given this data:
data have;
input id value @@;
datalines;
1 1 1 . 1 1 1 . 1 1 1 . 1 . 1 .
1 . 1 . 1 . 1 -1 1 . 1 -1 1 . 1 .
1 1 1 1
2 -1 2 . 2 1 2 . 2 1
;
With a window of 5 observations (counting the current non-missing value).
%let window=5;
data want;
length id value nfound firstobs nobs 8;
keep id -- nobs;
do nobs=1 by 1 until(last.id);
set have;
by id;
end;
do row=1 by 1 until(last.id);
set have;
by id;
obsnum+1;
if row=1 then firstobs=obsnum;
if not missing(value) then do;
nfound=1;
do p=obsnum+1 to min(obsnum+&window-1,firstobs+nobs-1) until(newvalue=-value);
set have(keep=value rename=(value=newvalue)) point=p;
nfound + (value=newvalue);
end;
output;
end;
end;
run;
You get these counts.
Obs id value nfound firstobs nobs 1 1 1 3 1 18 2 1 1 2 1 18 3 1 1 1 1 18 4 1 -1 2 1 18 5 1 -1 1 1 18 6 1 1 2 1 18 7 1 1 1 1 18 8 2 -1 1 19 5 9 2 1 2 19 5 10 2 1 1 19 5
Thank you as always, Tom.
HHC
Have not reviewed/replied SAS programming threads for fun in recent years. This one is an interesting fun topic to practice all basic DATA step statement elements.
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 .
1 .
1 .
1 .
1 .
1 -1
1 .
1 -1
1 .
1 .
1 1
1 1
2 -1
2 .
2 1
2 .
2 1
;
data ahuige(keep=id value count);
do obsnum=1 to last;
set have nobs=last;
count=sum(0,sign(value));
over=0;
do point=obsnum+1 to obsnum+5;
if point<=last and not over then
do;
set have(rename=(id=nowId value=nowValue)) point=point;
over=((id ne nowId) or sign(NowValue)*sign(count)=-1 );
if not over then count=sum(count,sign(NowValue)) ;
end;
end;
output;
end;stop;
run;
proc print;run;;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.