Hi Everyone,
My data has id and value.
For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.
If the first non-missing value is -1, I want to count how many -1 before 1 show up.
To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.
In my data below, the first row is 1, which serves as the first non-missing, thus the count is 3 (value 1)
2nd row: count = 2
3 row: count=1
4 row: count =1 as -1 shows in 7the record
My code is below and I really curious why it doesn't work.
It appears that the count is wrong for row with value =. and for the last few rows.
For simplicity, I don't introduce the condition of the same id yet.
Can you please help?
Many thanks,
HHC
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;
data want;
set have nobs=nobs;
drop i j_: out_:;
i+1;
out_j=0;
count=0;
DO j=i to i+5 until (j=nobs or out_j=1);
out_k=0;
set have (keep = id value rename=(id=j_id value=j_value)) point=j;
IF j_value>0 then do; /*checking positive value*/
do k=j to i+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
if k_value>0 then
count=count+1;
else
if k_value =-1 then do;
out_k=1;
leave;
end;
END;
End;
ELSE
IF j_value<0 then do;/*checking positive value*/
do k=j to i+5 until (k=nobs or out_k=1);
set have (keep = id value rename=(id=k_id value=k_value)) point=k;
if k_value<0 and k_value^=. then
count=count-1;
else
if k_value =1 then do;
out_k=1;
leave;
end;
END;
End;
if out_k=1 then out_j=1;
END;
run;
You'd better some real data you have and the output you want to see to illustrate your question better.
If you want count POSITIVE or NEGATIVE number , try this one :
/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have;
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;
data want;
array x{9999} _temporary_;
call missing(of x{*});
do j=1 by 1 until(last.id);
set have;
by id;
x{j}=value;
end;
do i=1 by 1 until(last.id);
set have;
by id;
found=0;count=0;first_no_missing=0;
do k=i to i+5;
if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
if sign(x{k})=sign(first_no_missing) then count+1;
if sign(x{k})=-sign(first_no_missing) then leave;
end;
output;
end;
keep id value count;
run;
I didn't spend enough time on your code to answer your question but check if below a bit simpler version returns the desired result.
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;
data want(drop=_:);
set have;
do _i=_n_ to _n_+5 while(_i<=_nobs);
set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
if id ne _id then leave;
_firstNonMiss = coalesce(_firstNonMiss,_value);
if n(_firstNonMiss, _value)=2 then
do;
if _firstNonMiss=_value then count=sum(count,1);
else leave;
end;
end;
run;
proc print data=want;
run;
Because your variable Value contains 1, -1 or missing you could simply use this variable for the count. In the code I proposed already change...
from: count=sum(count,1) to: count=sum(count,_firstNonMiss)
Here the amended code:
data want(drop=_:);
set have;
do _i=_n_ to _n_+5 while(_i<=_nobs);
set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
if id ne _id then leave;
_firstNonMiss = coalesce(_firstNonMiss,_value);
if n(_firstNonMiss, _value)=2 then
do;
if _firstNonMiss=_value then count=sum(count,_firstNonMiss);
else leave;
end;
end;
run;
If in your real data the value could also be something else than 1, -1 or missing and you just want a count until the value changes then below expression should work:
count=sum(count, abs(_firstNonMiss)/_firstNonMiss );
You'd better some real data you have and the output you want to see to illustrate your question better.
If you want count POSITIVE or NEGATIVE number , try this one :
/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have;
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;
data want;
array x{9999} _temporary_;
call missing(of x{*});
do j=1 by 1 until(last.id);
set have;
by id;
x{j}=value;
end;
do i=1 by 1 until(last.id);
set have;
by id;
found=0;count=0;first_no_missing=0;
do k=i to i+5;
if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
if sign(x{k})=sign(first_no_missing) then count+1;
if sign(x{k})=-sign(first_no_missing) then leave;
end;
output;
end;
keep id value count;
run;
Thank you, Ksharp for helping.
HHC
/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
3 1
3 1
3 .
3 -1
3 -1
3 .
3 -1
3 1
3 -1
;
data want;
array x{9999} _temporary_;
call missing(of x{*});
do j=1 by 1 until(last.id);
set have;
by id;
x{j}=value;
end;
do i=1 by 1 until(last.id);
set have;
by id;
found=0;count=0;first_no_missing=0;
do k=i to i+5;
if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
if x{k}=first_no_missing then count+1;
if x{k}=-first_no_missing then leave;
end;
output;
end;
keep id value count;
run;
That's the best piece of code I've ever seen
- Cheers -
Hi Everyone,
Thank you so much for helping.
The issue is that, I have been using this Do Loop as my tool and failing it this time make me very uncomfortable and unconfident about my code.
I see the logic is quite straight forward to implement 😕
I hope you guys expert can help me to fix my code.
Thanks,
HHC
Hi,
I'd follow an other approach. Rather than try to loop and perform the check within the next rows, I'd transpose the next rows and work with it.
It eases debugging and code review (in my view)
I'm working id by id that is I won't compare the next rows if the id differs. I think it makes sense.
That's what I come up with, let me know it suits your needs:
data have;
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;
data have1;
set have;
i+1;
run;
*Macro to transpose the next rows;
%MACRO getnextRows(idvalue=1, i_position=1);
DATA _tmp1;
SET have1(where=(id=&idvalue. and &i_position.<=i<=&i_position.+5));
i=&i_position.; *set i to &i_position. for merging after transpose;
keep id value i;
RUN;
PROC TRANSPOSE DATA=_tmp1 out=_tmp2 prefix=next;
BY id i;
VAR value;
RUN;
DATA have1;
MERGE have1 _tmp2;
BY id i;
RUN;
PROC DATASETS lib=work nolist; delete _tmp:; RUN;QUIT;
%MEND getnextRows;
%MACRO reset(vars,val);
array _vars_ &vars.;
do over _vars_;
_vars_=&val.;
end;
%MEND reset;
*Create working dataset with transposed rows;
DATA _NULL_;
set have1;
call execute('%nrstr(%getnextRows(idvalue='||strip(put(id,best.))||', i_position='||strip(put(i,best.))||'))');
RUN;
DATA want;
SET have1;
ARRAY nextRow(6) NEXT1-NEXT6; *NEXT1 is actually the current row;
%reset(count countPos countNeg firstValue,0);
DO c1=1 TO 6;
*Check for first non-missing value and get position of firstValue or -firstValue;
if sum(count, countPos, countNeg)=0 then do;
firstValue=nextRow[c1];
if not missing(firstValue) then do;
count+1;
if firstValue>0 then countPos+1;
else if .<firstValue<0 then countNeg+1;
end;
end;
else do;
*count how many firstValue until -firstValue;
if nextRow[c1]=-firstValue then leave;
else if nextRow[c1]=firstValue then do;
count+1;
if firstValue>0 then countPos+1;
else countNeg+1;
end;
end;
END;
keep id value count:;
RUN;
- Cheers -
You can do a self-merge of obs i through obs i+5. Put the values of ID and VALUE in arrays. Then use those arrays to generate COUNT.
I understand that if the reference value is -1, then you want to count negative ones, and also give a negative sign to COUNT, so the below uses the SIGN function.
Because there is an array for ID's as well as for VALUE's, it's easy to avoid examining obs for the upcoming ID.
data have;
input id value @@;
datalines;
1 1 1 . 1 1 1 . 1 1
1 . 1 -1 1 1 1 .
2 1 2 1 2 .
run;
data want (drop=_: i);
merge have (firstobs=1)
have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;
array _x {*} id _id: ;
array _v {*} value _val: ;
count=0;
do i=1 to dim(_x) while(id=_x{i});
if _nonmiss1=. then _nonmiss1=_v{i};
else if _v{i}=. then continue;
else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.