Solved: Re: Count number of value 1 using do loop

hhchenfx · Posted 11-08-2023 01:04 AM

Hi Everyone,

My data has id and value.

For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.

If the first non-missing value is -1, I want to count how many -1 before 1 show up.

To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.

In my data below, the first row is 1, which serves as the first non-missing, thus the count is 3 (value 1)

2nd row: count = 2

3 row: count=1

4 row: count =1 as -1 shows in 7the record

My code is below and I really curious why it doesn't work.

It appears that the count is wrong for row with value =. and for the last few rows.

For simplicity, I don't introduce the condition of the same id yet.

Can you please help?

Many thanks,

HHC

data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;

data want; 
set have nobs=nobs;
drop i j_: out_:;
i+1;
out_j=0;
count=0;

DO j=i to i+5 until (j=nobs or out_j=1);
out_k=0;
set have (keep = id value rename=(id=j_id value=j_value)) point=j;

	IF j_value>0  then do; /*checking positive value*/
		do k=j to i+5  until (k=nobs or out_k=1);
		set have (keep = id value rename=(id=k_id value=k_value)) point=k;
			if k_value>0 then
				count=count+1;
			else 
			if k_value =-1 then do;
				out_k=1;
				leave;
				end;

		END;
		End;

	ELSE
	IF j_value<0  then do;/*checking positive value*/
		do k=j to i+5  until (k=nobs or out_k=1);
		set have (keep = id value rename=(id=k_id value=k_value)) point=k;
			if k_value<0 and k_value^=. then
				count=count-1;
			else 
			if k_value =1 then do;
				out_k=1;
				leave;
				end;

		END;
		End;
	
	if out_k=1 then out_j=1;

END;
run;

Ksharp · Posted 11-08-2023 08:05 PM

You'd better some real data you have and the output you want to see to illustrate your question better.

If you want count POSITIVE or NEGATIVE number , try this one :

/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if sign(x{k})=sign(first_no_missing) then count+1;
   if sign(x{k})=-sign(first_no_missing) then leave;
 end;
 output;
end;
keep id value count;
run;

View solution in original post

Patrick · Posted 11-08-2023 04:02 AM

I didn't spend enough time on your code to answer your question but check if below a bit simpler version returns the desired result.

data have;
  input id value;
  datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;

data want(drop=_:);
  set have;
  do _i=_n_ to _n_+5 while(_i<=_nobs);
    set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
    if id ne _id then leave;

    _firstNonMiss = coalesce(_firstNonMiss,_value);
    if n(_firstNonMiss, _value)=2 then
      do;
        if _firstNonMiss=_value then count=sum(count,1);
        else leave;
      end;
  end;
run;

proc print data=want;
run;

hhchenfx · Posted 11-08-2023 08:01 AM

@Ksharp and @Patrick

For negative series, I need the count having - sign. Otherwise, there is no distinguish between +1 series vs. -1 series.

Can you help?

Thanks,

HHC

Patrick · Posted 11-08-2023 07:48 PM

Because your variable Value contains 1, -1 or missing you could simply use this variable for the count. In the code I proposed already change...

from:
count=sum(count,1)

to:
count=sum(count,_firstNonMiss)

Here the amended code:

data want(drop=_:);
  set have;
  do _i=_n_ to _n_+5 while(_i<=_nobs);
    set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
    if id ne _id then leave;

    _firstNonMiss = coalesce(_firstNonMiss,_value);
    if n(_firstNonMiss, _value)=2 then
      do;
        if _firstNonMiss=_value then count=sum(count,_firstNonMiss);
        else leave;
      end;
  end;
run;

If in your real data the value could also be something else than 1, -1 or missing and you just want a count until the value changes then below expression should work:

count=sum(count, abs(_firstNonMiss)/_firstNonMiss );

Ksharp · Posted 11-08-2023 08:05 PM

You'd better some real data you have and the output you want to see to illustrate your question better.

If you want count POSITIVE or NEGATIVE number , try this one :

/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if sign(x{k})=sign(first_no_missing) then count+1;
   if sign(x{k})=-sign(first_no_missing) then leave;
 end;
 output;
end;
keep id value count;
run;

hhchenfx · Posted 11-09-2023 12:20 AM

Thank you, Ksharp for helping.

HHC

Ksharp · Posted 11-08-2023 04:46 AM

/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
3 1
3 1
3 .
3 -1
3 -1
3 .
3 -1
3 1
3 -1
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if x{k}=first_no_missing then count+1;
   if x{k}=-first_no_missing then leave;
 end;
 output;
end;
keep id value count;
run;

Oligolas · Posted 11-08-2023 06:14 AM

That's the best piece of code I've ever seen

________________________
- Cheers -

hhchenfx · Posted 11-08-2023 06:56 AM

Hi Everyone,

Thank you so much for helping.

The issue is that, I have been using this Do Loop as my tool and failing it this time make me very uncomfortable and unconfident about my code.

I see the logic is quite straight forward to implement 😕

I hope you guys expert can help me to fix my code.

Thanks,

HHC

Oligolas · Posted 11-08-2023 05:35 AM

Hi,

I'd follow an other approach. Rather than try to loop and perform the check within the next rows, I'd transpose the next rows and work with it.

It eases debugging and code review (in my view)

I'm working id by id that is I won't compare the next rows if the id differs. I think it makes sense.

That's what I come up with, let me know it suits your needs:

data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;

data have1; 
set have;
i+1;
run;

*Macro to transpose the next rows;
%MACRO getnextRows(idvalue=1, i_position=1);
   DATA _tmp1;
      SET have1(where=(id=&idvalue. and &i_position.<=i<=&i_position.+5));
      i=&i_position.; *set i to &i_position. for merging after transpose;
      keep id value i;
   RUN;

   PROC TRANSPOSE DATA=_tmp1 out=_tmp2 prefix=next;
      BY id i;
      VAR value;
   RUN;

   DATA have1;
      MERGE have1 _tmp2;
      BY id i;
   RUN;

   PROC DATASETS lib=work nolist; delete _tmp:; RUN;QUIT;
%MEND getnextRows;

%MACRO reset(vars,val);
   array _vars_ &vars.;
   do over _vars_;
      _vars_=&val.;
   end;
%MEND reset;

*Create working dataset with transposed rows;
DATA _NULL_;
   set have1;
   call execute('%nrstr(%getnextRows(idvalue='||strip(put(id,best.))||', i_position='||strip(put(i,best.))||'))');
RUN;

DATA want;
   SET have1;
   ARRAY nextRow(6) NEXT1-NEXT6; *NEXT1 is actually the current row;
   %reset(count countPos countNeg firstValue,0);
   DO c1=1 TO 6;
      *Check for first non-missing value and get position of firstValue or -firstValue;
      if sum(count, countPos, countNeg)=0 then do;
         firstValue=nextRow[c1];
         if not missing(firstValue) then do;
            count+1;
            if firstValue>0  then countPos+1;
            else if .<firstValue<0 then countNeg+1;
         end;
      end;
      else do;
         *count how many firstValue until -firstValue;
         if nextRow[c1]=-firstValue then leave;
         else if nextRow[c1]=firstValue then do;
            count+1;
            if firstValue>0 then countPos+1;
            else countNeg+1;
         end;
      end;
   END;
   keep id value count:;
RUN;

________________________
- Cheers -

mkeintz · Posted 11-08-2023 11:45 PM

You can do a self-merge of obs i through obs i+5. Put the values of ID and VALUE in arrays. Then use those arrays to generate COUNT.

I understand that if the reference value is -1, then you want to count negative ones, and also give a negative sign to COUNT, so the below uses the SIGN function.

Because there is an array for ID's as well as for VALUE's, it's easy to avoid examining obs for the upcoming ID.

data have; 
input id value @@;
datalines;
1  1    1  .    1  1    1  .    1  1    
1  .    1 -1    1  1    1  .    
2  1    2  1    2  .    
run;

data want (drop=_: i);
  merge have (firstobs=1)
        have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
        have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
        have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
        have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
        have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;

  array _x {*} id     _id:  ;
  array _v {*} value  _val: ;

  count=0;
  do i=1 to dim(_x) while(id=_x{i});
    if _nonmiss1=. then _nonmiss1=_v{i};
    else if _v{i}=. then continue;
    else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
    else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Classroom Training Available!