BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Rhodochrosite | Level 12

Hi Everyone,

My data has id and value.

For each row[i], I want to check the window from [i] to [i+5]. I want to find the first non-missing value, say 1, and count how many 1 before -1 shows up.

If the first non-missing value is -1, I want to count how many -1 before 1 show up.

To distinguish between 1 series and -1 series, the count should have + sign for 1 and - sign for -1.

In my data below, the first row is 1, which serves as the first non-missing, thus the count is 3 (value 1)

2nd row: count = 2

3 row: count=1

4 row: count =1 as -1 shows in 7the record 

My code is below and I really curious why it doesn't work.

It appears that the count is wrong for row with value =. and for the last few rows.

For simplicity, I don't introduce the condition of the same id yet.

Can you please help?

Many thanks,

HHC

data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;

data want; 
set have nobs=nobs;
drop i j_: out_:;
i+1;
out_j=0;
count=0;

DO j=i to i+5 until (j=nobs or out_j=1);
out_k=0;
set have (keep = id value rename=(id=j_id value=j_value)) point=j;

	IF j_value>0  then do; /*checking positive value*/
		do k=j to i+5  until (k=nobs or out_k=1);
		set have (keep = id value rename=(id=k_id value=k_value)) point=k;
			if k_value>0 then
				count=count+1;
			else 
			if k_value =-1 then do;
				out_k=1;
				leave;
				end;

		END;
		End;

	ELSE
	IF j_value<0  then do;/*checking positive value*/
		do k=j to i+5  until (k=nobs or out_k=1);
		set have (keep = id value rename=(id=k_id value=k_value)) point=k;
			if k_value<0 and k_value^=. then
				count=count-1;
			else 
			if k_value =1 then do;
				out_k=1;
				leave;
				end;

		END;
		End;
	
	if out_k=1 then out_j=1;

END;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

You'd better some real data you have and the output you want to see to illustrate your question better.

If you want count POSITIVE or NEGATIVE number , try this one :

 

/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if sign(x{k})=sign(first_no_missing) then count+1;
   if sign(x{k})=-sign(first_no_missing) then leave;
 end;
 output;
end;
keep id value count;
run;

View solution in original post

10 REPLIES 10
Patrick
Opal | Level 21

I didn't spend enough time on your code to answer your question but check if below a bit simpler version returns the desired result.

data have;
  input id value;
  datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;

data want(drop=_:);
  set have;
  do _i=_n_ to _n_+5 while(_i<=_nobs);
    set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
    if id ne _id then leave;

    _firstNonMiss = coalesce(_firstNonMiss,_value);
    if n(_firstNonMiss, _value)=2 then
      do;
        if _firstNonMiss=_value then count=sum(count,1);
        else leave;
      end;
  end;
run;

proc print data=want;
run;

 

hhchenfx
Rhodochrosite | Level 12

@Ksharp and @Patrick 

For negative series, I need the count having - sign. Otherwise, there is no distinguish between +1 series vs. -1 series.

Can you help?

Thanks,

HHC

Patrick
Opal | Level 21

Because your variable Value contains 1, -1 or missing you could simply use this variable for the count. In the code I proposed already change...

from:
count=sum(count,1)

to:
count=sum(count,_firstNonMiss)

 

Here the amended code:

data want(drop=_:);
  set have;
  do _i=_n_ to _n_+5 while(_i<=_nobs);
    set have(keep=id value rename=(id=_id value=_value)) nobs=_nobs point=_i;
    if id ne _id then leave;

    _firstNonMiss = coalesce(_firstNonMiss,_value);
    if n(_firstNonMiss, _value)=2 then
      do;
        if _firstNonMiss=_value then count=sum(count,_firstNonMiss);
        else leave;
      end;
  end;
run;

 If in your real data the value could also be something else than 1, -1 or missing and you just want a count until the value changes then below expression should work:

count=sum(count, abs(_firstNonMiss)/_firstNonMiss );
Ksharp
Super User

You'd better some real data you have and the output you want to see to illustrate your question better.

If you want count POSITIVE or NEGATIVE number , try this one :

 

/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 11
1 .
1 12
1 .
1 11
1 .
1 -12
1 11
1 .
2 12
2 13
2 .
3 14
3 15
3 .
3 -11
3 -12
3 .
3 -13
3 11
3 -14
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if sign(x{k})=sign(first_no_missing) then count+1;
   if sign(x{k})=-sign(first_no_missing) then leave;
 end;
 output;
end;
keep id value count;
run;
hhchenfx
Rhodochrosite | Level 12

Thank you, Ksharp for helping.

HHC

Ksharp
Super User
/*
Since your data process logic is so complicated,
I would like to use ARRAY.*/
data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
3 1
3 1
3 .
3 -1
3 -1
3 .
3 -1
3 1
3 -1
;

data want;
array x{9999} _temporary_;
call missing(of x{*});

do j=1 by 1 until(last.id);
 set have;
 by id;
 x{j}=value;
end;

do i=1 by 1 until(last.id);
 set have;
 by id;
 found=0;count=0;first_no_missing=0;
 do k=i to i+5;
   if x{k} and not found then do;found=1;first_no_missing=x{k}; end;
   if x{k}=first_no_missing then count+1;
   if x{k}=-first_no_missing then leave;
 end;
 output;
end;
keep id value count;
run;
Oligolas
Barite | Level 11

That's the best piece of code I've ever seen

 

________________________

- Cheers -

hhchenfx
Rhodochrosite | Level 12

Hi Everyone,

Thank you so much for helping.

The issue is that, I have been using this Do Loop as my tool and failing it this time make me very uncomfortable and unconfident about my code.

I see the logic is quite straight forward to implement 😕

I hope you guys expert can help me to fix my code.

Thanks,

HHC

 

Oligolas
Barite | Level 11

Hi,

 

I'd follow an other approach. Rather than try to loop and perform the check within the next rows, I'd transpose the next rows and work with it.

It eases debugging and code review (in my view)

 

I'm working id by id that is I won't compare the next rows if the id differs. I think it makes sense.

 

That's what I come up with, let me know it suits your needs:

data have; 
input id value;
datalines;
1 1
1 .
1 1
1 .
1 1
1 .
1 -1
1 1
1 .
2 1
2 1
2 .
;run;

data have1; 
set have;
i+1;
run;

*Macro to transpose the next rows;
%MACRO getnextRows(idvalue=1, i_position=1);
   DATA _tmp1;
      SET have1(where=(id=&idvalue. and &i_position.<=i<=&i_position.+5));
      i=&i_position.; *set i to &i_position. for merging after transpose;
      keep id value i;
   RUN;

   PROC TRANSPOSE DATA=_tmp1 out=_tmp2 prefix=next;
      BY id i;
      VAR value;
   RUN;

   DATA have1;
      MERGE have1 _tmp2;
      BY id i;
   RUN;

   PROC DATASETS lib=work nolist; delete _tmp:; RUN;QUIT;
%MEND getnextRows;

%MACRO reset(vars,val);
   array _vars_ &vars.;
   do over _vars_;
      _vars_=&val.;
   end;
%MEND reset;

*Create working dataset with transposed rows;
DATA _NULL_;
   set have1;
   call execute('%nrstr(%getnextRows(idvalue='||strip(put(id,best.))||', i_position='||strip(put(i,best.))||'))');
RUN;

DATA want;
   SET have1;
   ARRAY nextRow(6) NEXT1-NEXT6; *NEXT1 is actually the current row;
   %reset(count countPos countNeg firstValue,0);
   DO c1=1 TO 6;
      *Check for first non-missing value and get position of firstValue or -firstValue;
      if sum(count, countPos, countNeg)=0 then do;
         firstValue=nextRow[c1];
         if not missing(firstValue) then do;
            count+1;
            if firstValue>0  then countPos+1;
            else if .<firstValue<0 then countNeg+1;
         end;
      end;
      else do;
         *count how many firstValue until -firstValue;
         if nextRow[c1]=-firstValue then leave;
         else if nextRow[c1]=firstValue then do;
            count+1;
            if firstValue>0 then countPos+1;
            else countNeg+1;
         end;
      end;
   END;
   keep id value count:;
RUN;
________________________

- Cheers -

mkeintz
PROC Star

You can do a self-merge of obs i through obs i+5.  Put the values of ID and VALUE in arrays.  Then use those arrays to generate COUNT.

 

I understand that if the reference value is -1, then you want to count negative ones, and also give a negative sign to COUNT, so the below uses the SIGN function.

 

Because there is an array for ID's as well as for VALUE's, it's easy to avoid examining obs for the upcoming ID.

 

data have; 
input id value @@;
datalines;
1  1    1  .    1  1    1  .    1  1    
1  .    1 -1    1  1    1  .    
2  1    2  1    2  .    
run;

data want (drop=_: i);
  merge have (firstobs=1)
        have (firstobs=2 keep=id value rename=(id=_id2 value=_val2))
        have (firstobs=3 keep=id value rename=(id=_id3 value=_val3))
        have (firstobs=4 keep=id value rename=(id=_id4 value=_val4))
        have (firstobs=5 keep=id value rename=(id=_id5 value=_val5))
        have (firstobs=6 keep=id value rename=(id=_id6 value=_val6)) ;

  array _x {*} id     _id:  ;
  array _v {*} value  _val: ;

  count=0;
  do i=1 to dim(_x) while(id=_x{i});
    if _nonmiss1=. then _nonmiss1=_v{i};
    else if _v{i}=. then continue;
    else if _v{i}=_nonmiss1 then count=count+sign(_nonmiss1);
    else if sign(_v{i})=-1*sign(_nonmiss1) then leave;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2508 views
  • 4 likes
  • 5 in conversation