Re: General question on confirming records.

philgo · Posted 08-12-2018 07:19 PM

All, first time here, please be gentle. Thanks.

I have a table of records. For each row, I have an existing variable, x, and a new variable y to be derived.

For each row, say with row index r, the new variable y is defined such that it is '1' if I can find a row afterwards, say row r+p, such that ALL the rows from row r to row r+p having the existing variable x larger than a threshold, say, 100, and it is '0' otherwise.

How can I do this in SAS? thanks, from a newbie.

Is it necessary to use a do-until loop here? or not necessary.

Reeza · Posted 08-12-2018 07:26 PM

No loops required, but you may need a RETAIN, it depends exactly on what you're trying to do.

Can you please provide a small example that includes the data you start with and what you want as output. It does not need to be your actual data but should reflect the complexities, ie if you need to do this for multiple groups include at least two groups.

@philgo wrote:

All, first time here, please be gentle. Thanks.

I have a table of records. For each row, I have an existing variable, x, and a new variable y to be derived.

For each row, say with row index r, the new variable y is defined such that it is '1' if I can find a row afterwards, say row r+p, such that ALL the rows from row r to row r+p having the existing variable x larger than a threshold, say, 100, and it is '0' otherwise.

How can I do this in SAS? thanks, from a newbie.

Is it necessary to use a do-until loop here? or not necessary.

philgo · Posted 08-12-2018 07:32 PM

row index x (existing) y (to be derived)

--------------------------------------------------------------------

1 200 0 (because I cannot find any row afterwards such that all rows in between have x>100)

2 0 0

3 200 0

4 0 0

5 200 1 (row 5 and row 6 have x>100)

6 200 0

7 0 0

8 200 1 (row 8 and row 9 have x>100)

9 200 1 (row 9 and row 10 have x>100)

10 200 no definition

thanks a lot for the quick reply. please see the above sample data.

to derive the variable y for a row r, i need to look at all rows beyond and see if the condition saitisfies.

mkeintz · Posted 08-12-2018 08:19 PM

All you really seem to want is a dummy variable indicating whether both the current observation ("row") and the next observation have x>100. This is a ready-made problem for solution by merging a data set with itself, but with the simultaneously access records offset by 1 ("firstobs=2" below):

data have;
  input r x;
datalines;
 1    200
 2      0
 3    200
 4      0
 5    200
 6    200
 7      0
 8    200
 9    200
10    200
run;
 
data want (drop=_:);
  merge have  
        have (firstobs=2 keep=x rename=(x=_x_next));
  if _x_next=. then y=.;
  else y=min(x,_x_next)>100;
run;

Just as you can merge two separate datasets you can to a self-merge, think of them as two parallel streams of data. In this case merging record 1 with record 2 of have, record 2 with 3, etc. But you can't have 2 variables named X at the same time, so X in the "firstobs=2" stream is renamed to _x_next.

The rest is logic. _x_next for the matching of record 10 has no record 11, so it is missing, allowing an easy test for arriving at the end of the merge.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

philgo · Posted 08-12-2018 11:36 PM

Thanks very much. That helps a lot.

General question on confirming records.