All, first time here, please be gentle. Thanks.
I have a table of records. For each row, I have an existing variable, x, and a new variable y to be derived.
For each row, say with row index r, the new variable y is defined such that it is '1' if I can find a row afterwards, say row r+p, such that ALL the rows from row r to row r+p having the existing variable x larger than a threshold, say, 100, and it is '0' otherwise.
How can I do this in SAS? thanks, from a newbie.
Is it necessary to use a do-until loop here? or not necessary.
No loops required, but you may need a RETAIN, it depends exactly on what you're trying to do.
Can you please provide a small example that includes the data you start with and what you want as output. It does not need to be your actual data but should reflect the complexities, ie if you need to do this for multiple groups include at least two groups.
@philgo wrote:
All, first time here, please be gentle. Thanks.
I have a table of records. For each row, I have an existing variable, x, and a new variable y to be derived.
For each row, say with row index r, the new variable y is defined such that it is '1' if I can find a row afterwards, say row r+p, such that ALL the rows from row r to row r+p having the existing variable x larger than a threshold, say, 100, and it is '0' otherwise.
How can I do this in SAS? thanks, from a newbie.
Is it necessary to use a do-until loop here? or not necessary.
row index x (existing) y (to be derived)
--------------------------------------------------------------------
1 200 0 (because I cannot find any row afterwards such that all rows in between have x>100)
2 0 0
3 200 0
4 0 0
5 200 1 (row 5 and row 6 have x>100)
6 200 0
7 0 0
8 200 1 (row 8 and row 9 have x>100)
9 200 1 (row 9 and row 10 have x>100)
10 200 no definition
thanks a lot for the quick reply. please see the above sample data.
to derive the variable y for a row r, i need to look at all rows beyond and see if the condition saitisfies.
All you really seem to want is a dummy variable indicating whether both the current observation ("row") and the next observation have x>100. This is a ready-made problem for solution by merging a data set with itself, but with the simultaneously access records offset by 1 ("firstobs=2" below):
data have;
input r x;
datalines;
1 200
2 0
3 200
4 0
5 200
6 200
7 0
8 200
9 200
10 200
run;
data want (drop=_:);
merge have
have (firstobs=2 keep=x rename=(x=_x_next));
if _x_next=. then y=.;
else y=min(x,_x_next)>100;
run;
Just as you can merge two separate datasets you can to a self-merge, think of them as two parallel streams of data. In this case merging record 1 with record 2 of have, record 2 with 3, etc. But you can't have 2 variables named X at the same time, so X in the "firstobs=2" stream is renamed to _x_next.
The rest is logic. _x_next for the matching of record 10 has no record 11, so it is missing, allowing an easy test for arriving at the end of the merge.
Thanks very much. That helps a lot.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.