Hi everyone -
I have a dataset where I'd like to compare a record against the first row in the dataset. For example, if I have the following:
ID Height
1 40
2 35
3 40
4 58
I want to be able to create another variable that says which ones are greater than the first.
ID Height Flag
1 40 0
2 35 0
3 40 0
4 58 1
I don't think a lag will work since I always want to compare against record 1, not the record immediately preceding. And I can't get a first. command to work since I need to sort by ID in order to get my comparison row on the top, though I think there may be a way to do this that I can't figure out.
Any ideas? Thank you!
Use two SET statements. One SET read just the single first obs. The other reads the entire data set, and does the compare:
data have;
input id height;
datalines;
1 40
2 35
3 40
4 58
run;
data want (drop=h_ref);
if _n_=1 then set have (keep=height rename=(height=h_ref));
set have;
flag=height>h_ref;
run;
The "if _n_=1 then set have ..." reads observation 1, renaming height to h_ref. Just as importantly any variable read by SET is automatically "retained" until that same SET statement is executed in a later iteration of the data set. But because the program does not re-execute that set, variable H_REF is retained throughout all the observations. Meanwhile the other SET statement reads all the obs, from obs 1 to the end. I.e., the two SET statements set up separate streams of data from HAVE.
RETAIN.
RETAIN first_record_value;
if _n_=1 /*check for first record*/
then first_record_value = height;
@ATLien wrote:
Hi everyone -
I have a dataset where I'd like to compare a record against the first row in the dataset. For example, if I have the following:
ID Height
1 40
2 35
3 40
4 58
I want to be able to create another variable that says which ones are greater than the first.
ID Height Flag
1 40 0
2 35 0
3 40 0
4 58 1
I don't think a lag will work since I always want to compare against record 1, not the record immediately preceding. And I can't get a first. command to work since I need to sort by ID in order to get my comparison row on the top, though I think there may be a way to do this that I can't figure out.
Any ideas? Thank you!
When I try that, it just adds in a new variable with a record only for the first row. I still only have the relevant data in that top row, which I want to compare against data in subsequent rows.
Did you include the RETAIN statement?
Post your exact code please.
Use two SET statements. One SET read just the single first obs. The other reads the entire data set, and does the compare:
data have;
input id height;
datalines;
1 40
2 35
3 40
4 58
run;
data want (drop=h_ref);
if _n_=1 then set have (keep=height rename=(height=h_ref));
set have;
flag=height>h_ref;
run;
The "if _n_=1 then set have ..." reads observation 1, renaming height to h_ref. Just as importantly any variable read by SET is automatically "retained" until that same SET statement is executed in a later iteration of the data set. But because the program does not re-execute that set, variable H_REF is retained throughout all the observations. Meanwhile the other SET statement reads all the obs, from obs 1 to the end. I.e., the two SET statements set up separate streams of data from HAVE.
Thank you! I had never thought to use two set statements but this works perfectly. What a great new trick!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.