BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ATLien
Calcite | Level 5

Hi everyone - 

 

I have a dataset where I'd like to compare a record against the first row in the dataset. For example, if I have the following:

 

ID    Height

1      40

2      35

3      40

4      58

 

I want to be able to create another variable that says which ones are greater than the first.

ID    Height    Flag

1      40          0

2      35          0

3      40          0

4      58          1

 

I don't think a lag will work since I always want to compare against record 1, not the record immediately preceding. And I can't get a first. command to work since I need to sort by ID in order to get my comparison row on the top, though I think there may be a way to do this that I can't figure out.

 

Any ideas? Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Use two SET statements.  One SET read just the single first obs.  The other reads the entire data set, and does the compare:

 

data have;
  input id height;
datalines;
1      40
2      35
3      40
4      58
run;
data want (drop=h_ref);
  if _n_=1 then set have (keep=height rename=(height=h_ref));
  set have;
  flag=height>h_ref;
run;

 

The "if _n_=1 then set have ..." reads observation 1, renaming height to h_ref.  Just as importantly any variable read by SET  is automatically "retained" until that same SET statement is executed in a later iteration of the data set.  But because the program does not re-execute that set, variable H_REF is retained throughout all the observations.  Meanwhile the other SET statement reads all the obs, from obs 1 to the end.  I.e., the two SET statements set up separate streams of data from HAVE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
Reeza
Super User

RETAIN. 

 

RETAIN first_record_value;
if _n_=1 /*check for first record*/
     then first_record_value = height;

@ATLien wrote:

Hi everyone - 

 

I have a dataset where I'd like to compare a record against the first row in the dataset. For example, if I have the following:

 

ID    Height

1      40

2      35

3      40

4      58

 

I want to be able to create another variable that says which ones are greater than the first.

ID    Height    Flag

1      40          0

2      35          0

3      40          0

4      58          1

 

I don't think a lag will work since I always want to compare against record 1, not the record immediately preceding. And I can't get a first. command to work since I need to sort by ID in order to get my comparison row on the top, though I think there may be a way to do this that I can't figure out.

 

Any ideas? Thank you!

 



 

ATLien
Calcite | Level 5

When I try that, it just adds in a new variable with a record only for the first row. I still only have the relevant data in that top row, which I want to compare against data in subsequent rows.

Reeza
Super User

Did you include the RETAIN statement?

 

Post your exact code please.

mkeintz
PROC Star

Use two SET statements.  One SET read just the single first obs.  The other reads the entire data set, and does the compare:

 

data have;
  input id height;
datalines;
1      40
2      35
3      40
4      58
run;
data want (drop=h_ref);
  if _n_=1 then set have (keep=height rename=(height=h_ref));
  set have;
  flag=height>h_ref;
run;

 

The "if _n_=1 then set have ..." reads observation 1, renaming height to h_ref.  Just as importantly any variable read by SET  is automatically "retained" until that same SET statement is executed in a later iteration of the data set.  But because the program does not re-execute that set, variable H_REF is retained throughout all the observations.  Meanwhile the other SET statement reads all the obs, from obs 1 to the end.  I.e., the two SET statements set up separate streams of data from HAVE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ATLien
Calcite | Level 5

Thank you! I had never thought to use two set statements but this works perfectly. What a great new trick!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 729 views
  • 1 like
  • 3 in conversation