BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mgrasmussen
Quartz | Level 8

Dear SAS experts

 

I would like to create a variable which includes only the first (lowest) value of another variable for each id in my dataset. I would like to copy this first value for every observation of id in this new variable. As an example:

 

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

 

*Sort the new dataset by id and score;
proc sort data = reading;
by id score;
run;

 

Let's call the new variable that I would like to create: firstvalue. In this new variable I would the first two observations (i.e. id 1) to be the first value of score (45) for id 1. For id 2 the first value of score is 53 and I would like therefore like the third and fourth observation to be 53. And so forth throughout the entire dataset.

 

Can anyone suggest some (simple) syntax to do perform the operation described above?

 

I have tried using the first. variable (https://www.listendata.com/2016/01/first-and-last-variables.html) but with no succes so far.

 

Thank you 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

How about

 

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

data want;
   set reading;
   by ID;
   if first.id then _iorc_ = Score;
   firstvalue = _iorc_;
run;

 

Result:

 

ID Name   Score firstvalue 
1  David  45    45 
1  David  74    45 
2  Sam    53    53 
2  Ram    54    53 
3  Bane   87    87 
3  Mary   92    87 
3  Bane   87    87 
4  Dane   23    23 
5  Jenny  87    87 
5  Ken    87    87 
6  Simran 63    63 
8  Priya  72    72 

 

 

View solution in original post

15 REPLIES 15
maguiremq
SAS Super FREQ

Does 'name' have any relevance with 'id'? If not, this gives you the lowest per ID.

 

proc sql;
	create table 	want as
		select
					t1.*,
					t2.low
		from
					reading as t1
					 inner join
					(select
								id,
								min(score) as low
					 from
					 			reading
					 group by
					 			id) as t2
						on t1.id = t2.id;
quit;
Obs ID Name Score low 
1 1 David 45 45 
2 1 David 74 45 
3 2 Sam 53 53 
4 2 Ram 54 53 
5 3 Bane 87 87 
6 3 Mary 92 87 
7 3 Bane 87 87 
8 4 Dane 23 23 
9 5 Jenny 87 87 
10 5 Ken 87 87 
11 6 Simran 63 63 
12 8 Priya 72 72 
mgrasmussen
Quartz | Level 8

Dear Maguiremq

 

'Name' is as such not relevant for 'id' in this context.

 

Thanks for the syntax. I have so far not worked with sql but I find that soon I might have to given how much it can do. For the moment I was hoping for a non-sql solution.

PeterClemmensen
Tourmaline | Level 20

How about

 

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

data want;
   set reading;
   by ID;
   if first.id then _iorc_ = Score;
   firstvalue = _iorc_;
run;

 

Result:

 

ID Name   Score firstvalue 
1  David  45    45 
1  David  74    45 
2  Sam    53    53 
2  Ram    54    53 
3  Bane   87    87 
3  Mary   92    87 
3  Bane   87    87 
4  Dane   23    23 
5  Jenny  87    87 
5  Ken    87    87 
6  Simran 63    63 
8  Priya  72    72 

 

 

mgrasmussen
Quartz | Level 8

Hey PeterClemmensen

 

This is great. Thanks.

 

However, I am wondering whether or not this could be done just as simple by creating a macro variable (%let fvalue=**code**) in the data statement which contains the first value of score for first.id (as with _iorc_). I.e. could one in the fourth line of the second data step create the abovementioned macro variable and then state the new variable - firstvalue - should contain the content of this macro variable? In the context that I am going to use this code (for work) it could be an advantage (in terms of presentation) if the operation could be done using a macro variable.

 

Thank you

Kurt_Bremser
Super User

Macro variables are used for the keeping of values between steps, not within steps.

 

You can get your result without sorting by score first by using a double DO loop:

data want;
firstvalue = 99999999999;
do until (last.id);
  set reading;
  by id;
  firstvalue = min(firstvalue,score);
end;
do until (last.id);
  set reading;
  by id;
  output;
end;
run;

I assume your dataset is already sorted by id.

PeterClemmensen
Tourmaline | Level 20

I don't understand why it is an advantage to use a macro variable because the code is to be used at work?

 

Unless your boss said so. And in that case, your boss is wrong 🙂

 

This is as simple as it gets if what you want is to take the first for each ID. If you want to have the Minimum for each id, I would go with @Kurt_Bremsers code.

 

Regards

mkeintz
PROC Star

If the data are already sorted by ID/SCORE, then:

 

data want;
  set reading;
  by id;
  if first.id then set reading (keep=score rename=(score=low_score)) point=_n_;
run;

Added the "keep=score" option to the second SET statement.  This has no impact on the results, but makes the purpose of the second SET more evident. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mgrasmussen
Quartz | Level 8

Hey mkeintz

 

Great. Thanks. This syntax does the job.

 

Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.

 

Thank you

mkeintz
PROC Star

@mgrasmussen wrote:

Hey mkeintz

 

Great. Thanks. This syntax does the job.

 

Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.

 

Thank you


Usually the SET statement applies sequential to the specified dataset (i.e. reads obs 1, then obs2, ... through the last obs).   But there are also a number of ways to read a dataset using "direct access" - for example, to read observation 100 without need to process observations 1 through 99.

 

One way to do direct access is using the POINT= option of the set command, as in

data sample;
  p=10;
  set sashelp.class point=p;
  put  name=;
  output;
  stop;
run;

You'll see name=John, the 10th obs of sashelp.class.   Also, because the code uses then variable name P for the POINT= option, it is automatically excluded from the output data set.  And in case you're wondering, you cannot use  a constant in "POINT=" option, i.e. you can't use

  set sashelp.class point=10;

So using "point=_N_" just says to read the N'th observation, where _N_ is an automatic variable tracking the iterations of the data step over the data set.

 

The other property to recognize is that any variable populated by a SET statement will have its value automatically retained, until overwritten by a programming statement, usually the next invocation of the same SET statement.  In this program the variable LOW_SCORE is overwritten only when starting a new ID ("if first.id then ...").  All the other variables are overwritten with every new observation.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mgrasmussen
Quartz | Level 8

Hey mkeintz

 

I really appreciate the explaination. Thanks.

 

I think I understand the concept of what you are saying. I definitely have a baseline understanding of the use of set in this way and will be able to attempt build my own code.

Tom
Super User Tom
Super User

@mkeintz wrote:

If the data are already sorted by ID/SCORE, then:

 

data want;
  set reading;
  by id;
  if first.id then set reading (rename=(score=low_score)) point=_n_;
run;

You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset  that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.

mkeintz
PROC Star
Thx Tom, will do.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

@Tom wrote:

@mkeintz wrote:

If the data are already sorted by ID/SCORE, then:

 

data want;
  set reading;
  by id;
  if first.id then set reading (rename=(score=low_score)) point=_n_;
run;

You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset  that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.


@Tom 

 

On second thought I don't think your suggestion is necessary, because the only time those other values are overwritten is when first.id=1, meaning the overwriting uses the same values as are already there.

 

But I do think it's a good practice as a means of making the purpose of the code clear.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 12790 views
  • 14 likes
  • 6 in conversation