Solved: Copy first value in variable to all observations for each id

mgrasmussen · Posted 08-27-2021 01:43 PM

Dear SAS experts

I would like to create a variable which includes only the first (lowest) value of another variable for each id in my dataset. I would like to copy this first value for every observation of id in this new variable. As an example:

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

*Sort the new dataset by id and score;
proc sort data = reading;
by id score;
run;

Let's call the new variable that I would like to create: firstvalue. In this new variable I would the first two observations (i.e. id 1) to be the first value of score (45) for id 1. For id 2 the first value of score is 53 and I would like therefore like the third and fourth observation to be 53. And so forth throughout the entire dataset.

Can anyone suggest some (simple) syntax to do perform the operation described above?

I have tried using the first. variable (https://www.listendata.com/2016/01/first-and-last-variables.html) but with no succes so far.

Thank you

PeterClemmensen · Posted 08-27-2021 02:09 PM

How about

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

data want;
   set reading;
   by ID;
   if first.id then _iorc_ = Score;
   firstvalue = _iorc_;
run;

Result:

ID Name   Score firstvalue 
1  David  45    45 
1  David  74    45 
2  Sam    53    53 
2  Ram    54    53 
3  Bane   87    87 
3  Mary   92    87 
3  Bane   87    87 
4  Dane   23    23 
5  Jenny  87    87 
5  Ken    87    87 
6  Simran 63    63 
8  Priya  72    72

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

maguiremq · Posted 08-27-2021 01:58 PM

Does 'name' have any relevance with 'id'? If not, this gives you the lowest per ID.

proc sql;
	create table 	want as
		select
					t1.*,
					t2.low
		from
					reading as t1
					 inner join
					(select
								id,
								min(score) as low
					 from
					 			reading
					 group by
					 			id) as t2
						on t1.id = t2.id;
quit;

Obs ID Name Score low 
1 1 David 45 45 
2 1 David 74 45 
3 2 Sam 53 53 
4 2 Ram 54 53 
5 3 Bane 87 87 
6 3 Mary 92 87 
7 3 Bane 87 87 
8 4 Dane 23 23 
9 5 Jenny 87 87 
10 5 Ken 87 87 
11 6 Simran 63 63 
12 8 Priya 72 72

mgrasmussen · Posted 08-27-2021 03:09 PM

Dear Maguiremq

'Name' is as such not relevant for 'id' in this context.

Thanks for the syntax. I have so far not worked with sql but I find that soon I might have to given how much it can do. For the moment I was hoping for a non-sql solution.

PeterClemmensen · Posted 08-27-2021 02:09 PM

How about

data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;

data want;
   set reading;
   by ID;
   if first.id then _iorc_ = Score;
   firstvalue = _iorc_;
run;

Result:

ID Name   Score firstvalue 
1  David  45    45 
1  David  74    45 
2  Sam    53    53 
2  Ram    54    53 
3  Bane   87    87 
3  Mary   92    87 
3  Bane   87    87 
4  Dane   23    23 
5  Jenny  87    87 
5  Ken    87    87 
6  Simran 63    63 
8  Priya  72    72

The DATA to DATA Step Macro
Blog: SASnrd

mgrasmussen · Posted 08-27-2021 03:05 PM

Hey PeterClemmensen

This is great. Thanks.

However, I am wondering whether or not this could be done just as simple by creating a macro variable (%let fvalue=**code**) in the data statement which contains the first value of score for first.id (as with _iorc_). I.e. could one in the fourth line of the second data step create the abovementioned macro variable and then state the new variable - firstvalue - should contain the content of this macro variable? In the context that I am going to use this code (for work) it could be an advantage (in terms of presentation) if the operation could be done using a macro variable.

Thank you

Kurt_Bremser · Posted 08-28-2021 02:17 AM

Macro variables are used for the keeping of values between steps, not within steps.

You can get your result without sorting by score first by using a double DO loop:

data want;
firstvalue = 99999999999;
do until (last.id);
  set reading;
  by id;
  firstvalue = min(firstvalue,score);
end;
do until (last.id);
  set reading;
  by id;
  output;
end;
run;

I assume your dataset is already sorted by id.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 08-28-2021 03:00 AM

I don't understand why it is an advantage to use a macro variable because the code is to be used at work?

Unless your boss said so. And in that case, your boss is wrong 🙂

This is as simple as it gets if what you want is to take the first for each ID. If you want to have the Minimum for each id, I would go with @Kurt_Bremsers code.

Regards

The DATA to DATA Step Macro
Blog: SASnrd

mgrasmussen · Posted 08-31-2021 03:13 AM

Got it. Thanks! 🙂

mkeintz · Posted 08-27-2021 11:59 PM

If the data are already sorted by ID/SCORE, then:

data want;
  set reading;
  by id;
  if first.id then set reading (keep=score rename=(score=low_score)) point=_n_;
run;

Added the "keep=score" option to the second SET statement. This has no impact on the results, but makes the purpose of the second SET more evident.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mgrasmussen · Posted 08-28-2021 02:01 AM

Hey mkeintz

Great. Thanks. This syntax does the job.

Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.

Thank you

mkeintz · Posted 08-28-2021 07:12 AM

@mgrasmussen wrote:

Hey mkeintz

Great. Thanks. This syntax does the job.

Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.

Thank you

Usually the SET statement applies sequential to the specified dataset (i.e. reads obs 1, then obs2, ... through the last obs). But there are also a number of ways to read a dataset using "direct access" - for example, to read observation 100 without need to process observations 1 through 99.

One way to do direct access is using the POINT= option of the set command, as in

data sample;
  p=10;
  set sashelp.class point=p;
  put  name=;
  output;
  stop;
run;

You'll see name=John, the 10th obs of sashelp.class. Also, because the code uses then variable name P for the POINT= option, it is automatically excluded from the output data set. And in case you're wondering, you cannot use a constant in "POINT=" option, i.e. you can't use

  set sashelp.class point=10;

So using "point=_N_" just says to read the N'th observation, where _N_ is an automatic variable tracking the iterations of the data step over the data set.

The other property to recognize is that any variable populated by a SET statement will have its value automatically retained, until overwritten by a programming statement, usually the next invocation of the same SET statement. In this program the variable LOW_SCORE is overwritten only when starting a new ID ("if first.id then ..."). All the other variables are overwritten with every new observation.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mgrasmussen · Posted 08-31-2021 03:23 AM

Hey mkeintz

I really appreciate the explaination. Thanks.

I think I understand the concept of what you are saying. I definitely have a baseline understanding of the use of set in this way and will be able to attempt build my own code.

Tom · Posted 08-28-2021 09:39 AM

@mkeintz wrote:

If the data are already sorted by ID/SCORE, then:
data want;
  set reading;
  by id;
  if first.id then set reading (rename=(score=low_score)) point=_n_;
run;

You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.

mkeintz · Posted 08-28-2021 09:48 AM

Thx Tom, will do.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 08-28-2021 09:57 AM

@Tom wrote:
@mkeintz wrote:

If the data are already sorted by ID/SCORE, then:
data want;
  set reading;
  by id;
  if first.id then set reading (rename=(score=low_score)) point=_n_;
run;
You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.

@Tom

On second thought I don't think your suggestion is necessary, because the only time those other values are overwritten is when first.id=1, meaning the overwriting uses the same values as are already there.

But I do think it's a good practice as a means of making the purpose of the code clear.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

Re: Copy first value in variable to all observations for each id

SAS Innovate 2025: Call for Content

Classroom Training Available!