- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear SAS experts
I would like to create a variable which includes only the first (lowest) value of another variable for each id in my dataset. I would like to copy this first value for every observation of id in this new variable. As an example:
data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;
*Sort the new dataset by id and score;
proc sort data = reading;
by id score;
run;
Let's call the new variable that I would like to create: firstvalue. In this new variable I would the first two observations (i.e. id 1) to be the first value of score (45) for id 1. For id 2 the first value of score is 53 and I would like therefore like the third and fourth observation to be 53. And so forth throughout the entire dataset.
Can anyone suggest some (simple) syntax to do perform the operation described above?
I have tried using the first. variable (https://www.listendata.com/2016/01/first-and-last-variables.html) but with no succes so far.
Thank you
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about
data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;
data want;
set reading;
by ID;
if first.id then _iorc_ = Score;
firstvalue = _iorc_;
run;
Result:
ID Name Score firstvalue 1 David 45 45 1 David 74 45 2 Sam 53 53 2 Ram 54 53 3 Bane 87 87 3 Mary 92 87 3 Bane 87 87 4 Dane 23 23 5 Jenny 87 87 5 Ken 87 87 6 Simran 63 63 8 Priya 72 72
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does 'name' have any relevance with 'id'? If not, this gives you the lowest per ID.
proc sql;
create table want as
select
t1.*,
t2.low
from
reading as t1
inner join
(select
id,
min(score) as low
from
reading
group by
id) as t2
on t1.id = t2.id;
quit;
Obs ID Name Score low 1 1 David 45 45 2 1 David 74 45 3 2 Sam 53 53 4 2 Ram 54 53 5 3 Bane 87 87 6 3 Mary 92 87 7 3 Bane 87 87 8 4 Dane 23 23 9 5 Jenny 87 87 10 5 Ken 87 87 11 6 Simran 63 63 12 8 Priya 72 72
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear Maguiremq
'Name' is as such not relevant for 'id' in this context.
Thanks for the syntax. I have so far not worked with sql but I find that soon I might have to given how much it can do. For the moment I was hoping for a non-sql solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How about
data reading;
input ID Name $ Score;
cards;
1 David 45
1 David 74
2 Sam 53
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran 63
8 Priya 72
;
run;
data want;
set reading;
by ID;
if first.id then _iorc_ = Score;
firstvalue = _iorc_;
run;
Result:
ID Name Score firstvalue 1 David 45 45 1 David 74 45 2 Sam 53 53 2 Ram 54 53 3 Bane 87 87 3 Mary 92 87 3 Bane 87 87 4 Dane 23 23 5 Jenny 87 87 5 Ken 87 87 6 Simran 63 63 8 Priya 72 72
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey PeterClemmensen
This is great. Thanks.
However, I am wondering whether or not this could be done just as simple by creating a macro variable (%let fvalue=**code**) in the data statement which contains the first value of score for first.id (as with _iorc_). I.e. could one in the fourth line of the second data step create the abovementioned macro variable and then state the new variable - firstvalue - should contain the content of this macro variable? In the context that I am going to use this code (for work) it could be an advantage (in terms of presentation) if the operation could be done using a macro variable.
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Macro variables are used for the keeping of values between steps, not within steps.
You can get your result without sorting by score first by using a double DO loop:
data want;
firstvalue = 99999999999;
do until (last.id);
set reading;
by id;
firstvalue = min(firstvalue,score);
end;
do until (last.id);
set reading;
by id;
output;
end;
run;
I assume your dataset is already sorted by id.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't understand why it is an advantage to use a macro variable because the code is to be used at work?
Unless your boss said so. And in that case, your boss is wrong 🙂
This is as simple as it gets if what you want is to take the first for each ID. If you want to have the Minimum for each id, I would go with @Kurt_Bremsers code.
Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Got it. Thanks! 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the data are already sorted by ID/SCORE, then:
data want;
set reading;
by id;
if first.id then set reading (keep=score rename=(score=low_score)) point=_n_;
run;
Added the "keep=score" option to the second SET statement. This has no impact on the results, but makes the purpose of the second SET more evident.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey mkeintz
Great. Thanks. This syntax does the job.
Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@mgrasmussen wrote:
Hey mkeintz
Great. Thanks. This syntax does the job.
Could you perhaps elaborate on the syntax in line 4 of the data statement? I am relatively new to SAS. It appears that based on the if statement for the first value of id (for each id) you are renaming a variable (or here you might actually be copying a variable and then renaming it) and creating another new variable which contains running numbers (1-12 in this case). It is however not clear to me how this ends up the way it does in the new dataset 'want'. It is not clear to me how the 'point' variable is used.
Thank you
Usually the SET statement applies sequential to the specified dataset (i.e. reads obs 1, then obs2, ... through the last obs). But there are also a number of ways to read a dataset using "direct access" - for example, to read observation 100 without need to process observations 1 through 99.
One way to do direct access is using the POINT= option of the set command, as in
data sample;
p=10;
set sashelp.class point=p;
put name=;
output;
stop;
run;
You'll see name=John, the 10th obs of sashelp.class. Also, because the code uses then variable name P for the POINT= option, it is automatically excluded from the output data set. And in case you're wondering, you cannot use a constant in "POINT=" option, i.e. you can't use
set sashelp.class point=10;
So using "point=_N_" just says to read the N'th observation, where _N_ is an automatic variable tracking the iterations of the data step over the data set.
The other property to recognize is that any variable populated by a SET statement will have its value automatically retained, until overwritten by a programming statement, usually the next invocation of the same SET statement. In this program the variable LOW_SCORE is overwritten only when starting a new ID ("if first.id then ..."). All the other variables are overwritten with every new observation.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hey mkeintz
I really appreciate the explaination. Thanks.
I think I understand the concept of what you are saying. I definitely have a baseline understanding of the use of set in this way and will be able to attempt build my own code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@mkeintz wrote:
If the data are already sorted by ID/SCORE, then:
data want; set reading; by id; if first.id then set reading (rename=(score=low_score)) point=_n_; run;
You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Tom wrote:
@mkeintz wrote:
If the data are already sorted by ID/SCORE, then:
data want; set reading; by id; if first.id then set reading (rename=(score=low_score)) point=_n_; run;
You also should include the KEEP=SCORE dataset option. Otherwise any other variables in the dataset that is not ID or SCORE will get overwritten with the value from the first (minimum score) record.
On second thought I don't think your suggestion is necessary, because the only time those other values are overwritten is when first.id=1, meaning the overwriting uses the same values as are already there.
But I do think it's a good practice as a means of making the purpose of the code clear.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------