Name | Date | Fruit | Is_it_an_apple |
Tom | 2-Jan | apple | 1 |
Tom | 3-Jan | apple | 1 |
Tom | 4-Jan | orange | 0 |
Tom | 5-Jan | pear | 0 |
Tom | 6-Jan | peach | 0 |
Tom | 7-Jan | blueberry | 0 |
Tom | 8-Jan | blueberry | 0 |
Tom | 9-Jan | apple | 1 |
Mike | 2-Jan | orange | 0 |
Mike | 3-Jan | pear | 0 |
Mike | 4-Jan | apple | 1 |
Mike | 5-Jan | apple | 1 |
Mike | 6-Jan | blueberry | 0 |
Megan | 2-Jan | apple | 1 |
Megan | 3-Jan | orange | 0 |
Megan | 4-Jan | pear | 0 |
Lily | 2-Jan | apple | 1 |
Lily | 3-Jan | apple | 1 |
Lily | 4-Jan | blueberry | 0 |
Lily | 5-Jan | apple | 1 |
Lily | 6-Jan | orange | 0 |
Lily | 7-Jan | pear | 0 |
What I want
Name | Date | Fruit | Is_it_an_apple | Apple_Count_by_Name |
Tom | 2-Jan | apple | 1 | 3 |
Tom | 3-Jan | apple | 1 | 3 |
Tom | 4-Jan | orange | 0 | 3 |
Tom | 5-Jan | pear | 0 | 3 |
Tom | 6-Jan | peach | 0 | 3 |
Tom | 7-Jan | blueberry | 0 | 3 |
Tom | 8-Jan | blueberry | 0 | 3 |
Tom | 9-Jan | apple | 1 | 3 |
Mike | 2-Jan | orange | 0 | 2 |
Mike | 3-Jan | pear | 0 | 2 |
Mike | 4-Jan | apple | 1 | 2 |
Mike | 5-Jan | apple | 1 | 2 |
Mike | 6-Jan | blueberry | 0 | 2 |
Megan | 2-Jan | apple | 1 | 1 |
Megan | 3-Jan | orange | 0 | 1 |
Megan | 4-Jan | pear | 0 | 1 |
Lily | 2-Jan | apple | 1 | 3 |
Lily | 3-Jan | apple | 1 | 3 |
Lily | 4-Jan | blueberry | 0 | 3 |
Lily | 5-Jan | apple | 1 | 3 |
Lily | 6-Jan | orange | 0 | 3 |
Lily | 7-Jan | pear | 0 | 3 |
Thank you in advance for any help with this question!
All the best,
data have;
input Name $ Date $ Fruit $ Is_it_an_apple;
datalines;
Tom 2-Jan apple 1
Tom 3-Jan apple 1
Tom 4-Jan orange 0
Tom 5-Jan pear 0
Tom 6-Jan peach 0
Tom 7-Jan blueberry 0
Tom 8-Jan blueberry 0
Tom 9-Jan apple 1
Mike 2-Jan orange 0
Mike 3-Jan pear 0
Mike 4-Jan apple 1
Mike 5-Jan apple 1
Mike 6-Jan blueberry 0
Megan 2-Jan apple 1
Megan 3-Jan orange 0
Megan 4-Jan pear 0
Lily 2-Jan apple 1
Lily 3-Jan apple 1
Lily 4-Jan blueberry 0
Lily 5-Jan apple 1
Lily 6-Jan orange 0
Lily 7-Jan pear 0
;
proc sql;
create table want as
select *,
sum(Is_it_an_apple)
from have
group by name;
quit;
I appreciate your response very much!
But, how to create a variable named Apple_Count_by_Name, a variable name that I want, in the step that you just told me.
Best,
Like this
proc sql;
create table want as
select *,
sum(Is_it_an_apple) as Apple_Count_by_Name
from have
group by name;
quit;
This is exactly the same question as from:
For which you will get exactly the same answers, a simple change from count() to sum() is what you are looking for.
Alternatively, you can do this. This preserves the original order in the data set
data want;
do until (last.name);
set have;
by Name notsorted;
Apple_Count_by_Name=sum(Apple_Count_by_Name, Is_it_an_apple);
end;
do until (last.Name);
set have;
by Name notsorted;
output;
end;
run;
@CynthiaWei wrote:
Thank you very much! May I know what is "last" in the do until statement? And, there is a "." in between last and name?
Best,
When a BY statement is used in a data step SAS creates temporary variables to indicate that a specific value is the first or last of the by group variable. So FIRST.variable or LAST.variable is a true/false value that is true (or 1) when the observation is the first one with that value. The dot indicates the use of this special feature instead of expecting or creating a variable named firstvariable or lastvariable.
Multiple variables on the BY statement means each variable gets its own first and last indicators.
If First.variable = Last.variable then there is only one value in the by group.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.