data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output;
run;
And this is the output I get. Obviously this is wrong, but I can't find my mistake.
client_id | year | specialty | countbyyearspecialty |
16094 | 2012 | 3 | 1 |
16094 | 2012 | 89 | 1 |
16094 | 2013 | 89 | 1 |
16094 | 2014 | 13 | 1 |
16094 | 2015 | 3 | 1 |
16094 | 2016 | 27 | 1 |
16094 | 2016 | 30 | 1 |
16094 | 2016 | 30 | 10 |
When I run your code:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;
proc print data=claims noobs;
run;
I get this:
client_ id year specialty countbyyearspecialty 16094 2012 3 1 16094 2012 89 1 16094 2013 89 1 16094 2014 13 1 16094 2015 3 1 16094 2016 27 1 16094 2016 30 1 16094 2016 30 2
which looks quite right to me.
When I run your code:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;
proc print data=claims noobs;
run;
I get this:
client_ id year specialty countbyyearspecialty 16094 2012 3 1 16094 2012 89 1 16094 2013 89 1 16094 2014 13 1 16094 2015 3 1 16094 2016 27 1 16094 2016 30 1 16094 2016 30 2
which looks quite right to me.
Interestingly, once I have deleted the variable countbyyearspecialty and re-ran the command, I got the same results.
Thanks Everyone.
RETAINing a variable (explicitly, or implicitly by using the variable + value; statement) only makes sense if it is not already on (one of) the input dataset(s). Any seemingly retained value will always be overwritten with the value from the dataset when a new observation is read.
Instead of a 10, I get a 2 in the last observation of countbyyearspecialty when running
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
proc sort data = claims;
by client_id year specialty;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output;
run;
, which seems right?
So here is a follow-up:
I want to take the maximum value (of countbyyearspecialty) by year and specialty.
DATA claims;
set claims;
BY client_id year specialty;
IF FIRST.specialty THEN claimsperyearspecialty = countbyyearspecialty;
ELSE claimsperyearspecialty = claimsperyearspecialty;
RUN;
This would be my desired outcome:
client_id | year | specialty | countbyyearspecialty | claimsbyyearspecialty |
16094 | 2012 | 3 | 1 | 1 |
16094 | 2012 | 89 | 1 | 1 |
16094 | 2013 | 89 | 1 | 1 |
16094 | 2014 | 13 | 1 | 1 |
16094 | 2015 | 3 | 1 | 1 |
16094 | 2016 | 27 | 1 | 1 |
16094 | 2016 | 30 | 1 | 2 |
16094 | 2016 | 30 | 2 | 2 |
Thanks
Since I've already started to solve this with data steps, I'll stay with them:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data
claims
sums (
keep=client_id year specialty countbyyearspecialty
rename=(countbyyearspecialty=claimsbyyearspecialty)
)
;
set claims;
countbyyearspecialty + 1;
by client_id year specialty;
if first.specialty then countbyyearspecialty = 1;
output claims;
if last.specialty then output sums;
run;
data claims_final;
merge
claims
sums
;
by client_id year specialty;
run;
proc print data=claims_final noobs;
run;
Result
client_ id year specialty countbyyearspecialty claimsbyyearspecialty 16094 2012 3 1 1 16094 2012 89 1 1 16094 2013 89 1 1 16094 2014 13 1 1 16094 2015 3 1 1 16094 2016 27 1 1 16094 2016 30 1 2 16094 2016 30 2 2
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.