data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output;
run;
And this is the output I get. Obviously this is wrong, but I can't find my mistake.
client_id | year | specialty | countbyyearspecialty |
16094 | 2012 | 3 | 1 |
16094 | 2012 | 89 | 1 |
16094 | 2013 | 89 | 1 |
16094 | 2014 | 13 | 1 |
16094 | 2015 | 3 | 1 |
16094 | 2016 | 27 | 1 |
16094 | 2016 | 30 | 1 |
16094 | 2016 | 30 | 10 |
When I run your code:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;
proc print data=claims noobs;
run;
I get this:
client_ id year specialty countbyyearspecialty 16094 2012 3 1 16094 2012 89 1 16094 2013 89 1 16094 2014 13 1 16094 2015 3 1 16094 2016 27 1 16094 2016 30 1 16094 2016 30 2
which looks quite right to me.
When I run your code:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;
proc print data=claims noobs;
run;
I get this:
client_ id year specialty countbyyearspecialty 16094 2012 3 1 16094 2012 89 1 16094 2013 89 1 16094 2014 13 1 16094 2015 3 1 16094 2016 27 1 16094 2016 30 1 16094 2016 30 2
which looks quite right to me.
Interestingly, once I have deleted the variable countbyyearspecialty and re-ran the command, I got the same results.
Thanks Everyone.
RETAINing a variable (explicitly, or implicitly by using the variable + value; statement) only makes sense if it is not already on (one of) the input dataset(s). Any seemingly retained value will always be overwritten with the value from the dataset when a new observation is read.
Instead of a 10, I get a 2 in the last observation of countbyyearspecialty when running
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
proc sort data = claims;
by client_id year specialty;
run;
data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output;
run;
, which seems right?
So here is a follow-up:
I want to take the maximum value (of countbyyearspecialty) by year and specialty.
DATA claims;
set claims;
BY client_id year specialty;
IF FIRST.specialty THEN claimsperyearspecialty = countbyyearspecialty;
ELSE claimsperyearspecialty = claimsperyearspecialty;
RUN;
This would be my desired outcome:
client_id | year | specialty | countbyyearspecialty | claimsbyyearspecialty |
16094 | 2012 | 3 | 1 | 1 |
16094 | 2012 | 89 | 1 | 1 |
16094 | 2013 | 89 | 1 | 1 |
16094 | 2014 | 13 | 1 | 1 |
16094 | 2015 | 3 | 1 | 1 |
16094 | 2016 | 27 | 1 | 1 |
16094 | 2016 | 30 | 1 | 2 |
16094 | 2016 | 30 | 2 | 2 |
Thanks
Since I've already started to solve this with data steps, I'll stay with them:
data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;
data
claims
sums (
keep=client_id year specialty countbyyearspecialty
rename=(countbyyearspecialty=claimsbyyearspecialty)
)
;
set claims;
countbyyearspecialty + 1;
by client_id year specialty;
if first.specialty then countbyyearspecialty = 1;
output claims;
if last.specialty then output sums;
run;
data claims_final;
merge
claims
sums
;
by client_id year specialty;
run;
proc print data=claims_final noobs;
run;
Result
client_ id year specialty countbyyearspecialty claimsbyyearspecialty 16094 2012 3 1 1 16094 2012 89 1 1 16094 2013 89 1 1 16094 2014 13 1 1 16094 2015 3 1 1 16094 2016 27 1 1 16094 2016 30 1 2 16094 2016 30 2 2
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.