BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GKati
Pyrite | Level 9

 

data claims;

input client_id year specialty;

datalines;

16094 2012 3

16094 2012 89

16094 2013 89

16094 2014 13

16094 2015 3

16094 2016 27

16094 2016 30

16094 2016 30

;

run;

 

data claims;

  set claims;

  countbyyearspecialty+1;

  by client_id year specialty;

  if first.specialty then countbyyearspecialty=1;

output;

run;

 

And this is the output I get. Obviously this is wrong, but I can't find my mistake. 

client_id

year

specialty

countbyyearspecialty

16094

2012

3

1

16094

2012

89

1

16094

2013

89

1

16094

2014

13

1

16094

2015

3

1

16094

2016

27

1

16094

2016

30

1

16094

2016

30

10

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

When I run your code:

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data claims;
  set claims;
  countbyyearspecialty+1;
  by client_id year specialty;
  if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;

proc print data=claims noobs;
run;

I get this:

client_
   id      year    specialty    countbyyearspecialty

 16094     2012         3                 1         
 16094     2012        89                 1         
 16094     2013        89                 1         
 16094     2014        13                 1         
 16094     2015         3                 1         
 16094     2016        27                 1         
 16094     2016        30                 1         
 16094     2016        30                 2         

which looks quite right to me.

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

When I run your code:

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data claims;
  set claims;
  countbyyearspecialty+1;
  by client_id year specialty;
  if first.specialty then countbyyearspecialty=1;
output; *note that this statement is not necesssary;
run;

proc print data=claims noobs;
run;

I get this:

client_
   id      year    specialty    countbyyearspecialty

 16094     2012         3                 1         
 16094     2012        89                 1         
 16094     2013        89                 1         
 16094     2014        13                 1         
 16094     2015         3                 1         
 16094     2016        27                 1         
 16094     2016        30                 1         
 16094     2016        30                 2         

which looks quite right to me.

GKati
Pyrite | Level 9

Interestingly, once I have deleted the variable countbyyearspecialty and re-ran the command, I got the same results. 

 

Thanks Everyone.

Kurt_Bremser
Super User

RETAINing a variable (explicitly, or implicitly by using the variable + value; statement) only makes sense if it is not already on (one of) the input dataset(s). Any seemingly retained value will always be overwritten with the value from the dataset when a new observation is read.

PeterClemmensen
Tourmaline | Level 20

Instead of a 10, I get a 2 in the last observation of countbyyearspecialty when running

 

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;

proc sort data = claims;
	by client_id year specialty;
run;

data claims;
  set claims;
  countbyyearspecialty+1;
  by client_id year specialty;
  if first.specialty then countbyyearspecialty=1;
  output;
run;

, which seems right?

 

GKati
Pyrite | Level 9

So here is a follow-up:

 

I want to take the maximum value (of countbyyearspecialty) by year and specialty.

 

DATA claims;

set claims;

BY client_id year specialty;

IF FIRST.specialty THEN claimsperyearspecialty = countbyyearspecialty;

ELSE  claimsperyearspecialty = claimsperyearspecialty;

RUN;

 

This would be my desired outcome:

 

client_idyearspecialtycountbyyearspecialtyclaimsbyyearspecialty
160942012311
1609420128911
1609420138911
1609420141311
160942015311
1609420162711
1609420163012
1609420163022

 

Thanks

Kurt_Bremser
Super User

Since I've already started to solve this with data steps, I'll stay with them:

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data
  claims
  sums (
    keep=client_id year specialty countbyyearspecialty
    rename=(countbyyearspecialty=claimsbyyearspecialty)
  )
;
set claims;
countbyyearspecialty + 1;
by client_id year specialty;
if first.specialty then countbyyearspecialty = 1;
output claims;
if last.specialty then output sums;
run;

data claims_final;
merge
  claims
  sums
;
by client_id year specialty;
run;

proc print data=claims_final noobs;
run;

Result

client_
   id      year    specialty    countbyyearspecialty    claimsbyyearspecialty

 16094     2012         3                 1                       1          
 16094     2012        89                 1                       1          
 16094     2013        89                 1                       1          
 16094     2014        13                 1                       1          
 16094     2015         3                 1                       1          
 16094     2016        27                 1                       1          
 16094     2016        30                 1                       2          
 16094     2016        30                 2                       2          

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2222 views
  • 0 likes
  • 3 in conversation