Solved: Creating count variable by client_id, year and specialty

GKati · Posted 06-14-2017 07:45 AM

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data claims;
set claims;
countbyyearspecialty+1;
by client_id year specialty;
if first.specialty then countbyyearspecialty=1;
output;
run;

And this is the output I get. Obviously this is wrong, but I can't find my mistake.

client_id	year	specialty	countbyyearspecialty
16094	2012	3	1
16094	2012	89	1
16094	2013	89	1
16094	2014	13	1
16094	2015	3	1
16094	2016	27	1
16094	2016	30	1
16094	2016	30	10

Kurt_Bremser · Posted 06-14-2017 07:59 AM

When I run your code:

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data claims;
  set claims;
  countbyyearspecialty+1;
  by client_id year specialty;
  if first.specialty then countbyyearspecialty=1;
  output; *note that this statement is not necessary;
run;

proc print data=claims noobs;
run;

I get this:

client_
   id      year    specialty    countbyyearspecialty

 16094     2012         3                 1         
 16094     2012        89                 1         
 16094     2013        89                 1         
 16094     2014        13                 1         
 16094     2015         3                 1         
 16094     2016        27                 1         
 16094     2016        30                 1         
 16094     2016        30                 2

which looks quite right to me.
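For readers new to the sum statement: countbyyearspecialty+1; implicitly RETAINs the variable, starts it at 0 and adds 1 on every iteration, and the IF FIRST.SPECIALTY line restarts it at 1 for each new BY group. The sketch below spells the same logic out with an explicit RETAIN. It assumes the data may need sorting first, and the dataset names claims_raw and claims_counted are hypothetical, chosen so the step never reads back a CLAIMS dataset that might already contain the counter.

```sas
/* Hypothetical input dataset, kept separate from CLAIMS so that no  */
/* previously computed counter column is read back in.               */
data claims_raw;
  input client_id year specialty;
  datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

/* BY-group processing requires data sorted by the BY variables */
proc sort data=claims_raw;
  by client_id year specialty;
run;

data claims_counted;
  set claims_raw;
  by client_id year specialty;
  /* Explicit RETAIN in place of the implicit one from the sum statement */
  retain countbyyearspecialty 0;
  if first.specialty then countbyyearspecialty = 1;      /* new group: restart at 1 */
  else countbyyearspecialty = countbyyearspecialty + 1;  /* same group: add 1 */
run;
```

With the sample data every group has one row except 2016/30, so the counter is 1 everywhere and 2 on the last row, matching the output above.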


GKati · Posted 06-14-2017 08:14 AM

Interestingly, once I deleted the variable countbyyearspecialty and re-ran the code, I got the same results.

Thanks, everyone.

Kurt_Bremser · Posted 06-14-2017 08:43 AM

RETAINing a variable (explicitly with a RETAIN statement, or implicitly with a sum statement such as variable + value;) only makes sense if that variable is not already on (one of) the input dataset(s). Any seemingly retained value is overwritten with the value from the dataset every time a new observation is read.
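A minimal sketch of that effect, assuming a hypothetical dataset claims_stale that still carries a countbyyearspecialty column from an earlier run (the stale value 9 is invented purely for illustration). In the first step SET overwrites the retained counter on every read, so the second row of the 2016/30 group gets 9 + 1 = 10. In the second step the DROP= dataset option removes the stale column on input, so the implicit RETAIN from the sum statement carries the counter as intended.

```sas
/* Hypothetical input that already contains a leftover counter column */
data claims_stale;
  input client_id year specialty countbyyearspecialty;
  datalines;
16094 2012 3 9
16094 2012 89 9
16094 2013 89 9
16094 2014 13 9
16094 2015 3 9
16094 2016 27 9
16094 2016 30 9
16094 2016 30 9
;
run;

/* Broken: SET replaces the retained counter with the stale value on */
/* every read, so any non-first row of a group gets stale value + 1  */
data counted_broken;
  set claims_stale;
  by client_id year specialty;
  countbyyearspecialty + 1;
  if first.specialty then countbyyearspecialty = 1;
run;

/* Fixed: drop the stale column on input so nothing overwrites the   */
/* counter between iterations                                        */
data counted_fixed;
  set claims_stale(drop=countbyyearspecialty);
  by client_id year specialty;
  countbyyearspecialty + 1;
  if first.specialty then countbyyearspecialty = 1;
run;
```

Writing the result to a new dataset instead of overwriting CLAIMS in place avoids the stale column in the first place.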


PeterClemmensen · Posted 06-14-2017 08:02 AM

Instead of a 10, I get a 2 in the last observation of countbyyearspecialty when running

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;

proc sort data = claims;
	by client_id year specialty;
run;

data claims;
  set claims;
  countbyyearspecialty+1;
  by client_id year specialty;
  if first.specialty then countbyyearspecialty=1;
  output;
run;

That seems right to me.


GKati · Posted 06-14-2017 09:53 AM

So here is a follow-up:

I want to take the maximum value of countbyyearspecialty within each client_id, year and specialty group and attach it to every observation in that group.

DATA claims;
set claims;
BY client_id year specialty;
IF FIRST.specialty THEN claimsperyearspecialty = countbyyearspecialty;
ELSE claimsperyearspecialty = claimsperyearspecialty;
RUN;

This would be my desired outcome:

client_id	year	specialty	countbyyearspecialty	claimsbyyearspecialty
16094	2012	3	1	1
16094	2012	89	1	1
16094	2013	89	1	1
16094	2014	13	1	1
16094	2015	3	1	1
16094	2016	27	1	1
16094	2016	30	1	2
16094	2016	30	2	2

Thanks

Kurt_Bremser · Posted 06-14-2017 10:03 AM

Since I've already started to solve this with data steps, I'll stay with them:

data claims;
input client_id year specialty;
datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data
  claims
  sums (
    keep=client_id year specialty countbyyearspecialty
    rename=(countbyyearspecialty=claimsbyyearspecialty)
  )
;
set claims;
countbyyearspecialty + 1;
by client_id year specialty;
if first.specialty then countbyyearspecialty = 1;
output claims;
if last.specialty then output sums;
run;

data claims_final;
merge
  claims
  sums
;
by client_id year specialty;
run;

proc print data=claims_final noobs;
run;

Result

client_
   id      year    specialty    countbyyearspecialty    claimsbyyearspecialty

 16094     2012         3                 1                       1          
 16094     2012        89                 1                       1          
 16094     2013        89                 1                       1          
 16094     2014        13                 1                       1          
 16094     2015         3                 1                       1          
 16094     2016        27                 1                       1          
 16094     2016        30                 1                       2          
 16094     2016        30                 2                       2
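As an alternative to the two-step approach above (not something posted in the thread), the same result can be produced in a single DATA step with a so-called double DOW loop. This is a sketch that assumes the input is sorted by client_id, year and specialty and does not already contain countbyyearspecialty (otherwise the second SET would overwrite the DO loop index); claims_in and claims_dow are hypothetical names. The first DO UNTIL loop reads one BY group and leaves its row count in _N_; the second loop re-reads the same rows through a second SET and outputs each one with its running count and the group total.

```sas
/* Hypothetical input: sorted by the BY variables, no counter column yet */
data claims_in;
  input client_id year specialty;
  datalines;
16094 2012 3
16094 2012 89
16094 2013 89
16094 2014 13
16094 2015 3
16094 2016 27
16094 2016 30
16094 2016 30
;
run;

data claims_dow;
  /* Pass 1: read one client_id/year/specialty group; _N_ ends up */
  /* holding the number of rows in that group                     */
  do _n_ = 1 by 1 until (last.specialty);
    set claims_in;
    by client_id year specialty;
  end;
  claimsbyyearspecialty = _n_;

  /* Pass 2: re-read the same rows with a second SET and write each */
  /* one out with its running count and the group total             */
  do countbyyearspecialty = 1 to claimsbyyearspecialty;
    set claims_in;
    output;
  end;
run;

proc print data=claims_dow noobs;
  var client_id year specialty countbyyearspecialty claimsbyyearspecialty;
run;
```

The trade-off is readability: the DOW loop avoids the intermediate SUMS dataset and the MERGE, but the two-step version is easier to follow and to extend.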

