Re: Output statement for counts - trying to calculate mean count per p...

gonzy31 · Posted 07-26-2017 03:22 PM

I am using a dataset that merges cross-sectional data from 2004-2014. Physicians are noted by a unique phycode each year (although the phycode is repeated in subsequent years -- i.e. starts again with 1 each year -- even though we are talking about different physicians). Therefore, both the phycode and year are needed to identify each unique physician in the dataset.

I am interested in calculating the average number of patient visits (each unique visit is numbered within each physician -- "patcode") per physician and I created a new count variable to count the number of visits for each unique provider in each year:

proc sort data=test;

by phycode year;

run;

Data count;

Set test;

keep phycode patcode count insurance year;

by phycode year;

if first.phycode or first.year then count=1;

else count+1;

Run;

However, I had problems trying to then calculate the average number of visits per unique physician from this count. I had tried

Data count1;

Set count;

by phycode;

if count>=1;

if last.phycode then output;

Run;

proc means data=count;

var count;

run;

However, this misses physicians in some years with the same phycode but different year (this is only giving me last physician code number overall with highest count, which will not work since there are duplicate codes in some years). Any thoughts on how I could adjust the code above?

Thanks!

Reeza · Posted 07-26-2017 03:36 PM

Can you use SQL? Count Distinct would be helpful here.

proc sql;
create table want as
select count(distinct age), sex
from sashelp.class
group by sex;
quit;

gonzy31 · Posted 07-26-2017 03:55 PM

What would that look like instead here? Would it be:

proc sql;
create table want as
select count(distinct phycode), year
from test
group by phycode year;
quit;

Reeza · Posted 07-26-2017 04:02 PM

@gonzy31 wrote:

What would that look like instead here? Would it be:

proc sql;
create table want as
select count(distinct phycode), year
from test
group by phycode year;
quit;

Depends on your data. Post sample data (as a data step) and expected output if you want code that works with your data. If you post generic questions, you get generic answers.

gonzy31 · Posted 07-26-2017 04:18 PM

Here is a sample of the data with the variables I care about

proc sort data=test;
by phycode year;
run;

Data count;
Set test;
keep phycode patcode count year;

by phycode year;
if first.phycode or first.year then count=1;
else count+1;
Run;

PHYCODE PATCODE YEAR count

11	1	2007	1
11	2	2007	2
11	3	2007	3
11	4	2007	4
11	5	2007	5

13	1	2005	1
13	2	2005	2
13	3	2005	3
13	4	2005	4


13	1	2006	1
13	2	2006	2

I want the output to look like this though:

PHYCODE PATCODE YEAR count

11	5	2007	.	5
13	4	2005		4
13	2	2006		2

Reeza · Posted 07-26-2017 04:23 PM

You've overcomplicated this.

data want;
set have;
by phycode year;
if last.year;
run;

ballardw · Posted 07-26-2017 03:54 PM

@gonzy31 wrote:

I am using a dataset that merges cross-sectional data from 2004-2014. Physicians are noted by a unique phycode each year (although the phycode is repeated in subsequent years -- i.e. starts again with 1 each year -- even though we are talking about different physicians). Therefore, both the phycode and year are needed to identify each unique physician in the dataset.

However, I had problems trying to then calculate the average number of visits per unique physician from this count. I had tried

However, this misses physicians in some years with the same phycode but different year (this is only giving me last physician code number overall with highest count, which will not work since there are duplicate codes in some years). Any thoughts on how I could adjust the code above?

Thanks!

I am not sure I am understaning your last problem. You said that to uniquely identify in an given year tha tyou need both the year and the phycode. And then you are looking only at phycode?!? If the physicican changes code from year to year how do you expect to recognize the same one from multiple years??? Or did you misstate the requirement that both year and phycode are required for unique identifcation and phycode alone identfies the physician and you just need to get counts per year?

It might help considerably to provide some dummy example data as far as what is needed to identify some records for one physician in multiple years and what the expected count should be. Include at least one where the problem you describe exists. The data should be in form of data step code.

I would thint that perhaps the first thing may be to actually create real unique per physician codes that work across years.

gonzy31 · Posted 07-26-2017 04:08 PM

I think I misspoke. Physicians are unique each year, but you need both the phycode and year to identify a unique physician (this being because with each year's data, the phycode identified starts from 1 again for each unique physician in that year).

SAS Output -- phycode is unique to each individual physician in a specific year but the same phycode can be used in another year for a different physician as seen here; therefore, need bothyear and phycode as identifier. Patcode represents each unique patient visit to a physician. I created a count variable to ensure that I could count total number of visits per physician in each unique year (see line 89 when count restarts with repeat phycode but in different year)

PHYCODE PATCODE YEAR insurance count

11	1	2007	0	1
11	2	2007	1	2
11	3	2007	1	3
11	4	2007	1	4
11	5	2007	1	5
11	6	2007	1	6
11	7	2007	0	7
11	8	2007	1	8
11	9	2007	1	9
11	10	2007	1	10
11	11	2007	1	11
11	12	2007	0	12
11	13	2007	1	13
11	14	2007	1	14
11	15	2007	1	15
11	16	2007	1	16
11	17	2007	1	17
11	18	2007	1	18
11	19	2007	1	19
11	20	2007	1	20
11	21	2007	1	21
11	22	2007	1	22
11	23	2007	1	23
11	24	2007	1	24
11	25	2007	0	25
11	26	2007	1	26
11	27	2007	1	27
11	28	2007	0	28
11	29	2007	1	29
11	30	2007	.	30
13	1	2005	1	1
13	2	2005	1	2
13	3	2005	1	3
13	4	2005	0	4
13	5	2005	1	5
13	6	2005	1	6
13	7	2005	1	7
13	8	2005	1	8
13	9	2005	1	9
13	10	2005	1	10
13	11	2005	1	11
13	12	2005	1	12
13	13	2005	1	13
13	14	2005	1	14
13	15	2005	1	15
13	16	2005	1	16
13	17	2005	1	17
13	18	2005	1	18
13	19	2005	1	19
13	20	2005	1	20
13	21	2005	1	21
13	22	2005	0	22
13	23	2005	1	23
13	24	2005	1	24
13	1	2006	0	1
13	2	2006	0	2
13	3	2006	1	3
13	4	2006	.	4
13	5	2006	1	5
13	6	2006	1	6
13	7	2006	1	7
13	8	2006	0	8
13	9	2006	1	9
13	10	2006	0	10
13	11	2006	.	11
13	12	2006	1	12
13	13	2006	.	13
13	14	2006	1	14
13	15	2006	0	15
13	16	2006	1	16
13	17	2006	.	17
13	18	2006	1	18
13	19	2006	1	19
13	20	2006	1	20
13	21	2006	1	21
13	22	2006	0	22
13	23	2006	1	23
13	24	2006	1	24
13	25	2006	0	25
13	26	2006	1	26
13	27	2006	1	27
13	28	2006	1	28
13	29	2006	.	29
13	1	2007	0	1
13	2	2007	1	2
13	3	2007	1	3
13	4	2007	0	4
13	5	2007	1	5
13	6	2007	1	6
13	7	2007	1	7
13	8	2007	0	8
13	9	2007	1	9
13	10	2007	1	10
13	11	2007	0	11
13	12	2007	0	12
13	13	2007	0	13
13	14	2007	0	14
13	15	2007	0	15
13	16	2007	0	16
13	17	2007	1	17

After creating this count variable though for the number of visits per physician, I am not sure how to calculate the average number of visits per provider. I had tried to create a new dataset with just the last count per physician code and doing proc means on the count variable, but this loses the highest count for a physician code in earlier years when the same physician code is repeated over years.

SAS Output

PHYCODE PATCODE YEAR insurance count

11	30	2007	.	30
13	28	2007	1	28
14	20	2009	1	20
15	32	2010	1	32
16	13	2008	1	13
17	21	2008	1	21
18	30	2009	1	30
19	47	2009	1	47
20	25	2013	1	25
21	27	2008	1	27
22	35	2010	1	35
23	17	2012	1	17
24	13	2014	1	13
26	28	2012	1	28
27	9	2008	1	9
28	45	2007	1	45
29	30	2008	1	30
30	1	2014	1	1
31	33	2011	1	33
32	24	2013	1	24
33	40	2012	1	40
34	11	2014	1	11
35	3	2012	0	3
36	40	2014	1	40

Reeza · Posted 07-26-2017 04:20 PM

How are you defining the average of the providers? You can't combine over years, so you can only count the number of distinct per year, per doc and then average that number.

Your patient code has no duplicates...so what are you actually trying to count here? Count is equal to patcode. This data doesn't make sense to me.

gonzy31 · Posted 07-26-2017 05:11 PM

I want the average number of visits per provider. Each provider is unique in each year -- so you can take an overall average number of visits per provider over the ten years (as long as each provider is uniquely identified by year and phycode). Patcode, you are correct, is the same as count here --- but not later when I restrict the visits to only patients without insurance, but this is why I am creating a distinct count variable.

The last.year output worked -- I think you were right that I was overcomplicating it.

I then just got the average count from the output above --

proc means data=test;
var count;
run;

I think this solved the issue... thanks!

Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Re: Output statement for counts - trying to calculate mean count per provider (with duplicate IDs)

Registration is open

SAS Training: Just a Click Away