BookmarkSubscribeRSS Feed
gonzy31
Calcite | Level 5

I am using a dataset that merges cross-sectional data from 2004-2014.  Physicians are noted by a unique phycode each year (although the phycode is repeated in subsequent years -- i.e. starts again with 1 each year -- even though we are talking about different physicians).  Therefore, both the phycode and year are needed to identify each unique physician in the dataset.  

 

I am interested in calculating the average number of patient visits (each unique visit is numbered within each physician -- "patcode") per physician and I created a new count variable to count the number of visits for each unique provider in each year:

 

proc sort data=test;

by phycode year;

run;

 

Data count;

Set test;

keep phycode patcode count insurance year;

by phycode year;

if first.phycode or first.year then count=1;

else count+1;

Run;

 

However, I had problems trying to then calculate the average number of visits per unique physician from this count.  I had tried 

 

Data count1;

Set count;

by phycode;

if count>=1;

if last.phycode  then output;

Run;

 

 

proc means data=count;

var count;

run;

 

However, this misses physicians in some years with the same phycode but different year (this is only giving me last physician code number overall with highest count, which will not work since there are duplicate codes in some years).  Any thoughts on how I could adjust the code above?

 

Thanks!

9 REPLIES 9
Reeza
Super User

Can you use SQL? Count Distinct would be helpful here.

 

proc sql;
create table want as
select count(distinct age), sex
from sashelp.class
group by sex;
quit;
gonzy31
Calcite | Level 5

What would that look like instead here?  Would it be:

 

proc sql;
create table want as
select count(distinct phycode), year
from test
group by phycode year;
quit;

 

 

 

 

Reeza
Super User

@gonzy31 wrote:

What would that look like instead here?  Would it be:

 

proc sql;
create table want as
select count(distinct phycode), year
from test
group by phycode year;
quit;

 

 

 

 


Depends on your data. Post sample data (as a data step) and expected output if you want code that works with your data. If you post generic questions, you get generic answers.

gonzy31
Calcite | Level 5

Here is a sample of the data with the variables I care about

 

proc sort data=test;
by phycode year;
run;

Data count;
Set test;
keep phycode patcode count  year;

 by phycode year;
if first.phycode or first.year then count=1;
else count+1;
Run;

 

PHYCODE PATCODE YEAR    count 
1112007 1
1122007 2
1132007 3
1142007 4
1152007 5
     
1312005 1
1322005 2
1332005 3
1342005 4
     
     
1312006 1
1322006 2
    

 

 

I want the output to look like this though:  

 

PHYCODE PATCODE YEAR      count 
1152007.5
1342005 4
1322006 2
     
Reeza
Super User

You've overcomplicated this. 

 

data want;
set have;
by phycode year;
if last.year;
run;
ballardw
Super User

@gonzy31 wrote:

I am using a dataset that merges cross-sectional data from 2004-2014.  Physicians are noted by a unique phycode each year (although the phycode is repeated in subsequent years -- i.e. starts again with 1 each year -- even though we are talking about different physicians).  Therefore, both the phycode and year are needed to identify each unique physician in the dataset.  

 

However, I had problems trying to then calculate the average number of visits per unique physician from this count.  I had tried 

 

 

However, this misses physicians in some years with the same phycode but different year (this is only giving me last physician code number overall with highest count, which will not work since there are duplicate codes in some years).  Any thoughts on how I could adjust the code above?

 

Thanks!


I am not sure I am understaning your last problem. You said that to uniquely identify in an given year tha tyou need both the year and the phycode. And then you are looking only at phycode?!?  If the physicican changes code from year to year how do you expect to recognize the same one from multiple years??? Or did you misstate the requirement that both year and phycode are required for unique identifcation and phycode alone identfies the physician and you just need to get counts per year?

 

It might help considerably to provide some dummy example data as far as what is needed to identify some records for one physician in multiple years and what the expected count should be. Include at least one where the problem you describe exists. The data should be in form of data step code.

 

I would thint that perhaps the first thing may be to actually create real unique per physician codes that work across years.

gonzy31
Calcite | Level 5

I think I misspoke.  Physicians are unique each year, but you need both the phycode and year to identify a unique physician (this being because with each year's data, the phycode identified starts from 1 again for each unique physician in that year).

 

SAS Output -- phycode is unique to each individual physician in a specific year but the same phycode can be used in another year for a different physician as seen here; therefore, need bothyear and phycode as identifier.  Patcode represents each unique patient visit to a physician.  I created a count variable to ensure that I could count total number of visits per physician in each unique year (see line 89 when count restarts with repeat phycode but in different year)

 
PHYCODE PATCODE YEAR insurance count 
111200701
112200712
113200713
114200714
115200715
116200716
117200707
118200718
119200719
11102007110
11112007111
11122007012
11132007113
11142007114
11152007115
11162007116
11172007117
11182007118
11192007119
11202007120
11212007121
11222007122
11232007123
11242007124
11252007025
11262007126
11272007127
11282007028
11292007129
11302007.30
131200511
132200512
133200513
134200504
135200515
136200516
137200517
138200518
139200519
13102005110
13112005111
13122005112
13132005113
13142005114
13152005115
13162005116
13172005117
13182005118
13192005119
13202005120
13212005121
13222005022
13232005123
13242005124
131200601
132200602
133200613
1342006.4
135200615
136200616
137200617
138200608
139200619
13102006010
13112006.11
13122006112
13132006.13
13142006114
13152006015
13162006116
13172006.17
13182006118
13192006119
13202006120
13212006121
13222006022
13232006123
13242006124
13252006025
13262006126
13272006127
13282006128
13292006.29
131200701
132200712
133200713
134200704
135200715
136200716
137200717
138200708
139200719
13102007110
13112007011
13122007012
13132007013
13142007014
13152007015
13162007016
13172007117

 

After creating this count variable though for the number of visits per physician, I am not sure how to calculate the average number of visits per provider.  I had tried to create a new dataset with just the last count per physician code and doing proc means on the count variable, but this loses the highest count for a physician code in earlier years when the same physician code is repeated over years.  

 

SAS Output

PHYCODE PATCODE YEAR insurance count 
11302007.30
13282007128
14202009120
15322010132
16132008113
17212008121
18302009130
19472009147
20252013125
21272008127
22352010135
23172012117
24132014113
26282012128
279200819
28452007145
29302008130
301201411
31332011133
32242013124
33402012140
34112014111
353201203
36402014140
Reeza
Super User

How are you defining the average of the providers? You can't combine over years, so you can only count the number of distinct per year, per doc and then average that number.

 

Your patient code has no duplicates...so what are you actually trying to count here? Count is equal to patcode. This data doesn't make sense to me.

 

 

gonzy31
Calcite | Level 5

I want the average number of visits per provider.  Each provider is unique in each year -- so you can take an overall average number of visits per provider over the ten years (as long as each provider is uniquely identified by year and phycode).  Patcode, you are correct, is the same as count here --- but not later when I restrict the visits to only patients without insurance, but this is why I am creating a distinct count variable.

 

The last.year output worked -- I think you were right that I was overcomplicating it.

 

I then just got the average count from the output above -- 

 

proc means data=test;
var count;
run;

 

I think this solved the issue... thanks!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1607 views
  • 0 likes
  • 3 in conversation