BookmarkSubscribeRSS Feed
Cruise
Ammonite | Level 13

 

How to calculate all possible number of disease cases across the probability of risk developing a disease from 0 through 1 by interval of 0.01 among inflow and outflow groups?

 

1. n_inflow=number of inflow *risk (p [0,1 by 0.01])?

1. n_outflow=number of outflow *risk (p [0,1 by 0.01])?

2. take cartesian product of diff(n_inflow - n_outflow) to compute the difference

3. total population + diff

 

I'll highly appreciate your suggestions or example/demo codes.

 

In attached data:

tot_case=number of high-risk individual

tot_pop= total population

sum_inflow=n_inflow

sum_outflow=n_outflow

 

using SAS 9.4.

14 REPLIES 14
ballardw
Super User

If you have a SAS data set use this: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

If we have to read your csv it is not unlikely that we make choices such that variables are different types than yours and code possibly would not be compatible.

 

And show what you have tried.

Cruise
Ammonite | Level 13
Original form of data was csv. And I imported to SAS using proc import. Attached csv file is the original data that I simply kept first N=131 observations. Do you still want me to follow the instruction? Just let me know please. Thanks for help in advance.
Reeza
Super User

It's not that simple. Your outflow/inflow can be larger than your new_cases and you need to account for that as well.

Cruise
Ammonite | Level 13
Hi Reeza, I'd compute the probability of all new cases to move in and move out independent from the extent of outflow and total inflow, at the moment. Let me start with the simple mutually exclusive probability.
Reeza
Super User

Maybe I'm misunderstanding, because I assume you didn't have the probability which is why you were simulating what happens when the probability goes from 0 to 1 in each case. 

Cruise
Ammonite | Level 13
Reeza, you're right. Good catch! I appreciate it. I just edited my post to how it's supposed to be worded.
Cruise
Ammonite | Level 13

This is my primordial effort to program. At least how I visualize the problem. I will link data cartes_join to my original case data. I used interval 0.01 to make it categorical because I don't know how to run continuous simulation yet. 

 

data risk; 
input risk;
datalines;
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
;
data risk; set risk ;
obs_num=_n_;
run;

proc sql; 
 create table cartes_join as 
 select
 a.obs_num as obs_num_1, b.obs_num as obs_num_2,
 a.risk as risk_out, b.risk as risk_in
 from
 risk as a INNER JOIN risk as b  
 on
a.obs_num >b.obs_num; 
quit;

 

Reeza
Super User

Now take that data set and take a single line of your data and create the numbers you need - so use the Max/MIN functions to determine your N and each of the calculations you need. 

 

Once you have it running for a single iteration, I can help you make it into a macro but unfortunately I don't have time to spend on this - it's a more complex issue but totally doable and I'm trying to work through my own complex recoding at the moment 😞

Cruise
Ammonite | Level 13

@Reeza

I have programmed below so far. I'm baffled here. I wanted to summarize the data by agecat and the risk intervals so that I can plot risk vs bias stratified by agecat. I'd appreciate any comments.

 

bias_risk plot.png

 

data risk; 
input risk;
datalines;
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
;
data risk; set risk ;
obs_num=_n_;
run;

proc sql; 
 create table cartes_join as 
 select
 a.obs_num as obs_num_1, b.obs_num as obs_num_2,
 a.risk as risk_out, b.risk as risk_in
 from
 risk as a INNER JOIN risk as b  
 on
a.obs_num >b.obs_num; 
quit;

proc sql;
create table final_risk as
select *
from cartes_join cross join support;
quit;

data final_risk; set final_risk;
cases_among_inflow=risk_in*sum_inflow;
cases_among_outflow=risk_out*sum_outflow;
pop_adj_mobility=tot_pop+cases_among_inflow-cases_among_outflow-tot_case;
rate_adj=tot_case/pop_adj_mobility;
rate_crude=tot_case/(tot_pop-tot_case);
bias=((rate_adj-rate_crude)/rate_crude)*100;
run; 
Reeza
Super User
I can't run your code so 😞

I'm not sure how that calculation would work overall tbh and if you can't define it, then I'm definitely not sure how it would work.
Cruise
Ammonite | Level 13
Reeza, did you indicate here that you're trying more complex code to solve the problem I posted here? I'd love to see your approach so.
Reeza
Super User

No, I tried to run your code and it didn't run. 


But I don't see where you linked it back to your pop data and did the calculations for the possible N that you were looking for

Cruise
Ammonite | Level 13
my pop data is named support in the second proc sql where I'm linking risk Cartesian file back to pop data =support. Hope you can run now. Possible Ns are cases_among_inflow and cases_among_outflow as shown in the data step. Please let me know your comments.
Reeza
Super User

I got here, not sure about how the next set of calcs need to happen, but from here you can calculate the numbers you need and then aggregate them to what you need. 

If you post that back once you have it figured out I can help you generalize it a bit more. 

 

Because this is still in development stages I only used a single row from the support data to simplify this and for others to work off, if desired. 

 

data support;
	input county agecat tot_pop tot_case sum_inflow sum_outflow;
	cards;
1 12 17881 235 373 217
;
run;

data risk;
	do risk=0 to 1 by 0.05;
		output;
	end;
run;

proc sql;
	create table risk_cross as
		select t1.risk as risk_in, t2.risk as risk_out, t3.*
			from risk as t1
				cross join risk as t2
				cross join support as t3;
quit;

data calcs;
	set risk_cross;
	cases_in = min(risk_in*sum_inflow, tot_case);
	cases_out = min(risk_out*sum_outflow, tot_case);
run;

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2633 views
  • 2 likes
  • 3 in conversation