Re: Simulation in data step or proc sql

Cruise · Posted 01-30-2018 10:01 AM

How to calculate all possible number of disease cases across the probability of risk developing a disease from 0 through 1 by interval of 0.01 among inflow and outflow groups?

1. n_inflow=number of inflow *risk (p [0,1 by 0.01])?

1. n_outflow=number of outflow *risk (p [0,1 by 0.01])?

2. take cartesian product of diff(n_inflow - n_outflow) to compute the difference

3. total population + diff

I'll highly appreciate your suggestions or example/demo codes.

In attached data:

tot_case=number of high-risk individual

tot_pop= total population

sum_inflow=n_inflow

sum_outflow=n_outflow

using SAS 9.4.

ballardw · Posted 01-30-2018 10:45 AM

If you have a SAS data set use this: Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

If we have to read your csv it is not unlikely that we make choices such that variables are different types than yours and code possibly would not be compatible.

And show what you have tried.

Cruise · Posted 01-30-2018 10:52 AM

Original form of data was csv. And I imported to SAS using proc import. Attached csv file is the original data that I simply kept first N=131 observations. Do you still want me to follow the instruction? Just let me know please. Thanks for help in advance.

Reeza · Posted 01-30-2018 10:58 AM

It's not that simple. Your outflow/inflow can be larger than your new_cases and you need to account for that as well.

Cruise · Posted 01-30-2018 11:04 AM

Hi Reeza, I'd compute the probability of all new cases to move in and move out independent from the extent of outflow and total inflow, at the moment. Let me start with the simple mutually exclusive probability.

Reeza · Posted 01-30-2018 11:06 AM

Maybe I'm misunderstanding, because I assume you didn't have the probability which is why you were simulating what happens when the probability goes from 0 to 1 in each case.

Cruise · Posted 01-30-2018 11:19 AM

Reeza, you're right. Good catch! I appreciate it. I just edited my post to how it's supposed to be worded.

Cruise · Posted 01-30-2018 11:24 AM

This is my primordial effort to program. At least how I visualize the problem. I will link data cartes_join to my original case data. I used interval 0.01 to make it categorical because I don't know how to run continuous simulation yet.

data risk; 
input risk;
datalines;
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
;
data risk; set risk ;
obs_num=_n_;
run;

proc sql; 
 create table cartes_join as 
 select
 a.obs_num as obs_num_1, b.obs_num as obs_num_2,
 a.risk as risk_out, b.risk as risk_in
 from
 risk as a INNER JOIN risk as b  
 on
a.obs_num >b.obs_num; 
quit;

Reeza · Posted 01-30-2018 11:35 AM

Now take that data set and take a single line of your data and create the numbers you need - so use the Max/MIN functions to determine your N and each of the calculations you need.

Once you have it running for a single iteration, I can help you make it into a macro but unfortunately I don't have time to spend on this - it's a more complex issue but totally doable and I'm trying to work through my own complex recoding at the moment 😞

Cruise · Posted 01-30-2018 12:33 PM

@Reeza

I have programmed below so far. I'm baffled here. I wanted to summarize the data by agecat and the risk intervals so that I can plot risk vs bias stratified by agecat. I'd appreciate any comments.

data risk; 
input risk;
datalines;
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
;
data risk; set risk ;
obs_num=_n_;
run;

proc sql; 
 create table cartes_join as 
 select
 a.obs_num as obs_num_1, b.obs_num as obs_num_2,
 a.risk as risk_out, b.risk as risk_in
 from
 risk as a INNER JOIN risk as b  
 on
a.obs_num >b.obs_num; 
quit;

proc sql;
create table final_risk as
select *
from cartes_join cross join support;
quit;

data final_risk; set final_risk;
cases_among_inflow=risk_in*sum_inflow;
cases_among_outflow=risk_out*sum_outflow;
pop_adj_mobility=tot_pop+cases_among_inflow-cases_among_outflow-tot_case;
rate_adj=tot_case/pop_adj_mobility;
rate_crude=tot_case/(tot_pop-tot_case);
bias=((rate_adj-rate_crude)/rate_crude)*100;
run;

Reeza · Posted 01-30-2018 12:46 PM

I can't run your code so 😞

I'm not sure how that calculation would work overall tbh and if you can't define it, then I'm definitely not sure how it would work.

Cruise · Posted 01-30-2018 01:05 PM

Reeza, did you indicate here that you're trying more complex code to solve the problem I posted here? I'd love to see your approach so.

Reeza · Posted 01-30-2018 01:13 PM

No, I tried to run your code and it didn't run.

But I don't see where you linked it back to your pop data and did the calculations for the possible N that you were looking for

Cruise · Posted 01-30-2018 01:23 PM

my pop data is named support in the second proc sql where I'm linking risk Cartesian file back to pop data =support. Hope you can run now. Possible Ns are cases_among_inflow and cases_among_outflow as shown in the data step. Please let me know your comments.

Reeza · Posted 01-30-2018 02:47 PM

I got here, not sure about how the next set of calcs need to happen, but from here you can calculate the numbers you need and then aggregate them to what you need.

If you post that back once you have it figured out I can help you generalize it a bit more.

Because this is still in development stages I only used a single row from the support data to simplify this and for others to work off, if desired.

data support;
	input county agecat tot_pop tot_case sum_inflow sum_outflow;
	cards;
1 12 17881 235 373 217
;
run;

data risk;
	do risk=0 to 1 by 0.05;
		output;
	end;
run;

proc sql;
	create table risk_cross as
		select t1.risk as risk_in, t2.risk as risk_out, t3.*
			from risk as t1
				cross join risk as t2
				cross join support as t3;
quit;

data calcs;
	set risk_cross;
	cases_in = min(risk_in*sum_inflow, tot_case);
	cases_out = min(risk_out*sum_outflow, tot_case);
run;

SAS Innovate 2025: Save the Date