Simulate (or re-create) a dataset

New Contributor
Posts: 2


I have a dataset of about 5,688 records of subjects who needed sedation for a procedure. The records include information such as the procedure type and the type of physician/professional who administered the sedation. Some doctors have more records because they perform more procedures.

I want to simulate (or re-create) this dataset so that it has 146,164 records, or as close to that number as possible, while maintaining the 'weighting' of records entered by each sedation doctor.

Here is some example data to describe what I want. Say I had these 10 records: I want an output dataset of 146 records, resampled from this dataset of 10. I want to keep the 'weighting', i.e. the proportion of each physician type; in this example the Anesthesiologist accounts for the majority of the records, 7 out of 10 (70%).

Procedure              Sedative route   Physician type
Laceration/suture      IV               Nurse Anesthesiologist
Dental procedure       IV               Dental surgeon
Dental surgery         IV               Dental surgeon
MRI                    IV               Anesthesiologist
Gastro endoscopy - U   IV               Anesthesiologist
Gastro endoscopy - L   IV               Anesthesiologist
Other                  IV               Anesthesiologist
Lumbar puncture        IV               Anesthesiologist
MRI                    IV               Anesthesiologist
CT scan                IV               Anesthesiologist
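A rough sketch of the 'weighting' arithmetic, written in Python rather than SAS and assuming the goal is simply to split 146 records in the same proportions as this 10-record sample. Because 70% of 146 is 102.2, no whole number of records hits 70% exactly, so the sketch floors each share and hands the leftover records to the groups with the largest fractional parts (largest-remainder rounding). The helper `proportional_allocation` is hypothetical, not part of any library.

```python
from collections import Counter

# The 10 example records from the post: (procedure, route, physician type)
records = [
    ("Laceration/suture", "IV", "Nurse Anesthesiologist"),
    ("Dental procedure", "IV", "Dental surgeon"),
    ("Dental surgery", "IV", "Dental surgeon"),
    ("MRI", "IV", "Anesthesiologist"),
    ("Gastro endoscopy - U", "IV", "Anesthesiologist"),
    ("Gastro endoscopy - L", "IV", "Anesthesiologist"),
    ("Other", "IV", "Anesthesiologist"),
    ("Lumbar puncture", "IV", "Anesthesiologist"),
    ("MRI", "IV", "Anesthesiologist"),
    ("CT scan", "IV", "Anesthesiologist"),
]

def proportional_allocation(counts, total):
    """Hypothetical helper: split `total` across groups in proportion to
    `counts`, using largest-remainder rounding so the parts sum to `total`."""
    n = sum(counts.values())
    exact = {k: total * c / n for k, c in counts.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # floor of each exact share
    leftover = total - sum(alloc.values())
    # Give the remaining records to the groups with the largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

counts = Counter(physician for _, _, physician in records)
targets = proportional_allocation(counts, 146)
print(targets)
# {'Nurse Anesthesiologist': 15, 'Dental surgeon': 29, 'Anesthesiologist': 102}
```

Here 102 of 146 records (about 69.9%) are anesthesiologist records, the closest whole-number match to 70%.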

 

 

 

Respected Advisor
Posts: 2,843

Re: Simulate (or re-create) a dataset


Dmcav6 wrote:

 

Here is some example data to describe what I want. Say I had these 10 records: I want an output dataset of 146 records, resampled from this dataset of 10. I want to keep the 'weighting', i.e. the proportion of each physician type; in this example the Anesthesiologist accounts for the majority of the records, 7 out of 10 (70%).

 



Resampling can't give you exactly 70% anesthesiologists; it will give you a proportion that deviates randomly from 70%.

You could randomly pick integers 1 to 10 until you have 146 observations.

 

It's not clear to me why you would do this.

 

data randomselect;
    do I = 1 to 146;
          /* floor(uniform(0)*10) alone gives 0-9; add 1 to get record numbers 1-10 */
          randominteger = floor(uniform(0)*10) + 1;
          output;
    end;
run;
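For comparison, a Python sketch of the same simple approach (pick record numbers 1 to 10 uniformly, with replacement, until there are 146), assuming only the physician type of each record matters. The helper `resample` and the seed value are illustrative choices, not anything from the thread.

```python
import random

# Physician type of each of the 10 example records (7 of 10 are anesthesiologists)
physician = ["Nurse Anesthesiologist", "Dental surgeon", "Dental surgeon"] + ["Anesthesiologist"] * 7

def resample(n_out, n_in, seed=None):
    """Draw n_out record numbers from 1..n_in, uniformly and with replacement."""
    rng = random.Random(seed)
    return [rng.randint(1, n_in) for _ in range(n_out)]  # randint includes both endpoints

picks = resample(146, len(physician), seed=2017)
share = sum(physician[i - 1] == "Anesthesiologist" for i in picks) / len(picks)
print(f"anesthesiologist share: {share:.3f}")  # varies around 0.70 from seed to seed
```

Rerunning with different seeds shows the random deviation: the number of anesthesiologist records follows a Binomial(146, 0.7) distribution, whose standard deviation is about 5.5 records, or roughly 3.8 percentage points.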
--
Paige Miller
New Contributor
Posts: 2

Re: Simulate (or re-create) a dataset

Posted in reply to PaigeMiller

The reason I am doing this is that the physicians enter the data for the sedations they perform. I do not know whether they are entering only selected records (i.e. positive outcomes: are they entering only the sedations in which the subject did not experience an adverse event? There is more to the dataset than I explained).

However, I do know the total number of sedations each physician performs each year.

Ultimately, I want the total number of records to equal the number of sedations performed in the year.

So in my 10-record sample dataset, I know the procedure records that the physicians entered. I also know that I want a total of 146 records, because that is the sum of the annual sedation counts for those physicians. I want to resample those 10 records to obtain 146 records; my denominator will be 146 records, and I want to summarize/analyze this.

I hope this helps you follow.
Thank you.

Respected Advisor
Posts: 2,843

Re: Simulate (or re-create) a dataset


Dmcav6 wrote:

I do not know whether they are entering only selected records (i.e. positive outcomes: are they entering only the sedations in which the subject did not experience an adverse event? There is more to the dataset than I explained).


If any of this is true, then random sampling isn't going to give you good-quality data that represents what actually should have been recorded.
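A minimal Python illustration of this point, assuming (hypothetically) that only sedations without an adverse event were entered. Resampling with replacement can only copy records that exist, so an outcome that was never entered cannot appear in the resampled data, however many records are generated.

```python
import random

# Hypothetical entered records: 1 = adverse event, 0 = none.
# If physicians entered only uneventful sedations, every entered flag is 0.
entered_ae_flags = [0] * 10

rng = random.Random(1)
resampled = rng.choices(entered_ae_flags, k=146)  # sampling with replacement

# Every resampled record is a copy of an entered one, so the adverse-event
# rate can only reproduce the entered rate (here 0%), not the true rate.
print(sum(resampled), len(resampled))  # 0 146
```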

--
Paige Miller
Super User
Posts: 23,357

Re: Simulate (or re-create) a dataset

I'm not going to go into the validity of this approach (I suspect there are many issues), but I'm certain that PROC SURVEYSELECT can do this. You can specify a dataset in the SAMPSIZE= option that gives the number of records required for each physician.
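A minimal Python sketch of the same idea: stratified sampling with replacement, one stratum per physician, where each stratum's size is that physician's known annual count. In SAS this would presumably correspond to PROC SURVEYSELECT with a STRATA statement, METHOD=URS (sampling with replacement, needed because each target exceeds the number of entered records), SAMPSIZE= naming the per-physician dataset, and OUTHITS so that repeated selections become separate rows; check the SURVEYSELECT documentation for the exact usage. The helper `stratified_resample`, the physician names, and the counts below are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_resample(records, key, targets, seed=None):
    """Hypothetical helper: resample with replacement within each stratum.

    records: list of records (any type)
    key:     function mapping a record to its stratum (e.g. its physician)
    targets: dict mapping stratum -> number of records wanted
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    out = []
    for stratum, n in targets.items():
        pool = strata.get(stratum)
        if not pool:
            raise ValueError(f"no entered records for stratum {stratum!r}")
        # With replacement, so n may exceed the number of entered records
        out.extend(rng.choices(pool, k=n))
    return out

# Hypothetical annual sedation counts per physician (they sum to 146)
annual_counts = {"Dr A": 100, "Dr B": 30, "Dr C": 16}
# Hypothetical entered records: (physician, procedure)
entered = [("Dr A", "MRI"), ("Dr A", "CT scan"),
           ("Dr B", "Dental surgery"), ("Dr C", "Laceration/suture")]

sample = stratified_resample(entered, key=lambda r: r[0], targets=annual_counts, seed=42)
print(len(sample))  # 146
```

Each physician contributes exactly their annual count, so the physician mix is fixed by design rather than left to chance.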

 

Your better bet is to access the administrative database, which should have billing codes, including adverse event billing codes. It's much harder, but it's likely a better data source.

 

 
