BookmarkSubscribeRSS Feed
Dmcav6
Calcite | Level 5

I have a dataset with about 5688 records of subjects that have needed sedations for a procedure. The records have information such as the procedure type, the type of physician/professional administering the sedation, etc...So some doctors have more records because they do more procedures. 

I want to simulate (or re-create) this dataset, so that there are 146164 records or as close to this number of records as possible. I want to maintain the 'weighting' of records entered by the sedation doctor. 

 

Here's an example data. Let me describe what I desire by using this small sample. So say I had these 10 records, I want to have an output dataset of 146 records, resampled from this dataset of 10. I want to keep the 'weighting' or proportion of the physician type where the Anesthesiologist accounts for a majority of the records, in this example 7 out of 10 (70%). 

 

Procedure                       Sedative route   Physician type

Laceration/suture            IV                            Nurse Anesthesiologist

Dental procedure            IV                            Dental surgeon

Dental surgery                IV                            Dental surgeion

MRI                                 IV                            Anesthesiologist

Gastro endoscopy - U    IV                            Anesthesiologist

Gastro endoscopy – L    IV                            Anesthesiologist

Other                              IV                            Anesthesiologist

Lumbar puncture            IV                            Anesthesiologist

MRI                                 IV                            Anesthesiologist

CT scan                          IV                            Anesthesiologist

 

 

 

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Dmcav6 wrote:

 

Here's an example data. Let me describe what I desire by using this small sample. So say I had these 10 records, I want to have an output dataset of 146 records, resampled from this dataset of 10. I want to keep the 'weighting' or proportion of the physician type where the Anesthesiologist accounts for a majority of the records, in this example 7 out of 10 (70%). 

 

Procedure                       Sedative route   Physician type

Laceration/suture            IV                            Nurse Anesthesiologist

Dental procedure            IV                            Dental surgeon

Dental surgery                IV                            Dental surgeion

MRI                                 IV                            Anesthesiologist

Gastro endoscopy - U    IV                            Anesthesiologist

Gastro endoscopy – L    IV                            Anesthesiologist

Other                              IV                            Anesthesiologist

Lumbar puncture            IV                            Anesthesiologist

MRI                                 IV                            Anesthesiologist

CT scan                          IV                            Anesthesiologist


Resampling can't give you exactly 70% anesthesiologist, it will give you some number that is a random deviation from 70%.

 

You could randomly pick integers 1 to 10 until you have 146 observations.

 

It's not clear to me why you would do this.

 

data randomselect;
    do I=1 to 146;
          randominteger=floor(uniform(0)*10);
          output;
    end;
run;
--
Paige Miller
Dmcav6
Calcite | Level 5

The reason why I am doing this is because the physicians are entering their data for the sedations that they perform. I do not know if they are entering selected records (ie. positive outcomes. Are they entering only sedation records that the subject does not experience an adverse event, theres more to the dataset that I wasn't explaining).

But, I do know the total number of sedation each physician performs each year.

Ultimately, I want to have the total number of records be equal to the number of sedations performed in the year.

 

So in my 10 sample dataset, I know the procedure record that the physician enters. I also know that I want to have a total of 146 records because that is the sum of the total number of sedations those physicians perform. I want to resample those 10 records to obtain 146 records. My denominator will be 146 records, and I want to summarize/analyze this.

 

I hope this helps you follow.
Thank you.

PaigeMiller
Diamond | Level 26

@Dmcav6 wrote:

I do not know if they are entering selected records (ie. positive outcomes. Are they entering only sedation records that the subject does not experience an adverse event, theres more to the dataset that I wasn't explaining).


If any of this is true, then random sampling isn't going to give you good data quality that represents the data that actually should have been recorded.

--
Paige Miller
Reeza
Super User

I'm not going to go into the validity of this approach - I suspect there's many, but I'm also certain that PROC SURVEYSELECT can do this. You can specify a dataset within the samplesize that shows the number required for each physician. 

 

Your better bet is to access the administrative data base that should have billing codes and adverse advent billing codes. It's much harder but it's a likely better data source. 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 401 views
  • 0 likes
  • 3 in conversation