About csessa3

csessa3 · ‎07-20-2020

State is a string variable with the state abbreviation (FL, GA, AL, etc.) The error is that my machine runs out of memory. It will spin for 20-30 minutes and then say insufficient memory. When I open the file it will have categorical variables for 10 of the 50 states. So it works, but it can't complete the task for the entire file.

csessa3 · ‎07-20-2020

It's a long story, but the gist is that I have a dataset of medical providers who are listed in multiple states within the dataset. I am going to use the dummy variables to calculate which state they are listed in the most to use this as their "primary" state. So for example, if doctor A is listed in Florida 2 times and Georgia 1 time, I want to say he is a Florida doctor.
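For what it's worth, the "primary" state can probably be found without building 50 indicator columns: count the listings per provider and state, then keep the most frequent state for each provider. Below is a minimal sketch of that idea. The dataset name `listings` and the variable name `provider_id` are hypothetical stand-ins, and breaking ties alphabetically by state is an assumption, not something specified in the thread.

```sas
/* Hypothetical input: one row per provider listing.
   The names listings and provider_id are assumptions for illustration. */
data listings;
  input provider_id $ state $;
  datalines;
A FL
A FL
A GA
B AL
;
run;

/* Count how many times each provider is listed in each state */
proc sql;
  create table state_counts as
  select provider_id, state, count(*) as n_listings
  from listings
  group by provider_id, state;
quit;

/* Put each provider's most frequent state first;
   ties are broken alphabetically by state (an assumption) */
proc sort data=state_counts;
  by provider_id descending n_listings state;
run;

/* Keep the first row per provider as the "primary" state */
data primary_state;
  set state_counts;
  by provider_id;
  if first.provider_id;
run;
```

With the toy data above, provider A ends up with FL (2 listings against 1 for GA) and provider B with AL. Because the counting is a grouped aggregation rather than a wide matrix of dummies, it should be much lighter on memory than the modeling procedures.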

csessa3 · ‎07-20-2020

PROC GLMSELECT and PROC LOGISTIC both work for creating the categorical variables when the sample size is reduced. I have previously hard coded the state indicators and run my final regression model with no issue, so I am not worried about my final model not working. I am not familiar with the PROC SURVEYSELECT and STRATA method you have suggested. Would you be able to give me a little more clarification?

csessa3 · ‎07-20-2020

Hello, thank you for identifying the issue. Do you know how to correct it? What is wrong with it?

csessa3 · ‎07-20-2020

Hello. I would be fine using the LOGISTIC CLASS statement, but it will not run for my large sample, which is why I am looking for an alternate solution.

csessa3 · ‎07-19-2020

Hello, I would like to make state indicator dummy variables for a large dataset (n > 600,000). I have found documentation suggesting to run PROC GLM or PROC LOGISTIC. This method works for me if I reduce the sample size, but it will not run on my computer with the entire sample. I have also hard coded it before, but am hoping to find something more efficient, as it is making my code very long and difficult to sift through. I have also seen the following code, but I am receiving an error saying "Array subscript out of range." I have yet to decipher what that means. This is the code I have most recently tried:

data indicators;
set miss_states1;
array dummys {*} st1 - st2;
do i=1 to DIM(dummys);
dummys(i) = 0;
end;
dummys(state) = 1;
run;

Thank you in advance to anyone who can help!
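A likely reading of the "Array subscript out of range" error: the array `dummys` has only two elements (`st1 - st2`), and `state` is a character value such as FL, which cannot serve as a numeric subscript. The sketch below is one possible repair, not the only one. It names each indicator after a state abbreviation and compares the suffix of the variable name with `state`. The `st_` prefix and the three-state list are assumptions for illustration; the list would need to be extended to every abbreviation in the data.

```sas
/* Tiny stand-in for miss_states1 (values are illustrative) */
data miss_states1;
  input state $;
  datalines;
FL
GA
AL
FL
;
run;

data indicators;
  set miss_states1;
  /* One indicator per abbreviation; extend this list to all states.
     The st_ prefix is an assumed naming convention. */
  array dummys {*} st_AL st_FL st_GA;
  do i = 1 to dim(dummys);
    /* vname() returns the variable name, e.g. st_FL; characters from
       position 4 onward are the abbreviation. The comparison evaluates
       to 1 when it matches state and 0 otherwise. */
    dummys{i} = (upcase(substr(vname(dummys{i}), 4)) = upcase(strip(state)));
  end;
  drop i;
run;
```

For a row with state FL this sets st_FL to 1 and the other indicators to 0. A DATA step handles one observation at a time, so this approach should not hit the memory limit that the modeling procedures run into on the full file.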

csessa3 · ‎06-05-2020

I'm sorry, that was a typo on my part. I will see if I can edit my previous comment. Doctor D should be listed as working with 1 other doctor.

csessa3 · ‎06-05-2020

Hello, thank you for taking time to review my question. The code below is close, as it adjusts the count to not double count the individual doctor (Doctor E is listed in 3 practices, so it will subtract 3 for that). However, it does not adjust for when the same doctors are in multiple practices together. For example, Doctor A only works with 1 other doctor (Doctor E). It just so happens that Doctor A and Doctor E work in 2 practices together (P1, P2). In contrast, Doctor E works with 2 other doctors, Doctor A and Doctor D. The final output would look like:

Doctor  Unique Doctors Work with
A  1
B  2
C  2
D  1
E  2

csessa3 · ‎06-05-2020

Hello, I have a dataset that has doctors and the various practices they work in. Each doctor in my dataset works in at least 1 practice but as many as 17 different practices. I would like to know the unique number of doctors each one works with. Example dataset below. This sample shows that Doctor A is in practices P1, P3, and P5. Doctor E is in practices P1, P2, and P5, etc.

Doctor  Tot_in_group  group_practice
A  2  P1
E  2  P1
C  2  P2
B  2  P2
A  3  P3
D  3  P3
E  3  P3
E  2  P5
A  2  P5

From this chart I would want a new column with the total number of unique doctors each one works with. In this case Doctor A works with 2 other doctors (E & D). However, if I simply grouped by doctor and summed, I find that Doctor A works with 6 doctors. This is wrong because it would count Doctor A 3 times (once for each practice he is listed in) AND it would count Doctor E twice (he is in two group practices with Doctor A, P1 & P5). I have ~800,000 doctors with ~400,000 group practices, making manual methods unfeasible. Does anyone have any suggestions on how to get this started?

Sample data set code:

data test;
input doctor $ tot_in_group group_practices $;
datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;
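One possible starting point is a self-join on the practice, followed by a distinct count of the other doctors. The sketch below uses the `test` data from the post with one record per line. The output names `unique_colleagues` and `unique_doctors` are hypothetical, and this is an illustration of the idea rather than a tuned solution for the full 800,000-doctor file.

```sas
/* Sample data from the post, one record per line */
data test;
  input doctor $ tot_in_group group_practices $;
  datalines;
A 2 P1
E 2 P1
C 3 P2
B 3 P2
E 3 P2
A 2 P3
D 2 P3
E 2 P5
A 2 P5
;
run;

proc sql;
  create table unique_colleagues as
  select a.doctor,
         /* distinct count, so a colleague shared across several
            practices is only counted once */
         count(distinct b.doctor) as unique_doctors
  from test as a
       /* left join keeps doctors who share no practice with anyone;
          they get a count of 0 */
       left join test as b
         on a.group_practices = b.group_practices
        and a.doctor ne b.doctor   /* never pair a doctor with themselves */
  group by a.doctor;
quit;
```

On this sample data the result is A 2, B 2, C 2, D 1, E 3. Excluding `a.doctor = b.doctor` in the join avoids counting a doctor as their own colleague, and `count(distinct ...)` handles pairs such as A and E who appear together in more than one practice.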

Date Last Visited	‎08-28-2020 10:57 PM

Re: Creating Dummy Variables from Categorical Variable for large datas...

Re: Creating Dummy Variables from Categorical Variable for large datas...

Re: Creating Dummy Variables from Categorical Variable for large datas...

Re: Creating Dummy Variables from Categorical Variable for large datas...

Re: Creating Dummy Variables from Categorical Variable for large datas...

Creating Dummy Variables from Categorical Variable for large dataset

Re: Finding unique number of IDs in multiple groups

Re: Finding unique number of IDs in multiple groups

Finding unique number of IDs in multiple groups
