Hi All: Relatively new to SAS. I'm having difficulties with a dataset, and I apologizes I must explain in generalities because I don't know where to begin. I'll try to include some of the codes, etc I have but I apologize in advance for how vague this may seem... I'm lost!
So I have a large-ish dataset of hospital emergency department (ED) visits (approx 40,000 visits), and I am looking at opioid overdoses and suicide with a number of other predictors. The way the data was delivered to me, the diagnoses I requested are listed as categories with "yes" "no" has level of variables.
Later I am going to create regression models with just the sample of patients with a diagnosis of "Opioid poisoning" (n~2000) with a number of predictors for suicidality as an outcome category (binary outcome "yes" "no"). I would like one of those predictors to be previous presentation for a different psychiatric reason (meaning, the patient was previously in the ED NOT for an opioid overdose but FOR a psychiatric diagnosis- again these are categories with "yes" and "no" as levels).
I do have number of admissions in total as a separate predictor (I grouped these based on number of times MRN was entered). I also have dates in SAS format DDMONYR 00:00:00
I'm trying to categorizes individual patients (identified by MRN) who have overdosed at least once based on a category AND who have had PREVIOUS presentations for non opioid OD reasons). The n of this total sample should be less than 2000, so not a high number, though I will be drawing from a much larger sample
Any tips on how/where to begin?? If I didn't explain things clearly or if there is any other info I can provide please let me know!!! Unfortunately I cannot post any data because of HIPAA reasons
Is each record an admission with a patient identifier and admission characteristics, or is each record a patient with some nonnormalized data structure?
Let us start someplace. Assume data where each record is an admit with a patient and a diagnosis.
data admits;
input patientid admitdate:mmddyy10. opiod:$1. psych:$1. else:$1.;
datalines;
1 1/1/2017 n y n
1 2/1/2017 n n y
1 3/1/2017 y n n
2 1/1/2017 n n y
2 2/1/2017 n n y
2 3/1/2017 y n n
3 1/1/2017 n y n
3 2/1/2017 n n y
3 3/1/2017 n y n
4 4/1/2017 n y n
4 5/1/2017 n n y
4 6/1/2017 y n n
4 7/1/2017 y n n
4 8/1/2017 y n n
5 6/1/2017 y n n
5 7/1/2017 n n n
5 8/1/2017 n y n
;
run;
Then let's select anybody who is a y on opiod and also a y on pysch with a date before their y on opiod. In this data 1 and 4. 5 has a psych after and 2 doesn't have a psych.
proc sql;
create table opiod_with_psych as
select distinct a.patientid
from admits a inner join admits b on a.patientid = b.patientid
where a.opiod = 'y' and b.psych = 'y' and b.admitdate < a.admitdate;
quit;
That gives us
The SAS System |
patientid |
1 |
4 |
Start with that and then let's modify.
Thanks!
So I have a list of patient MRNs which fit that category. The next step is to create a variable in my data set where I only have opioid overdoses (the n~2000) and these patients are assigned a "Y". Can i merge this? If so, how?
Thanks!
I guess more sxplicitly- I have a list of MRNs that fit this critieria. In my larger data set how do I create a variable to which I can assign a "y" or "n" to these MRNs based on whether or not they are on this new table? Thanks!!!
I guess more sxplicitly- I have a list of MRNs that fit this critieria. In my larger data set how do I create a variable to which I can assign a "y" or "n" to these MRNs based on whether or not they are on this new table? Thanks!!!
I have no idea. We haven't seen even any made-up data. I have no idea what you want to do. You said you already had things in Y/N format.
You'll have to provide more information and explanation for further help.
Thanks for the suggestions/tips- was out of town, sorry for the delay. Here is fake data simplified from my original set with only the pertinent variables (modified slightly for HIPAA purposes):
data WORK.OPIOID11;
infile datalines dsd truncover;
input MedRec:BEST12. Enct:BEST12. EnteredDT:DATETIME. opioidx:$1. psychdx:$1.;
format MedRec BEST12. Enct BEST12. EnteredDT DATETIME.;
datalines4;
123456,654387982,04JUN15:16:10:00,2,1
123456,846397542,08JUL15:17:10:00,1,2
123456,846397541,09AUG15:18:10:00,1,1
654321,987654321,09JAN15:13:51:00,1,2
654321,123456789,08MAR15:12:30:00,1,1
654321,867530921,07MAY15:14:20:00,2,1
987654,987654321,02FEB15:14:40:00,1,2
987654,123456789, 08MAR15:15:00:00,2,1
987654,867530921, 04APR15:16:00:00,1,1
MedRec is the patient's medical record number (again, made up). This is assigned to an individual patient and sticks with the patient. Enct is the encounter number, assigned to the patient at an individual visit. opioidx is if the visit is for an opioid overdose (OD). psychdx is for other psychiatric complaint. For both of these, 1=yes, 2=no. For the sake of simplicity each patient will have 3 encounterss.
So I want to identify the patients who have overdosed but PRIOR to any overdose attempt they have had at least one encounterWITHOUT an opioid overdose but WITH a psychiatric diagnosis.
Using the examples above, I would want to select for patient "123456" because there is an encounter where psychdx=1 (yes for psych dx) AND opioidx=2 (no opioid OD) PRIOR to 2 encounters WITH opioid overdoses (and whether or not a psychdx is present on these encounters doesn't matter)
I would NOT want to identify patient "654321" because the opioid ODs preceded the psych diagnosis visits. I WOULD want to identify 987654 because an overdose was preceded by at least one psych dx encounter.
I have slightly modified the code above:
proc sql;
create table opioidpsych as
select distinct a.MedRec
from work.opioid11 a inner join work.opioid11 b on a.MedRec = b.MedRec
where a.opioidx = "1" and b.psychdx="1" and b.EnteredDT < a.EnteredDT;
quit;
And as far as I can tell it creates a table for the MedRecs as specified above (meaning the individual patient has at least one opioid OD with a previous visit for psychdx.
Once I am done I get a table as follows (c/w above)
data WORK.OPIOIDPSYCH;
infile datalines dsd truncover;
input BJH_MedRec:BEST12.;
format BJH_MedRec BEST12.;
datalines4;
123456
987654
So my next question is how to note these IDs (individual patient med rec#s) on my larger data set (work.opioid11), but with the caveat being that each individual line is designated by encounter. For example, I would want to flag patient "123456" twice, the 2nd and third encounters because they were both opioid overdoses with a prior psychiatric diagnosis.
I hope I am being clear- please let me know if something isn't making sense!
Thanks so much!
What do you want as output from this data set?
Hi Reeza:
ultimately I would like to create another variable that notes if an encounter that was an opioid overdose ((opioidx=1) was preceded by a prior encounter for THAT SAME PATIENT (meaning, med recs are the same) where there was not an opioid overdose (opioidx=2) but there was a psychiatric diagnosis given (psychdx=1)
Make it easy for me -post the exact data set as above, but with the output.
Gotcha: made it even simpler:
data WORK.opioidfake;
input MedRec:BEST12. Enct:BEST12. EnteredDT DATE. opioidx:$1. psychdx:$1.;
format MedRec BEST12. Enct BEST12. EnteredDT DATE.;
datalines;
123456 654387982 04JUN15 2 1
123456 846397542 08JUL15 1 2
123456 846397541 09AUG15 1 1
654321 987654321 09JAN15 1 2
654321 123456789 08MAR15 1 1
654321 867530921 07MAY15 2 1
987654 987654421 02FEB15 1 2
987654 123456689 08MAR15 2 1
987654 867530121 04APR15 1 1
;
run;
Output (wouldn't allow complete tables but descending labels go across from L to R)
1 | 123456 | 654387982 | 04JUN15 | 2 | 1 |
2 | 123456 | 846397542 | 08JUL15 | 1 | 2 |
3 | 123456 | 846397541 | 09AUG15 | 1 | 1 |
4 | 654321 | 987654321 | 09JAN15 | 1 | 2 |
5 | 654321 | 123456789 | 08MAR15 | 1 | 1 |
6 | 654321 | 867530921 | 07MAY15 | 2 | 1 |
7 | 987654 | 987654421 | 02FEB15 | 1 | 2 |
8 | 987654 | 123456689 | 08MAR15 | 2 | 1 |
9 | 987654 | 867530121 | 04APR15 | 1 | 1 |
Then:
proc sql;
create table opioidpsych as
select distinct a.MedRec
from work.opioidfake a inner join work.opioidfake b on a.MedRec = b.MedRec
where a.opioidx = "1" and b.psychdx="1" and b.EnteredDT < a.EnteredDT;
quit;
1 | 123456 |
2 | 987654 |
Thanks again! Please let me know if you need more info!
I think this gets you pretty close, if not exactly what you want:
data have;
input MedRec:BEST12. Enct:BEST12. EnteredDT DATE. opioidx:$1. psychdx:$1.;
format MedRec BEST12. Enct BEST12. EnteredDT DATE.;
datalines;
123456 654387982 04JUN15 2 1
123456 846397542 08JUL15 1 2
123456 846397541 09AUG15 1 1
654321 987654321 09JAN15 1 2
654321 123456789 08MAR15 1 1
654321 867530921 07MAY15 2 1
987654 987654421 02FEB15 1 2
987654 123456689 08MAR15 2 1
987654 867530121 04APR15 1 1
;
run;
proc sort data=have; by medrec entereddt;
run;
data want;
set have;
by medrec;
retain psychflag;
if first.medrec then psychflag=0;
if opioidx='1' and psychflag=1 then output;
if psychdx='1' then psychflag=1;
run;
Thanks- right now when I do this I am only getting the psychflag=1 lines (about 400 out of my total sample)- how do I get this variable (psychflag) to show in my larger dataset (about 40,000)?
Thanks!
It matches the output you specified, the two rows. This is why I asked explicitly what you wanted as output.
Try removing the OUTPUT statement and just creating the flags if that's what you're looking to do.
So in the PROC SQL code above, it looks as if that produced the list of med rec numbers that I specifically wanted (these are the patients with an opioid OD with prior psych dx WITHOUT OD. I want my ultimate output to be another variable in my main dataset (n=40,000) that matches these med recs with the appropriate encounters (Opioid ODs that for which the individual MedRec had previously been in the ED for a psych dx).
So basically, how do I create another variable in my big table (with 40,000) with the Med recs IDd from the proc SQL, assigned to the appropriate Encounters?
Please let me know if I'm not making sense!
Try removing the OUTPUT statement and just creating the flags if that's what you're looking to do.
This is still the answer. If you can't get it working, post your modified code and I can help you modify it correctly.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.