Hi all;
I have a dataset consisting of individual Medicaid claims with patients’ ID numbers, where each claim is flagged for whether or not it contains certain mental illness diagnoses. I’m trying to turn that into a dataset with one line per patient showing the date(s) of each diagnosis of interest. I’m going to use a toy model because this is a pretty conceptual question, and I am actually using protected health information on a secure server. The real data is similar to this, but there are 25 flagged mental illness diagnoses and 96 million claims for 250,000 people. Here’s what I have to start with (one line per claim):
ID  diagnosis   claim_date  mood_disorder anxiety_disorder 
1   F60.72      23JAN2019   1           0 
1   F60.3       16DEC2018   1           0 
1   F65.51      04MAY2019   0           1 
1   B20.2       28FEB2008   0           0 
2   F65.3       05JUL2018   0           1 
2   B20.5       28OCT2017   0           0 
3   B20.6       20OCT2016   0           0 Here’s what I want (one line per person):
ID  mood_1          mood_2          anxiety_1 
1   23JAN2019       16DEC2018       0 
2   0               0               05JUL2018 
3   0               0               0 I really appreciate for any help!
Thanks,
@dustychair Representative sample data is all we need. It's also great that you provided the information of the real data volume and number of variables you're dealing with.
Ideally provide sample data via a working SAS data step. This not only saves the one helping you some time, it also allows you to provide sample data with data types and formats that match your real data which in turn then leads to proposed code that likely will be suitable for your real data.
Your "Have" data structure is leaner than your "Want" data structure. Given the data volume transposing the data to your "Want" data structure will blow-up the volume significantly. Do you really need it? Are you sure you can't work with the current data structure?
Most SAS procedures will work nicely with your "Have" structure. May-be you explain to us why you believe the "Want" structure is required so we can assess and eventually propose another way that allows you to use the "Have" structure.
Hi @Patrick,
Thanks so much for your reply! Regarding providing sample data in a working data step, I will keep that mind in the future, and may try to update here later (I'm on my phone now). Thanks for letting me know that's best practice.
The reason I want a data structure with one line per person (vs. per claim) is that this data will then be linked to birth records and social services data using the patient's ID. After linking, the patient's pregnancy dates will be determined from the birth record, and each patient will be flagged for whether they experienced any of several types of mental illness diagnosis during their pregnancy. From there, I will be doing linear regression and descriptive analyses on the data with outcomes originating from the social services data. Please let me know if you have additional questions regarding the intended use of the data!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
