I am looking at 5 classes of psychotropic meds (anti_anxiety, anti_psychotic, anti_depressant, mood_stabilizer, stimulants) over two years 2022 and 2023.
The age groups are 0-5, 6-12, 13-17.
Each year was extracted by quarter by the month of the quarter. So there are 4 groupings for each drug class:
anti_anxiety_2022_q1_1 anti_anxiety for 2022 for qtr 1 January
anti_anxiety_2022_q1_2 anti_anxiety for 2022 for qtr 1 February
anti_anxiety_2022_q1_3 anti_anxiety for 2022 for qtr 1 March
. . .
anti_anxiety_2022_q4_1 anti_anxiety for 2022 for qtr 4 October
anti_anxiety_2022_q4_2 anti_anxiety for 2022 for qtr 4 November
anti_anxiety_2022_q4_3 anti_anxiety for 2022 for qtr 4 December
Data for each of the other 4 med classes are have similar construction
The data is cleaned up a little (e.g. if header_amount_paid = 0, record is deleted)
Age groupings are added
SQL coding is used to create
So a hypothetical anxiety quarterly dataset (Q1) looks like
From DOS Recipient_ID Total_Paid Age_group
30Jan22 EO401J5K 132.86 0-5
05FEB22 BY3267G1 43.55 13-17
28MAR22 PA601M4L 67.73 6-12
The four quarters for each med were set together to give
Anxiety_2022_final
Psychotic_2022_final
Depressant_2022_final
Mood_Stable_2022_final
Stimulants_2022_final
The request is to get a count of unique recipients that received all 5 drugs for each age grouping.
My approach was to do the following
Data all drugs_2022;
Set anti_anxiety_2022
anti-psychotic_2022
anti_depress_2022
anti_mood_2022
anti_stimulant_2022;
run;
proc sort data=all_drugs_2022
out=uniquerecipients_2022 nodups dupout=dup_recipients_2022 nodupkey;
by Recipient_ID;
proc sql noprint;
create table unique_rec_meds_2022 AS
select From_DOS, Recipient_ID, Sum(header_amount_paid) AS Total_paid, Age_group,
COUNT(Age_group) AS A_grp
From all_drugs_2022
Group By age_group;
quit;
So the final result would be a excel table (results are hypothetical for illustration)
CY 2022 Age_group Recipients Payments
2022 0-5 8,235 $1,644,876
6-12 41,875 $39,546,233
13-17 54,712 $31,876,098
and a similar table for CY 2023
I thought this would produce a table of the number of children who received all
psychotropic drugs, by age group, and payments
But the requester doesn't think so.
At this point, I am confused then on how to arrive at the correct result.
Any suggestions and direction will be appreciated.
Thanks.
wlierman
The same final output
data anxiety_2022_q1_1;
set anxiety_2022;
The request is to get a count of unique recipients that received all 5 drugs for each age grouping.
My approach was to do the following
Data all drugs_2022;
Set anti_anxiety_2022
anti-psychotic_2022
anti_depress_2022
anti_mood_2022
anti_stimulant_2022;
run;
This should work:
data all_drugs_2022;
merge anti_anxiety_2022(in=in1 keep=recipient_id age_group) anti_psychotic_2022(in=in2 keep=recipient_id age_group)
anti_depress_2022(in=in3 keep=recipient_id age_group) anti_mood_2022(in=in4 keep=recipient_id age_group)
anti_stimulant_2022(in=in5 keep=recipient_id age_group);
by recipient_id;
if in1 and in2 and in3 and in4 and in5;
run;
/* Get the count */
proc sql;
create table count as select count(distinct recipient_id) from all_drugs_2022 group by age_group;
quit;
This assumes all data sets are sorted by recipient_id
Caveat: without actual data of some sort determining "all" of anything is problematic.
Have you verified that none of your Recipient_ID values changed age_group in any of the data sets? It is very likely that from one calendar month to another that an individual would change age and likely age group. So a rule involving that situation as it relates to the desired report is needed before starting. Since you are looking overall at a two year period you are certain to have some change age groups and how that affects the report is needed.
I suspect that some details of the required report are missing.
I see nothing in your attempt that takes the different medications into account. Your SQL dumps all the records into one call and counts by age_group only, nothing by medication.
I suggest providing a small example of data that looks like your 5 anti_anxiety_2022, anti-psychotic_2022 etc. data sets. Make sure to have at least 2 participants that are "all 5 medications" and some that aren't. Then show what the expected result (MANUALLY calculated which is why the set should be small) from is expected to look like.
I assume AGE, AGE_GROUP are only Baseline observations and don't change over time.
To calculate RECIPENTS number and TOTAL_AMOUNT spent by MEDICATION, AGE_GROUP, QUARTER and YEAR I would do smth like this:
%let drugs= 'anti_anxiety' 'anti_psychotic' 'anti_depressant' 'mood_stabilizer' 'stimulants';
data have;
set <input data>;
where lowcase(MEDS) in (&drugs) and HEADER_AMOUNT_PAID >0 and (FROM_DOS between '01Jan2022'd and '31Dec2023'd);
year= year(FROM_DOS);
if month(FROM_DOS) in (1, 2, 3) then QUARTER='q1';
else if month(FROM_DOS) in (4, 5, 6) then QUARTER='q2';
else if month(FROM_DOS) in (7, 8, 9) then QUARTER='q3';
else if QUARTER='q4';
run;
proc means data=have n sum noprint nway;
class YEAR QUARTER AGE_GROUP MEDS;
ways 4;
var HEADER_AMOUNT_PAID;
output out= summary n=RECIPIENTS sum=PAYMENTS;
format payments dollar12.;
run;
The request is to get a count of unique recipients that received all 5 drugs for each age grouping.
My approach was to do the following
Data all drugs_2022;
Set anti_anxiety_2022
anti-psychotic_2022
anti_depress_2022
anti_mood_2022
anti_stimulant_2022;
run;
This should work:
data all_drugs_2022;
merge anti_anxiety_2022(in=in1 keep=recipient_id age_group) anti_psychotic_2022(in=in2 keep=recipient_id age_group)
anti_depress_2022(in=in3 keep=recipient_id age_group) anti_mood_2022(in=in4 keep=recipient_id age_group)
anti_stimulant_2022(in=in5 keep=recipient_id age_group);
by recipient_id;
if in1 and in2 and in3 and in4 and in5;
run;
/* Get the count */
proc sql;
create table count as select count(distinct recipient_id) from all_drugs_2022 group by age_group;
quit;
This assumes all data sets are sorted by recipient_id
@wlierman wrote:
Thank you. That does work (though it is somewhat hard to explain at least for me). The only other filter (that I didn't mention) was that the recipients needed to be unique (deduplicated).I did that which lowered the over all total. Thanks again for your help and suggestion.
Explanation: The variable IN1 is 1 if the data comes from anti_anxiety_2022 and 0 otherwise. The variable IN2 is 1 if the data comes from anti_psychotic_2022 and 0 otherwise. When all five data sets are merged, you want the cases where all IN1 through IN5 have value of 1 (can't have any of them zero) and so that's what the IF statement looks for.
Doesn't the PROC SQL use of DISTINCT de-duplicate the counts?
I didn't use the Distinct in proc Sql (though I could have). I ran a proc sort (by Recipient_ID)
proc sort data=Analyze2.all_drugs_2022
out=Analyze2.uniquerecipients_2022 nodups; /*dupout = Analyze2.dup_recipients_2023 nodupkey;*/
by Recipient_ID;
run;
We have a large state Medicaid program. But even so, it is somewhat disconcerting that there are over 50K (at least from our Business Intelligence / Business Objects data warehouse) recipients (children in the 0-5 age group that have received all five classes of drugs. Though I think some of that is the NDC code (like " children acetaminophen" as well as others that I would not have considered in a psychotropic drug class).
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.