Hello all,
I hope you can help me.
I have a dataset containing information about prescriptions for drugs filled for an individual. I have 4 types of drugs which can be prescribed several times. Some individuals uses more than one drug and I
1. need to count how many different drugs the individual uses.
2. need to count how many different drugs the individual uses in one month
3. how long time is the total drug use (in days)
Data have:
ID date_of_prescription Drug type
1 01012018 drug 1
1 02012018 drug 2
1 03042018 drug 1
2 04052017 drug 1
2 01062017 drug 3
3 10122014 drug 3
Output want:
1.
Number of drugs used Individuals
1 X
2 X
3 X
4 X
2. The same as 1. but within one month
Number of drugs used Individuals
1 X
2 X
3 X
4 X
3. mean, median p25, p75
with data like
data have;
  input id script_date:mmddyy10. drug:$1.;
  format script_date mmddyy10.;
  datalines;
1 01012018 A
1 02012018 B
1 02022018 C
1 03042018 A
2 04052017 A
2 01062017 C
3 10122014 C
4 10122014 A
4 10132014 B
4 10142014 C
;
run;
One approach might be
*how many different drugs an individual uses;
proc sql;
	select id, count(distinct(drug)) as 'num drugs used'n
	from have
	group by id;
quit;
*how many different drugs an individual uses by month;
proc sql;
	select distinct month, year, 
	id, drugcount as 'drug count'n
	from 
	(
	select month(script_date) as month, year(script_date) as year, 
	id as id, count(distinct(drug)) as drugcount
	from have
	group by month(script_date), year(script_date), id
	)
	order by 2 asc, 1 asc, 3;
quit; 
* drug use time period in days;
* start time is assumed to be prescription time;
* stop time is undefined and assumed as today;
proc sql;
	select id, script_date as 'drug start date'n, 
	drug, today() - script_date as 'days of usage'n
	from have;
quit;
which provides out of
| id | num drugs used | 
| 1 | 3 | 
| 2 | 2 | 
| 3 | 1 | 
| 4 | 3 | 
| month | year | id | drug count | 
| 10 | 2014 | 3 | 1 | 
| 10 | 2014 | 4 | 3 | 
| 1 | 2017 | 2 | 1 | 
| 4 | 2017 | 2 | 1 | 
| 1 | 2018 | 1 | 1 | 
| 2 | 2018 | 1 | 2 | 
| 3 | 2018 | 1 | 1 | 
| id | drug start date | drug | days of usage | 
| 1 | 1/1/2018 | A | 1568 | 
| 1 | 2/1/2018 | B | 1537 | 
| 1 | 2/2/2018 | C | 1536 | 
| 1 | 3/4/2018 | A | 1506 | 
| 2 | 4/5/2017 | A | 1839 | 
| 2 | 1/6/2017 | C | 1928 | 
| 3 | 10/12/2014 | C | 2745 | 
| 4 | 10/12/2014 | A | 2745 | 
| 4 | 10/13/2014 | B | 2744 | 
| 4 | 10/14/2014 | C | 2743 | 
with data like
data have;
  input id script_date:mmddyy10. drug:$1.;
  format script_date mmddyy10.;
  datalines;
1 01012018 A
1 02012018 B
1 02022018 C
1 03042018 A
2 04052017 A
2 01062017 C
3 10122014 C
4 10122014 A
4 10132014 B
4 10142014 C
;
run;
One approach might be
*how many different drugs an individual uses;
proc sql;
	select id, count(distinct(drug)) as 'num drugs used'n
	from have
	group by id;
quit;
*how many different drugs an individual uses by month;
proc sql;
	select distinct month, year, 
	id, drugcount as 'drug count'n
	from 
	(
	select month(script_date) as month, year(script_date) as year, 
	id as id, count(distinct(drug)) as drugcount
	from have
	group by month(script_date), year(script_date), id
	)
	order by 2 asc, 1 asc, 3;
quit; 
* drug use time period in days;
* start time is assumed to be prescription time;
* stop time is undefined and assumed as today;
proc sql;
	select id, script_date as 'drug start date'n, 
	drug, today() - script_date as 'days of usage'n
	from have;
quit;
which provides out of
| id | num drugs used | 
| 1 | 3 | 
| 2 | 2 | 
| 3 | 1 | 
| 4 | 3 | 
| month | year | id | drug count | 
| 10 | 2014 | 3 | 1 | 
| 10 | 2014 | 4 | 3 | 
| 1 | 2017 | 2 | 1 | 
| 4 | 2017 | 2 | 1 | 
| 1 | 2018 | 1 | 1 | 
| 2 | 2018 | 1 | 2 | 
| 3 | 2018 | 1 | 1 | 
| id | drug start date | drug | days of usage | 
| 1 | 1/1/2018 | A | 1568 | 
| 1 | 2/1/2018 | B | 1537 | 
| 1 | 2/2/2018 | C | 1536 | 
| 1 | 3/4/2018 | A | 1506 | 
| 2 | 4/5/2017 | A | 1839 | 
| 2 | 1/6/2017 | C | 1928 | 
| 3 | 10/12/2014 | C | 2745 | 
| 4 | 10/12/2014 | A | 2745 | 
| 4 | 10/13/2014 | B | 2744 | 
| 4 | 10/14/2014 | C | 2743 | 
Please post the expected results of all tasks you have to solve and show what you have tried so far.
EDIT
You should also explain how a duration in days should be calculated without having an end-date. You said:
Some individuals uses more than one drug
So using the prescription date of drug b as an end date for drug a seems wrong.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
