Hi All, I have the following dataset
id rx
1 a
1 b
1 d
2 a
2 b
3 a
3 c
4 b
5 a
5 d
6 a
6 c
7 a
7 b
I am trying to summarize these combination as aggregate, here is the output I want
a-b-d 1
a-b 2
a-c 2
b 1
a-d 1
For data in that form this works:
data have;
input patient_id rx $;
datalines;
1 a
1 b
1 c
2 a
2 b
3 a
3 b
;
run;
proc sort data=have;
by patient_id rx;
run;
proc transpose data=have
out=trans;
by patient_id;
var rx;
run;
data want;
set trans;
array c col: ;
length rx_list $ 200;
rx_list= catx('-',of c(*));
drop col: _name_ ;
run;
But this will only work if there is one RX per record and the RX doesn't repeat, unless you want a-a-b. If you don't want the repeats then add NODUPKEY to the Proc sort.
The sort also is used to keep from getting a-b and b-a as different results.
I am concerned about the patient 4 in your original post though. If you have multiple RX in a single record then you should consider making sure the structure is one RX per patient record which will involve some additional code.
How do you get "a-c 2" and "a-d 1" from that data? You seem to be missing a detail or two on what you are counting and how to count it. At least I think those are supposed to be counts.
How many values may be in RX for a single row?
For some reason, when I edit the dataset i get it correct but it is not when posted. Here are the correct entries:
id rx
1 a
1 b
1 d
2 a
2 b
3 a
3 c
4 b
5 a
5 d
6 a
6 c
7 a
7 b
for patient 4 (b), patient 5 (a, d), patient 6 (a, c) and patients 7 (a,b) so you get 2 for a-c because patient 3 had a-c and patient 6 has a-c too. You are right for a-d, the triple combination can be counted under different category so for patient 1, it could be a-b, a-d and b-d and a-b-d but for smipliciy, I would count this only once as a-b-c so patient can be counted only one time in the aggregate measure.
Patient? Your example data does not cleary identify "Patient". If the ID is Patient then the example data is incomplete for the summary result shown.
Apologize if I am not clear, I will simplify my example.
patient_id rx
1 a
1 b
1 c
2 a
2 b
3 a
3 b
The output would be
a-b-c 1
a-b 2
For data in that form this works:
data have;
input patient_id rx $;
datalines;
1 a
1 b
1 c
2 a
2 b
3 a
3 b
;
run;
proc sort data=have;
by patient_id rx;
run;
proc transpose data=have
out=trans;
by patient_id;
var rx;
run;
data want;
set trans;
array c col: ;
length rx_list $ 200;
rx_list= catx('-',of c(*));
drop col: _name_ ;
run;
But this will only work if there is one RX per record and the RX doesn't repeat, unless you want a-a-b. If you don't want the repeats then add NODUPKEY to the Proc sort.
The sort also is used to keep from getting a-b and b-a as different results.
I am concerned about the patient 4 in your original post though. If you have multiple RX in a single record then you should consider making sure the structure is one RX per patient record which will involve some additional code.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.