BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
heretolearn
Obsidian | Level 7

Hi! I have a question about how to use proc sql (or another SAS procedure) to do the following:

 

Below is my start dataset. Each individual can have received ≥ 1 medclasses for a certain duration (txduration). Sometimes they received the same medclass more than once.

 

ID     medclass     txduration

1       1                  0.2

1       1                  0.3

1       3                  0.8

2       2                  1.2

2       2                  0.5

2       3                  1.0

2       4                  0.3

3       1                  0.3

3       1                  0.8

3       2                  1.0

 

I would like to receive both of the following sets of output.

 

Here, I need the sum of the txduration for each medclass for each person:

 

ID     medclass     sum_txduration

1       1                  0.5 - since this should be 0.2 + 0.3

1       3                  0.8

2       2                  1.7 - since this should be 1.2 + 0.5

2       3                  1.0

2       4                  0.3

3       1                  1.1 - since this should be 0.3 + 0.8

3       2                  1.0

 

This is the last set of output that I need. It shows the mean of the sum_txduration values for the corresponding medclass values. So this output is not specific for each id.

 

medclass         mean_txduration

1                      0.8 - since this should be (0.5+1.1) / 2

2                      1.35 – since this should be (1.7 + 1.0) / 2

3                      0.9 – since this should be (0.8 + 1.0) / 2

4                      0.3

 

Any advice?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
bobpep212
Quartz | Level 8
proc sort data=have;
by id medclass;
run;
data want1;
set have;
by id medclass;
sum+txduration;
if last.medclass then do;
output;
sum=.;
end;
run;
proc sql;
create table want2 as
select
	medclass,
	mean(sum) as mean
from want1
group by medclass;
quit;

/*Edit added to do both steps using proc sql*/
proc sql;
create table want1 as
select
id
,medclass
,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
medclass
,mean(sum) as mean
from want1
group by medclass;
quit;

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Do you want two dataset outputs or both in one?

 

1                      0.8 - since this should be (0.5+1.1) / 2   /*where did you get the 1.1 from ?*/

heretolearn
Obsidian | Level 7

I'm guessing it might have to be two separate dataset outputs

 

The first output (creating the variable sum_txduration) would be sums of values from the first dataset.

The second output (creating the variable mean_txduration) would have averages of values from the first output.

 

So for (0.5 + 1.1) / 2, the 0.5 was the sum_txduration value for id 1 medclass 1, and 1.1 was the sum_txduration value for id 3 medclass 1. Essentially, this is the average of all the medclass 1 values from the first output.

 

Similarly, for the second output, for medclass 2, the mean_txduration is 1.35, since the two sum_txduration values from the first output were 1.7 (for id 2) and 1.0 (for id 3).

Astounding
PROC Star

These are not difficult, but I can only give you half the answer since there are questions about the right methods for the second question.

 

To get the first result:

 

proc summary data=have nway;

   class id medclass;

   var txduration;

   output out=want (keep=id medclass sum_txduration) sum=sum_txduration;

run;

 

To clarify the second question, consider this data:

 

ID  medclass   txduration

1        1              0.3

1        3              0.4

1        1              0.2

2        1              0.4

 

What should the result be, when a single ID has received the same MEDCLASS twice?  (There's more than one way to answer that.)

heretolearn
Obsidian | Level 7

Astounding,

 

Thanks for the code for the first set of data output! In response to your question, based on the dataset you made, this is what I'm looking for:

 

Id    medclass   sum_txduration

1     1                 0.5

1     3                 0.4

2     1                 0.4

 

medclass       mean_txduration

1                     0.45 (so (0.5 + 0.4) / 2)

3                     0.4

bobpep212
Quartz | Level 8
proc sort data=have;
by id medclass;
run;
data want1;
set have;
by id medclass;
sum+txduration;
if last.medclass then do;
output;
sum=.;
end;
run;
proc sql;
create table want2 as
select
	medclass,
	mean(sum) as mean
from want1
group by medclass;
quit;

/*Edit added to do both steps using proc sql*/
proc sql;
create table want1 as
select
id
,medclass
,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
medclass
,mean(sum) as mean
from want1
group by medclass;
quit;
bobpep212
Quartz | Level 8

My pleasure! I tried to edit my last code block to add a method to do it all in proc sql - you won't need the sort this way.

Here it is again so it is readable:

 

proc sql;
create table want1 as
select 
	id
	,medclass
	,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
	medclass
	,mean(sum) as mean
from want1
group by medclass;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2073 views
  • 3 likes
  • 4 in conversation