BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
heretolearn
Obsidian | Level 7

Hi! I have a question about how to use proc sql (or another SAS procedure) to do the following:

 

Below is my start dataset. Each individual can have received ≥ 1 medclasses for a certain duration (txduration). Sometimes they received the same medclass more than once.

 

ID     medclass     txduration

1       1                  0.2

1       1                  0.3

1       3                  0.8

2       2                  1.2

2       2                  0.5

2       3                  1.0

2       4                  0.3

3       1                  0.3

3       1                  0.8

3       2                  1.0

 

I would like to receive both of the following sets of output.

 

Here, I need the sum of the txduration for each medclass for each person:

 

ID     medclass     sum_txduration

1       1                  0.5 - since this should be 0.2 + 0.3

1       3                  0.8

2       2                  1.7 - since this should be 1.2 + 0.5

2       3                  1.0

2       4                  0.3

3       1                  1.1 - since this should be 0.3 + 0.8

3       2                  1.0

 

This is the last set of output that I need. It shows the mean of the sum_txduration values for the corresponding medclass values. So this output is not specific for each id.

 

medclass         mean_txduration

1                      0.8 - since this should be (0.5+1.1) / 2

2                      1.35 – since this should be (1.7 + 1.0) / 2

3                      0.9 – since this should be (0.8 + 1.0) / 2

4                      0.3

 

Any advice?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
bobpep212
Quartz | Level 8
proc sort data=have;
by id medclass;
run;
data want1;
set have;
by id medclass;
sum+txduration;
if last.medclass then do;
output;
sum=.;
end;
run;
proc sql;
create table want2 as
select
	medclass,
	mean(sum) as mean
from want1
group by medclass;
quit;

/*Edit added to do both steps using proc sql*/
proc sql;
create table want1 as
select
id
,medclass
,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
medclass
,mean(sum) as mean
from want1
group by medclass;
quit;

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

Do you want two dataset outputs or both in one?

 

1                      0.8 - since this should be (0.5+1.1) / 2   /*where did you get the 1.1 from ?*/

heretolearn
Obsidian | Level 7

I'm guessing it might have to be two separate dataset outputs

 

The first output (creating the variable sum_txduration) would be sums of values from the first dataset.

The second output (creating the variable mean_txduration) would have averages of values from the first output.

 

So for (0.5 + 1.1) / 2, the 0.5 was the sum_txduration value for id 1 medclass 1, and 1.1 was the sum_txduration value for id 3 medclass 1. Essentially, this is the average of all the medclass 1 values from the first output.

 

Similarly, for the second output, for medclass 2, the mean_txduration is 1.35, since the two sum_txduration values from the first output were 1.7 (for id 2) and 1.0 (for id 3).

Astounding
PROC Star

These are not difficult, but I can only give you half the answer since there are questions about the right methods for the second question.

 

To get the first result:

 

proc summary data=have nway;

   class id medclass;

   var txduration;

   output out=want (keep=id medclass sum_txduration) sum=sum_txduration;

run;

 

To clarify the second question, consider this data:

 

ID  medclass   txduration

1        1              0.3

1        3              0.4

1        1              0.2

2        1              0.4

 

What should the result be, when a single ID has received the same MEDCLASS twice?  (There's more than one way to answer that.)

heretolearn
Obsidian | Level 7

Astounding,

 

Thanks for the code for the first set of data output! In response to your question, based on the dataset you made, this is what I'm looking for:

 

Id    medclass   sum_txduration

1     1                 0.5

1     3                 0.4

2     1                 0.4

 

medclass       mean_txduration

1                     0.45 (so (0.5 + 0.4) / 2)

3                     0.4

bobpep212
Quartz | Level 8
proc sort data=have;
by id medclass;
run;
data want1;
set have;
by id medclass;
sum+txduration;
if last.medclass then do;
output;
sum=.;
end;
run;
proc sql;
create table want2 as
select
	medclass,
	mean(sum) as mean
from want1
group by medclass;
quit;

/*Edit added to do both steps using proc sql*/
proc sql;
create table want1 as
select
id
,medclass
,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
medclass
,mean(sum) as mean
from want1
group by medclass;
quit;
bobpep212
Quartz | Level 8

My pleasure! I tried to edit my last code block to add a method to do it all in proc sql - you won't need the sort this way.

Here it is again so it is readable:

 

proc sql;
create table want1 as
select 
	id
	,medclass
	,sum(txduration) as sum
from have
group by id, medclass;
create table want2 as
select
	medclass
	,mean(sum) as mean
from want1
group by medclass;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 949 views
  • 3 likes
  • 4 in conversation