Hello everyone:
I have a data set for which I have created a variable "onset" set equal to the time of first occurrence by id. I'm mostly satisfied with the output using PROC PRINT and the PROC MEANS output is exactly what I'm looking for. The problem I've encountered is when merging this data with other data. There is no value for onset for id # 1, which has no occurrence of "fever"; however, when merging with other data, each onset value moves up one, with id #10 missing. I would like to have "." for onset for id # 1 in the merged data. Any help or suggestions would be very much appreciated! I have done my best to copy my code and output below, along with the desired output at the very bottom.
title 'Summary of Onset of Fever by Group';
data work.prob3e;
set work.fever;
by id;
where fever=1;
if first.id;
onset=time;
label onset='day of first fever';
run;
proc print data=work.prob3e;
run;
proc means data=work.prob3e maxdec=1 nonobs n mean std min median max;
class group;
var onset;
run;
title 'Listing of New Variables';
data work.prob3g;
set work.prob3b;
set work.prob3c;
set work.duration;
set work.prob3e;
set work.prob3f;
by id;
run;
proc print data=work.prob3g;
id id;
run;
Here the first observation is id 2. Id 1 has no occurrence of fever:
Summary of Onset of Fever by Group |
Obs | temp | id | time | group | fever | onset |
1 | 101.626 | 2 | 4 | 1 | 1 | 4 |
2 | 100.728 | 3 | 3 | 1 | 1 | 3 |
3 | 100.850 | 4 | 9 | 1 | 1 | 9 |
4 | 101.173 | 5 | 9 | 1 | 1 | 9 |
5 | 101.163 | 6 | 1 | 2 | 1 | 1 |
6 | 101.730 | 7 | 1 | 2 | 1 | 1 |
7 | 101.314 | 8 | 1 | 2 | 1 | 1 |
8 | 103.964 | 9 | 1 | 2 | 1 | 1 |
9 | 101.370 | 10 | 3 | 2 | 1 | 3 |
All of this is correct:
The MEANS Procedure
Analysis Variable : onset day of first fever | ||||||
treatment group | N | Mean | Std Dev | Minimum | Median | Maximum |
1 | 4 | 6.3 | 3.2 | 3.0 | 6.5 | 9.0 |
2 | 5 | 1.4 | 0.9 | 1.0 | 1.0 | 3.0 |
Here's where things go awry. The onset for id 2 from previous has moved up one position to id 1, and so forth for every id, with id 10 missing. I would like onset=. for id 1.
Listing of New Variables |
id | temp | time | group | fever | nfev | ever | consecutive | duration | onset | auc |
1 | 101.626 | 4 | 1 | 1 | 0 | NO | 0 | 0 | 4 | 879.900 |
2 | 100.728 | 3 | 1 | 1 | 1 | YES | 0 | 1 | 3 | 881.142 |
3 | 100.850 | 9 | 1 | 1 | 2 | YES | 0 | 1 | 9 | 892.676 |
4 | 101.173 | 9 | 1 | 1 | 1 | YES | 0 | 1 | 9 | 872.479 |
5 | 101.163 | 1 | 1 | 1 | 1 | YES | 0 | 1 | 1 | 889.449 |
6 | 101.730 | 1 | 2 | 1 | 7 | YES | 0 | 2 | 1 | 913.268 |
7 | 101.314 | 1 | 2 | 1 | 6 | YES | 0 | 3 | 1 | 906.276 |
8 | 103.964 | 1 | 2 | 1 | 9 | YES | 0 | 7 | 1 | 915.809 |
9 | 101.370 | 3 | 2 | 1 | 9 | YES | 0 | 5 | 3 | 914.003 |
I can take care of the labels and variables output, but this is what I'm trying to produce:
Not only will your results be mismatched, but (as you have discovered) your data set will have no more observations than the smallest contributor among your SET statements. Learn about match-merging using the MERGE statement accompanied by a BY statement: take a look at the examples in SAS® 9.4 and SAS® Viya 3.5 Programming Documentation, MERGE Statement page.
Are you sure you want multiple SET statements which is an Interleave?
I would expect a merge instead but by group not by ID.
The different methods of combining data are outlined here, I think you're looking for a match merge
You'll have to provide more details but I suspect you'll need more than one merge for your data sets.
It looks like you're trying to add summary statistics to a main data set? For these type of calculations I find SQL much easier personally, as it can save you multiple steps. Determining the "FIRST" occurrence can also be done via PROC SUMMARY.
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
@itmightbeharry wrote:
Hello everyone:
I have a data set for which I have created a variable "onset" set equal to the time of first occurrence by id. I'm mostly satisfied with the output using PROC PRINT and the PROC MEANS output is exactly what I'm looking for. The problem I've encountered is when merging this data with other data. There is no value for onset for id # 1, which has no occurrence of "fever"; however, when merging with other data, each onset value moves up one, with id #10 missing. I would like to have "." for onset for id # 1 in the merged data. Any help or suggestions would be very much appreciated! I have done my best to copy my code and output below, along with the desired output at the very bottom.
title 'Summary of Onset of Fever by Group'; data work.prob3e; set work.fever; by id; where fever=1; if first.id; onset=time; label onset='day of first fever'; run; proc print data=work.prob3e; run; proc means data=work.prob3e maxdec=1 nonobs n mean std min median max; class group; var onset; run; title 'Listing of New Variables'; data work.prob3g; set work.prob3b; set work.prob3c; set work.duration; set work.prob3e; set work.prob3f; by id; run; proc print data=work.prob3g; id id; run;
Here the first observation is id 2. Id 1 has no occurrence of fever:
Summary of Onset of Fever by Group
Obs
temp
id
time
group
fever
onset
1
101.626
2
4
1
1
4
2
100.728
3
3
1
1
3
3
100.850
4
9
1
1
9
4
101.173
5
9
1
1
9
5
101.163
6
1
2
1
1
6
101.730
7
1
2
1
1
7
101.314
8
1
2
1
1
8
103.964
9
1
2
1
1
9
101.370
10
3
2
1
3
All of this is correct:The MEANS Procedure
Analysis Variable : onset day of first fever
treatment group
N
Mean
Std Dev
Minimum
Median
Maximum
1
4
6.3
3.2
3.0
6.5
9.0
2
5
1.4
0.9
1.0
1.0
3.0
Here's where things go awry. The onset for id 2 from previous has moved up one position to id 1, and so forth for every id, with id 10 missing. I would like onset=. for id 1.
Listing of New Variables
id
temp
time
group
fever
nfev
ever
consecutive
duration
onset
auc
1
101.626
4
1
1
0
NO
0
0
4
879.900
2
100.728
3
1
1
1
YES
0
1
3
881.142
3
100.850
9
1
1
2
YES
0
1
9
892.676
4
101.173
9
1
1
1
YES
0
1
9
872.479
5
101.163
1
1
1
1
YES
0
1
1
889.449
6
101.730
1
2
1
7
YES
0
2
1
913.268
7
101.314
1
2
1
6
YES
0
3
1
906.276
8
103.964
1
2
1
9
YES
0
7
1
915.809
9
101.370
3
2
1
9
YES
0
5
3
914.003
I can take care of the labels and variables output, but this is what I'm trying to produce:
Hello Reeza:
Thank you very much for reaching out! I've inserted my full code below, along with the output produced. For the final output, I would like the id number to be my first variable. I have not yet labeled the variables where I'm trying to combine data, but the ones I looking to display from the final output I'd copied are "id", "group", "nfev", "duration"
title 'Summary of Fever by Group and Day';
footnote 'Temperature >= 100.4 F is considered a fever.';
data prob3;
set cert.tempdata;
label fever='fever';
if temp>=100.4 then fever='YES';
else if temp<100.4 then fever='NO';
run;
proc freq data=prob3;
tables time*group*fever / nopercent nocol;
run;
title 'Summary of Number of Days with Fever by Group';
footnote;
proc sort data=cert.tempdata out=work.prob3b1;
by id;
run;
data work.prob3b;
set work.prob3b1;
by id;
if temp>=100.4 then fever=1;
else if temp<100.4 then fever=0;
if first.id then nfev=0;
nfev+fever;
if last.id;
run;
proc means data=work.prob3b nonobs n mean std min median max maxdec=1;
label nfev='# days with fever';
var nfev;
class group;
run;
title 'Summary of Ever Having Fever by Group';
data work.prob3c;
set work.prob3b;
label ever='ever had fever';
if nfev>=1 then ever='YES';
else if nfev=0 then ever='NO';
run;
proc freq data=work.prob3c;
tables group*ever / nopercent nocol;
run;
title 'Summary of Number of Days with Longest Consecutive Fever by Group';
data work.fever;
set cert.tempdata;
if temp>=100.4 then fever=1;
else if temp<100.4 then fever=0;
run;
data work.duration;
do until (last.id);
set work.fever;
by id;
if fever = 0 then consecutive = 0;
if fever then consecutive+1;
duration = max(duration, consecutive);
end;
consecutive=0;
label duration='# days with longest consecutive fever';
run;
proc means data = work.duration nway maxdec=1 nonobs n mean std min median max;
class group;
var duration;
run;
title 'Summary of Onset of Fever by Group';
data work.prob3e;
set work.fever;
by id;
where fever=1;
if first.id;
onset=time;
label onset='day of first fever';
run;
proc print data=work.prob3e;
run;
proc means data=work.prob3e maxdec=1 nonobs n mean std min median max;
class group;
var onset;
run;
title Summary of Area Under the Curve for Temperature by Group;
data work.prob3f;
do until(last.id);
set cert.tempdata; by id time;
if last.time then
auc = sum(auc, mean(temp, prevTemp) * range(time, prevTime));
prevTemp = temp;
prevTime = time;
end;
keep id auc group;
label auc='area under the curve for temperature';
run;
proc means data=work.prob3f maxdec=1 nonobs n mean std min median max;
class group;
var auc;
run;
title 'Listing of New Variables';
data work.prob3g;
set work.prob3b;
set work.prob3c;
set work.duration;
set work.prob3e;
set work.prob3f;
by id;
run;
proc print data=work.prob3g;
id id;
run;
Summary of Fever by Group and Day |
The FREQ Procedure
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Temperature >= 100.4 F is considered a fever. |
Summary of Number of Days with Fever by Group |
The MEANS Procedure
Analysis Variable : nfev # days with fever | ||||||
treatment group | N | Mean | Std Dev | Minimum | Median | Maximum |
1 | 5 | 1.0 | 0.7 | 0.0 | 1.0 | 2.0 |
2 | 5 | 7.4 | 1.5 | 6.0 | 7.0 | 9.0 |
Summary of Ever Having Fever by Group |
The FREQ Procedure
|
|
Summary of Number of Days with Longest Consecutive Fever by Group |
The MEANS Procedure
Analysis Variable : duration # days with longest consecutive fever | ||||||
treatment group | N | Mean | Std Dev | Minimum | Median | Maximum |
1 | 5 | 0.8 | 0.4 | 0.0 | 1.0 | 1.0 |
2 | 5 | 4.0 | 2.0 | 2.0 | 3.0 | 7.0 |
Summary of Onset of Fever by Group |
Obs | temp | id | time | group | fever | onset |
1 | 101.626 | 2 | 4 | 1 | 1 | 4 |
2 | 100.728 | 3 | 3 | 1 | 1 | 3 |
3 | 100.850 | 4 | 9 | 1 | 1 | 9 |
4 | 101.173 | 5 | 9 | 1 | 1 | 9 |
5 | 101.163 | 6 | 1 | 2 | 1 | 1 |
6 | 101.730 | 7 | 1 | 2 | 1 | 1 |
7 | 101.314 | 8 | 1 | 2 | 1 | 1 |
8 | 103.964 | 9 | 1 | 2 | 1 | 1 |
9 | 101.370 | 10 | 3 | 2 | 1 | 3 |
Summary of Onset of Fever by Group |
The MEANS Procedure
Analysis Variable : onset day of first fever | ||||||
treatment group | N | Mean | Std Dev | Minimum | Median | Maximum |
1 | 4 | 6.3 | 3.2 | 3.0 | 6.5 | 9.0 |
2 | 5 | 1.4 | 0.9 | 1.0 | 1.0 | 3.0 |
Summary of Area Under the Curve for Temperature by Group |
The MEANS Procedure
Analysis Variable : auc area under the curve for temperature | ||||||
treatment group | N | Mean | Std Dev | Minimum | Median | Maximum |
1 | 5 | 883.1 | 8.0 | 872.5 | 881.1 | 892.7 |
2 | 5 | 911.2 | 4.4 | 906.3 | 913.3 | 915.8 |
Listing of New Variables |
id | temp | time | group | fever | nfev | ever | consecutive | duration | onset | auc |
1 | 101.626 | 4 | 1 | 1 | 0 | NO | 0 | 0 | 4 | 879.900 |
2 | 100.728 | 3 | 1 | 1 | 1 | YES | 0 | 1 | 3 | 881.142 |
3 | 100.850 | 9 | 1 | 1 | 2 | YES | 0 | 1 | 9 | 892.676 |
4 | 101.173 | 9 | 1 | 1 | 1 | YES | 0 | 1 | 9 | 872.479 |
5 | 101.163 | 1 | 1 | 1 | 1 | YES | 0 | 1 | 1 | 889.449 |
6 | 101.730 | 1 | 2 | 1 | 7 | YES | 0 | 2 | 1 | 913.268 |
7 | 101.314 | 1 | 2 | 1 | 6 | YES | 0 | 3 | 1 | 906.276 |
8 | 103.964 | 1 | 2 | 1 | 9 | YES | 0 | 7 | 1 | 915.809 |
9 | 101.370 | 3 | 2 | 1 | 9 | YES | 0 | 5 | 3 | 914.003 |
@Reeza wrote:
Are you sure you want multiple SET statements which is an Interleave?
... stuff deleted ...
I don't think this particular OP example is an interleave. Interleave would require multiple dataset names in a single SET statement, accompanied by a BY statement. In the OP's case, there are multiple SET statements, each with a single dataset name. The code is structured like:
data want;
set a;
set b;
set c;
by id;
....
run;
The BY statement here is tied only to the SET C statement. It doesn't track the ID values in A or B. So this actually is a merge, but not a match-merge, as the OP has discovered.
Thanks for the correction @mkeintz !
Not only will your results be mismatched, but (as you have discovered) your data set will have no more observations than the smallest contributor among your SET statements. Learn about match-merging using the MERGE statement accompanied by a BY statement: take a look at the examples in SAS® 9.4 and SAS® Viya 3.5 Programming Documentation, MERGE Statement page.
Thank you for the assistance mkeintz! Just to make sure I'm understanding, if I use a merge statement, as opposed to multiple set statements, would I want to follow the merge statement with the data sets, i.e.
merge work.prob3b work.duration?
Thanks again!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.