BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
itmightbeharry
Fluorite | Level 6

Hello everyone:

 

I have a data set for which I have created a variable "onset" set equal to the time of first occurrence by id. I'm mostly satisfied with the output using PROC PRINT and the PROC MEANS output is exactly what I'm looking for. The problem I've encountered is when merging this data with other data. There is no value for onset for id # 1, which has no occurrence of "fever"; however, when merging with other data, each onset value moves up one, with id #10 missing. I would like to have "." for onset for id # 1 in the merged data. Any help or suggestions would be very much appreciated! I have done my best to copy my code and output below, along with the desired output at the very bottom.

title 'Summary of Onset of Fever by Group';
data work.prob3e;
  set work.fever;
  by id;
  where fever=1;
  if first.id;
  onset=time;
  label onset='day of first fever';
run;
proc print data=work.prob3e;
run;
proc means data=work.prob3e maxdec=1 nonobs n mean std min median max;
  class group;
  var onset;
run;
title 'Listing of New Variables';
data work.prob3g;
  set work.prob3b;
  set work.prob3c;
  set work.duration;
  set work.prob3e;
  set work.prob3f;
  by id;
run;
proc print data=work.prob3g;
id id;
run;

 

Here the first observation is id 2. Id 1 has no occurrence of fever:

Summary of Onset of Fever by Group

 

Obs

temp

id

time

group

fever

onset

1

101.626

2

4

1

1

4

2

100.728

3

3

1

1

3

3

100.850

4

9

1

1

9

4

101.173

5

9

1

1

9

5

101.163

6

1

2

1

1

6

101.730

7

1

2

1

1

7

101.314

8

1

2

1

1

8

103.964

9

1

2

1

1

9

101.370

10

3

2

1

3


All of this is correct:

The MEANS Procedure

Analysis Variable : onset day of first fever

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

4

6.3

3.2

3.0

6.5

9.0

2

5

1.4

0.9

1.0

1.0

3.0

 

Here's where things go awry. The onset for id 2 from previous has moved up one position to id 1, and so forth for every id, with id 10 missing. I would like onset=. for id 1.

Listing of New Variables

 

id

temp

time

group

fever

nfev

ever

consecutive

duration

onset

auc

1

101.626

4

1

1

0

NO

0

0

4

879.900

2

100.728

3

1

1

1

YES

0

1

3

881.142

3

100.850

9

1

1

2

YES

0

1

9

892.676

4

101.173

9

1

1

1

YES

0

1

9

872.479

5

101.163

1

1

1

1

YES

0

1

1

889.449

6

101.730

1

2

1

7

YES

0

2

1

913.268

7

101.314

1

2

1

6

YES

0

3

1

906.276

8

103.964

1

2

1

9

YES

0

7

1

915.809

9

101.370

3

2

1

9

YES

0

5

3

914.003

 

I can take care of the labels and variables output, but this is what I'm trying to produce:

SAS merge.png

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Not only will your results be mismatched, but (as you have discovered) your data set will have no more observations than the smallest contributor among your SET statements.  Learn about match-merging using the MERGE statement accompanied by a BY statement:  take a look at the examples in SAS® 9.4 and SAS® Viya 3.5 Programming Documentation, MERGE Statement page.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
Reeza
Super User

Are you sure you want multiple SET statements which is an Interleave?

I would expect a merge instead but by group not by ID. 

 

The different methods of combining data are outlined here, I think you're looking for a match merge

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=lrcon&docsetTarget=n1tgk0...

 

You'll have to provide more details but I suspect you'll need more than one merge for your data sets. 

 

It looks like you're trying to add summary statistics to a main data set? For these type of calculations I find SQL much easier personally, as it can save you multiple steps. Determining the "FIRST" occurrence can also be done via PROC SUMMARY. 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

 


@itmightbeharry wrote:

Hello everyone:

 

I have a data set for which I have created a variable "onset" set equal to the time of first occurrence by id. I'm mostly satisfied with the output using PROC PRINT and the PROC MEANS output is exactly what I'm looking for. The problem I've encountered is when merging this data with other data. There is no value for onset for id # 1, which has no occurrence of "fever"; however, when merging with other data, each onset value moves up one, with id #10 missing. I would like to have "." for onset for id # 1 in the merged data. Any help or suggestions would be very much appreciated! I have done my best to copy my code and output below, along with the desired output at the very bottom.

title 'Summary of Onset of Fever by Group';
data work.prob3e;
  set work.fever;
  by id;
  where fever=1;
  if first.id;
  onset=time;
  label onset='day of first fever';
run;
proc print data=work.prob3e;
run;
proc means data=work.prob3e maxdec=1 nonobs n mean std min median max;
  class group;
  var onset;
run;
title 'Listing of New Variables';
data work.prob3g;
  set work.prob3b;
  set work.prob3c;
  set work.duration;
  set work.prob3e;
  set work.prob3f;
  by id;
run;
proc print data=work.prob3g;
id id;
run;

 

Here the first observation is id 2. Id 1 has no occurrence of fever:

Summary of Onset of Fever by Group

 

Obs

temp

id

time

group

fever

onset

1

101.626

2

4

1

1

4

2

100.728

3

3

1

1

3

3

100.850

4

9

1

1

9

4

101.173

5

9

1

1

9

5

101.163

6

1

2

1

1

6

101.730

7

1

2

1

1

7

101.314

8

1

2

1

1

8

103.964

9

1

2

1

1

9

101.370

10

3

2

1

3


All of this is correct:

The MEANS Procedure

Analysis Variable : onset day of first fever

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

4

6.3

3.2

3.0

6.5

9.0

2

5

1.4

0.9

1.0

1.0

3.0

 

Here's where things go awry. The onset for id 2 from previous has moved up one position to id 1, and so forth for every id, with id 10 missing. I would like onset=. for id 1.

Listing of New Variables

 

id

temp

time

group

fever

nfev

ever

consecutive

duration

onset

auc

1

101.626

4

1

1

0

NO

0

0

4

879.900

2

100.728

3

1

1

1

YES

0

1

3

881.142

3

100.850

9

1

1

2

YES

0

1

9

892.676

4

101.173

9

1

1

1

YES

0

1

9

872.479

5

101.163

1

1

1

1

YES

0

1

1

889.449

6

101.730

1

2

1

7

YES

0

2

1

913.268

7

101.314

1

2

1

6

YES

0

3

1

906.276

8

103.964

1

2

1

9

YES

0

7

1

915.809

9

101.370

3

2

1

9

YES

0

5

3

914.003

 

I can take care of the labels and variables output, but this is what I'm trying to produce:

SAS merge.png


 

itmightbeharry
Fluorite | Level 6

Hello Reeza:

 

Thank you very much for reaching out! I've inserted my full code below, along with the output produced. For the final output, I would like the id number to be my first variable. I have not yet labeled the variables where I'm trying to combine data, but the ones I looking to display from the final output I'd copied are "id", "group", "nfev", "duration" 

title 'Summary of Fever by Group and Day';
footnote 'Temperature >= 100.4 F is considered a fever.';
data prob3;
  set cert.tempdata;
  label fever='fever';
  if temp>=100.4 then fever='YES';
    else if temp<100.4 then fever='NO';
run;
proc freq data=prob3;
  tables time*group*fever / nopercent nocol;
run;
title 'Summary of Number of Days with Fever by Group';
footnote;
proc sort data=cert.tempdata out=work.prob3b1;
  by id;
run;
data work.prob3b;
  set work.prob3b1;
  by id;
  if temp>=100.4 then fever=1;
    else if temp<100.4 then fever=0;
  if first.id then nfev=0;
  nfev+fever;
  if last.id;
run;
proc means data=work.prob3b nonobs n mean std min median max maxdec=1;
  label nfev='# days with fever';
  var nfev;
  class group;
run;
title 'Summary of Ever Having Fever by Group';
data work.prob3c;
  set work.prob3b;
  label ever='ever had fever';
  if nfev>=1 then ever='YES';
    else if nfev=0 then ever='NO';
run;
proc freq data=work.prob3c;
  tables group*ever / nopercent nocol;
run;
title 'Summary of Number of Days with Longest Consecutive Fever by Group';
data work.fever;
  set cert.tempdata;
  if temp>=100.4 then fever=1;
    else if temp<100.4 then fever=0;
run;
data work.duration;
  do until (last.id);
    set work.fever;
	by id;
	if fever = 0 then consecutive = 0;
	if fever then consecutive+1;
	duration = max(duration, consecutive);
  end;
  consecutive=0;
  label duration='# days with longest consecutive fever';
run;
proc means data = work.duration nway maxdec=1 nonobs n mean std min median max;
  class group;
  var duration;
run;
title 'Summary of Onset of Fever by Group';
data work.prob3e;
  set work.fever;
  by id;
  where fever=1;
  if first.id;
  onset=time;
  label onset='day of first fever';
run;
proc print data=work.prob3e;
run;
proc means data=work.prob3e maxdec=1 nonobs n mean std min median max;
  class group;
  var onset;
run;
title Summary of Area Under the Curve for Temperature by Group;
data work.prob3f;
do until(last.id);
  set cert.tempdata; by id time;
  if last.time then
    auc = sum(auc, mean(temp, prevTemp) * range(time, prevTime));
  prevTemp = temp;
  prevTime = time;
  end;
keep id auc group;
label auc='area under the curve for temperature';
run;
proc means data=work.prob3f maxdec=1 nonobs n mean std min median max;
  class group;
  var auc;
run;
title 'Listing of New Variables';
data work.prob3g;
  set work.prob3b;
  set work.prob3c;
  set work.duration;
  set work.prob3e;
  set work.prob3f;
  by id;
run;
proc print data=work.prob3g;
id id;
run;

Summary of Fever by Group and Day

 

The FREQ Procedure

Frequency

Row Pct

Table 1 of group by fever

Controlling for time=1

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

1

20.00

4

80.00

5

 

Total

6

4

10

 

Frequency

Row Pct

Table 2 of group by fever

Controlling for time=2

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

2

40.00

3

60.00

5

 

Total

7

3

10

 

Frequency

Row Pct

Table 3 of group by fever

Controlling for time=3

group(treatment group)

fever(fever)

NO

YES

Total

1

4

80.00

1

20.00

5

 

2

0

0.00

5

100.00

5

 

Total

4

6

10

 

Frequency

Row Pct

Table 4 of group by fever

Controlling for time=4

group(treatment group)

fever(fever)

NO

YES

Total

1

4

80.00

1

20.00

5

 

2

2

40.00

3

60.00

5

 

Total

6

4

10

 

Frequency

Row Pct

Table 5 of group by fever

Controlling for time=5

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

3

60.00

2

40.00

5

 

Total

8

2

10

 

Frequency

Row Pct

Table 6 of group by fever

Controlling for time=6

group(treatment group)

fever(fever)

NO

YES

Total

1

4

80.00

1

20.00

5

 

2

1

20.00

4

80.00

5

 

Total

5

5

10

 

Frequency

Row Pct

Table 7 of group by fever

Controlling for time=7

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

1

20.00

4

80.00

5

 

Total

6

4

10

 

Frequency

Row Pct

Table 8 of group by fever

Controlling for time=8

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

2

40.00

3

60.00

5

 

Total

7

3

10

 

Frequency

Row Pct

Table 9 of group by fever

Controlling for time=9

group(treatment group)

fever(fever)

NO

YES

Total

1

3

60.00

2

40.00

5

 

2

1

20.00

4

80.00

5

 

Total

4

6

10

 

Frequency

Row Pct

Table 10 of group by fever

Controlling for time=10

group(treatment group)

fever(fever)

NO

YES

Total

1

5

100.00

0

0.00

5

 

2

0

0.00

5

100.00

5

 

Total

5

5

10

 

Temperature >= 100.4 F is considered a fever.

 

 

 

Summary of Number of Days with Fever by Group

 

The MEANS Procedure

Analysis Variable : nfev # days with fever

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

5

1.0

0.7

0.0

1.0

2.0

2

5

7.4

1.5

6.0

7.0

9.0

 

 

 

Summary of Ever Having Fever by Group

 

The FREQ Procedure

Frequency

Row Pct

Table of group by ever

group(treatment group)

ever(ever had fever)

NO

YES

Total

1

1

20.00

4

80.00

5

 

2

0

0.00

5

100.00

5

 

Total

1

9

10

 

 

 

Summary of Number of Days with Longest Consecutive Fever by Group

 

The MEANS Procedure

Analysis Variable : duration # days with longest consecutive fever

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

5

0.8

0.4

0.0

1.0

1.0

2

5

4.0

2.0

2.0

3.0

7.0

 

 

 

Summary of Onset of Fever by Group

 

Obs

temp

id

time

group

fever

onset

1

101.626

2

4

1

1

4

2

100.728

3

3

1

1

3

3

100.850

4

9

1

1

9

4

101.173

5

9

1

1

9

5

101.163

6

1

2

1

1

6

101.730

7

1

2

1

1

7

101.314

8

1

2

1

1

8

103.964

9

1

2

1

1

9

101.370

10

3

2

1

3

 

 

 

Summary of Onset of Fever by Group

 

The MEANS Procedure

Analysis Variable : onset day of first fever

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

4

6.3

3.2

3.0

6.5

9.0

2

5

1.4

0.9

1.0

1.0

3.0

 

 

 

Summary of Area Under the Curve for Temperature by Group

 

The MEANS Procedure

Analysis Variable : auc area under the curve for temperature

treatment group

N

Mean

Std Dev

Minimum

Median

Maximum

1

5

883.1

8.0

872.5

881.1

892.7

2

5

911.2

4.4

906.3

913.3

915.8

 

 

 

Listing of New Variables

 

id

temp

time

group

fever

nfev

ever

consecutive

duration

onset

auc

1

101.626

4

1

1

0

NO

0

0

4

879.900

2

100.728

3

1

1

1

YES

0

1

3

881.142

3

100.850

9

1

1

2

YES

0

1

9

892.676

4

101.173

9

1

1

1

YES

0

1

9

872.479

5

101.163

1

1

1

1

YES

0

1

1

889.449

6

101.730

1

2

1

7

YES

0

2

1

913.268

7

101.314

1

2

1

6

YES

0

3

1

906.276

8

103.964

1

2

1

9

YES

0

7

1

915.809

9

101.370

3

2

1

9

YES

0

5

3

914.003

 

mkeintz
PROC Star

@Reeza wrote:

Are you sure you want multiple SET statements which is an Interleave?

... stuff deleted ...

@Reeza 

 

I don't think this particular OP example is an interleave.  Interleave would require multiple dataset names in a single SET statement, accompanied by a BY statement.   In the OP's case, there are multiple SET statements, each with a single dataset name.  The code is structured like:

 

data want;
   set a;
   set b;
   set c;
   by id;
  ....
run;

The BY statement here is tied only to the SET C statement.  It doesn't track the ID values in A or B.  So this actually is a merge, but not a match-merge, as the OP has discovered.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
itmightbeharry
Fluorite | Level 6
That's right! I was trying a one-to-one reading approach, without realizing the impact this approach would have upon the missing onset value for the first id. Once I used match-merging, what I was looking for was easily produced.
mkeintz
PROC Star

Not only will your results be mismatched, but (as you have discovered) your data set will have no more observations than the smallest contributor among your SET statements.  Learn about match-merging using the MERGE statement accompanied by a BY statement:  take a look at the examples in SAS® 9.4 and SAS® Viya 3.5 Programming Documentation, MERGE Statement page.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
itmightbeharry
Fluorite | Level 6

Thank you for the assistance mkeintz! Just to make sure I'm understanding, if I use a merge statement, as opposed to multiple set statements, would I want to follow the merge statement with the data sets, i.e.

merge work.prob3b work.duration?

 

Thanks again!

itmightbeharry
Fluorite | Level 6
I just tried this and that's exactly what I needed. Thank you very much to both you and Reeza! I greatly appreciate the documentation each of you provided and how prompt you both were in assisting me.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 955 views
  • 5 likes
  • 3 in conversation