Proc Report Date Problem

deleted_user · Posted 02-08-2009 09:18 AM

Hi there,

I'm having a problem displaying data in a specific listing format.

The dataset is of the form:

patient **** test ******** day *******result ***** date
101 monocytes screening 64 13MAR2008
101 monocytes day2 67 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day7 125 20MAR2008
............................................................................
........................................................................ etc

102 monocytes screening 80 14APR2008
102 monocytes day2 73 18APR2008
102 monocytes day4 75 20APR2008
102 monocytes day7 81 23APR2008
102 hemoglobin screening 112 14APR2008
102 hemoglobin day2 119 18APR2008
102 hemoglobin day4 121 20APR2008
102 hemoglobin day7 128 23APR2008
............................................................................
....................................................................... etc

The table is to be in the form:

************************** screening ******* day2 ***********day4 ********** day7
patient **** test ****** 13MAR2008 **15MAR2008 ** 17MAR2008 ** 20MAR2008
101 **** monocytes ****** 64 ************* 67 ************ 72 ************* 71
101 **** hemoglobin ***** 107 ************ 120 *********** 118 *********** 125
............................................................................................................
...................................................................................................... etc

************************** screening ******* day2 ***********day4 *********** day7
patient **** test ****** 14APR2008 **18APR2008 *** 20APR2008 ***** 23APR2008
102 **** monocytes ******* 80 ************ 73 ************* 75 ************** 81
102 **** hemoglobin ****** 112 **********119 ************ 121 ************ 128
.............................................................................................................
........................................................................................................ etc

I'm using proc report here and sending it to word files through ODS. I can read all the dates into macro variables but not sure if this helps with the code for the proc report, I was thinking along the lines of .....

proc report data = final nowd split = '!' ;
column patient test screening day2 day4 day 7;
define patient / display;
define test / display;
define screening / display 'screening ! &date1';
define day2 / display 'day2 ! &date2';
define day4 / display 'day4 ! &date3';
define day7 / display 'day7 ! &date4';
run;

This is fine for the first patient but the next patient will relate to macro variables &date5, &date6, &date7 and &date8 and so on for the next patients. Any ideas how to code the date info into the report??

Cynthia_sas · Posted 02-08-2009 02:47 PM

Hi:
Using macro variables is not required for this task, if your data is as you describe it. PROC REPORT has a usage of "ACROSS" that turns variable values into columns. Just as you have DISPLAY in your code below, you would define a usage of ACROSS for day and date. Your data (as shown) is long and skinny. It looks like you have a variable called DAY, whose possible values are "Screening", "Day2", "Day4", etc. So in your DEFINE statement, you would have only 1 DEFINE statement -- your DEFINE statement lists the variable -- NOT the variable values:
[pre]
define day / across order=data;
[/pre]

This way, PROC REPORT knows to make one unique column for every value of the DAY variable. You need order=data because the value 'Screening' needs to appear in the columns before the value 'Day 2'. Since you have a variable called DATE that already contains the dates you want, there is no problem with nesting the DATE variable under the DAY variable. Finally, you'd put RESULT into the mix. You indicate this nesting by using a comma in the COLUMN statement:
[pre]
column patient test result,day,date;
[/pre]

I'd list RESULT first, so that a single column header for this variable will span over all the days/dates. Or, if you blank out the header, then ODS HTML, RTF or PDF would suppress the blank line if it's at the top of all the nested column headers.

Then since you want each patient treated separately, that's an argument for using a BY PATIENT statement inside PROC REPORT. This will have the side effect of putting a header or caption above each table of: Patient=101, etc. If you don't want to see the byline information, you can turn if off with a SAS system option.

The other side effect of BY group processing would be for every patient to be on a separate page. If you have more tests for each patient, then this may be OK (for every patient to start on a separate page). If you want to have multiple patient tables on a page (assuming their tables are small enough), then you can investigate the STARTPAGE=NO option of the ODS RTF destination.

My suggestion is that you read more in the documentation about PROC REPORT and look at more examples of ACROSS usage. You can also search for user papers about PROC REPORT. In addition, Tech Support can help you if you run into snags.

The program below uses the data you gave as a sample. A patient could have more test results, and, as long as the test names are unique, each new test would be on a report row by itself. However, if one patient had 2 hemoglobin tests on the same days, then this approach might not work. You might need to manually transpose the long, skinny data into wide data, with a column for every date. But this may get you started.

cynthia

[pre]
data pt_data;
length patient 8 test $15 day $10 result 8 date 8;
infile datalines;
input patient test $ day $ result date : date9.;
return;
datalines;
101 monocytes screening 64 13MAR2008
101 monocytes day2 67 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day7 125 20MAR2008
102 monocytes screening 80 14APR2008
102 monocytes day2 73 18APR2008
102 monocytes day4 75 20APR2008
102 monocytes day7 81 23APR2008
102 hemoglobin screening 112 14APR2008
102 hemoglobin day2 119 18APR2008
102 hemoglobin day4 121 20APR2008
102 hemoglobin day7 128 23APR2008
;
run;

ods listing close;
ods rtf file='c:\temp\across.rtf';
proc report data=pt_data nowd nocompletecols;
title 'Using ACROSS Variables';
by patient;
column patient test result,day,date;
define patient / group;
define test /group ;
define day / across order=data ' ';
define date / across order=data f=date9. ' ';
define result / sum 'Results';
run;
ods rtf close;
[/pre]

deleted_user · Posted 02-09-2009 07:00 AM

Hi Cynthia,

Thanks very much for your response to my query!

Your code worked a treat on the data I provided although (as always!) there may be a spanner thrown in the works. It looks as though I'll have to display my results in character format as they want any repeat test results in brackets after the initial result so some of my data becomes:

patient **** test ******** day *******result ***** date
101 monocytes screening 64(66) 13MAR2008
101 monocytes day2 67 15MAR2008
101 monocytes day4 72(70) 17MAR2008
............................................................

Searching on the forum for using proc report with across variables and character data, it looks as though this isn't possible although there is mention of possibly using arrays or creating a dummy variable as a workaround in these previous threads ....

http://support.sas.com/forums/thread.jspa?messageID=8492ℬ
http://support.sas.com/forums/thread.jspa?messageID=11932⺜

I've already tried creating a 'dummy variable' for my problem but had no luck resolving the problem. Would the Array method be the best way to go??

data_null__ · Posted 02-09-2009 08:13 AM

This modification uses the dummy variable with character RESULT. Notice the changes in define statement for result and addition of define dummy.

[pre]
proc report data=pt_data nowd nocompletecols list;
title 'Using ACROSS Variables';
by patient;
column patient test result,day,date dummy;
define patient / group;
define test /group ;
define day / across order=data ' ';
define date / across order=data f=date9. ' ';
define result / display 'Results';
define dummy / noprint;
run;
[/pre]

I just looked at the threads you referenced and noticed the dummy thread creates a variable in the input data which is not necssary. PROC REPORT with treat dummy as calculated so all you need is noprint on the define. P.S.

Message was edited by: data _null_;

deleted_user · Posted 02-09-2009 12:59 PM

Thanks for the info re the character results!!

I've almost got the required output required although I'm left with one final problem .....

Whenever the first value for test is missing for an individual patient, then the ordering of the proc report gets messed up ie if the screening data for patient 101 where test = monocyte is missing as shown below:

patient **** test ******** day *******result ***** date
101 monocytes day2 67 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day7 125 20MAR2008

the resultant report is of the form:

************************** day2 ***********day4 ********** day7********screening*****
patient **** test *****15MAR2008 ** 17MAR2008 ** 20MAR2008**13MAR2008
101 **** hemoglobin **** 120 *********** 118 *********** 125 ***********107
101 **** monocytes ****** 67 ************ 72 ************* 71 *********** ******

Is there a way to preserve the date order within Proc Report?? If not I'm trying to find the best way to achieve this prior to the report step. Thanks again.

Cynthia_sas · Posted 02-09-2009 01:27 PM

Hi:
PROC REPORT is trying to do you a favor by ordering 'H' before 'M'. That's the same favor it's doing you by having 'D' before 'S'. I think if you sort your data by descending test and then by ascending date, you won't have that problem. You may have to put ORDER=DATA everywhere in PROC REPORT to tell it to leave your order alone, but when I have missing value for result on the screening date, I still get SCREENING first if the data are sorted.

Also, (to answer your other question), as long as you only have 2 test results on any date, I'd just use FIRST. and LAST. processing with BY variables to build the character version of RESULT. You may want to have
options missing=0; or options missing = ' '; so you don't see the '.' for the missing results -- in the final report.

cynthia

[pre]
data pt_data;
length patient 8 test $15 day $10 result 8 date 8;
infile datalines;
input patient test $ day $ result date : date9.;
orig_ord = _n_;
return;
datalines;
100 monocytes screening . 13MAR2008
100 monocytes day2 67 15MAR2008
100 monocytes day4 72 17MAR2008
100 monocytes day4 70 17MAR2008
100 monocytes day7 71 20MAR2008
100 hemoglobin screening 107 13MAR2008
100 hemoglobin day2 120 15MAR2008
100 hemoglobin day4 118 17MAR2008
100 hemoglobin day4 116 17MAR2008
100 hemoglobin day7 125 20MAR2008
101 monocytes screening 64 13MAR2008
101 monocytes screening 66 13MAR2008
101 monocytes day2 67 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day4 70 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day4 116 17MAR2008
101 hemoglobin day7 125 20MAR2008
102 monocytes screening 80 14APR2008
102 monocytes day2 73 18APR2008
102 monocytes day4 75 20APR2008
102 monocytes day7 81 23APR2008
102 monocytes day7 79 23APR2008
102 hemoglobin screening 112 14APR2008
102 hemoglobin day2 119 18APR2008
102 hemoglobin day2 116 18APR2008
102 hemoglobin day4 121 20APR2008
102 hemoglobin day4 119 20APR2008
102 hemoglobin day7 128 23APR2008
;
run;

proc sort data=pt_data out=pt_data;
by patient descending test date;
run;

/* if first.date=1 and last.date=1 then only 1 test/date/result */
/* if first.date=1 and last.date=0 then more than 1 test/date/result */
/* if first.date=0 and last.date=1 then have the last test/date/result */
/* assuming there are only TWO results for any test on a single date */
/* then if first.date, need to hold the first result */
/* if last.date and first.date, create cresult from just result */
/* if last.date and NOT first.date, then create cresult by */
/* concatenating hold (first result) to result on last date */

data newpt;
length cresult $10;
retain hold;
set pt_data;
by patient descending test date ;
if first.date then hold=result;
if last.date then do;
if first.date then cresult = put(result,3.0);
else cresult = put(hold,3.0)||' ('||compress(put(result,3.0))||')';
output;
end;
run;

ods listing close;
ods rtf file='c:\temp\across.rtf';
proc report data=newpt nowd nocompletecols;
title 'Using ACROSS Variables';
by patient;
column patient test cresult,day,date dummy;
define patient / group order=data;
define test /group order=data;
define day / across order=data ' ';
define date / across order=data f=date9. ' ';
define cresult / display 'Results'
style(column)={just=c};
define dummy / noprint;
run;
ods rtf close;
[/pre]

data_null__ · Posted 02-09-2009 01:27 PM

You would need to add the missing observations when using ORDER=DATA. That can be more of a burden than the benefit.

I would create a numeric variable representation for DAY and use that new variable where you now use DAY and specify ORDER=INTERNAL on the DEFINE statement.

deleted_user · Posted 02-09-2009 06:30 PM

Hiya,

thanks to you both, seeing different approaches is especially helpful as it gives me more ideas come the next problem I face 😉

I'm still getting some strange values in my report (don't know if it's just me!) when I run this code. I have removed the entire 'screening' observation (not just the RESULT value) for patient 101 where TEST = monocytes as this is consistent with the problem in my dataset ie I have removed the top line after the datalines statement from the original problem ........

data pat_data;
length patient 8 test $15 day $10 result 8 date 8;
infile datalines;
input patient test $ day $ result date : date9.;
return;
datalines;
101 monocytes day2 66 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day7 125 20MAR2008
102 monocytes screening 80 14APR2008
102 monocytes day2 73 18APR2008
102 monocytes day4 75 20APR2008
102 monocytes day7 81 23APR2008
102 hemoglobin screening 112 14APR2008
102 hemoglobin day2 119 18APR2008
102 hemoglobin day4 121 20APR2008
102 hemoglobin day7 128 23APR2008
;
run;

data pat_data1;
set pat_data;
if day = 'screening' then nday = 1;
else if day = 'day2' then nday = 2;
else if day = 'day4' then nday = 3;
else if day = 'day7' then nday = 4;
run;

proc format;
value no1day 1 = screening
2 = day2
3 = day4
4 = day7;
run;

options missing = ' ' nobyline;
ods listing close;
ods rtf file='c:\temp\acrossy.rtf';
proc report data=pat_data1 nowd nocompletecols;
title 'Using ACROSS Variables';
by patient;
column patient test result,nday,date;
define patient / group;
define test /group ;
define nday / across order=internal f=no1day. ' ';
define date / across order=data f=date9. ' ';
define result / sum '';
run;
ods rtf close;

It has kept the ordering of the date values which is great but there now appears strange values under TEST in the report for patient 101. Instead of 'monocytes' and 'hemoglobin' I now have values 1.448E166 and 5.31E180 respectively ... bizarre!

When I run this code also with the 'screening' observation removed, the column ordering for patient 100 has screening at the end .....

data pt_data;
length patient 8 test $15 day $10 result 8 date 8;
infile datalines;
input patient test $ day $ result date : date9.;
orig_ord = _n_;
return;
datalines;
100 monocytes day2 67 15MAR2008
100 monocytes day4 72 17MAR2008
100 monocytes day4 70 17MAR2008
100 monocytes day7 71 20MAR2008
100 hemoglobin screening 107 13MAR2008
100 hemoglobin day2 120 15MAR2008
100 hemoglobin day4 118 17MAR2008
100 hemoglobin day4 116 17MAR2008
100 hemoglobin day7 125 20MAR2008
101 monocytes screening 64 13MAR2008
101 monocytes screening 66 13MAR2008
101 monocytes day2 67 15MAR2008
101 monocytes day4 72 17MAR2008
101 monocytes day4 70 17MAR2008
101 monocytes day7 71 20MAR2008
101 hemoglobin screening 107 13MAR2008
101 hemoglobin day2 120 15MAR2008
101 hemoglobin day4 118 17MAR2008
101 hemoglobin day4 116 17MAR2008
101 hemoglobin day7 125 20MAR2008
102 monocytes screening 80 14APR2008
102 monocytes day2 73 18APR2008
102 monocytes day4 75 20APR2008
102 monocytes day7 81 23APR2008
102 monocytes day7 79 23APR2008
102 hemoglobin screening 112 14APR2008
102 hemoglobin day2 119 18APR2008
102 hemoglobin day2 116 18APR2008
102 hemoglobin day4 121 20APR2008
102 hemoglobin day4 119 20APR2008
102 hemoglobin day7 128 23APR2008
;
run;

proc sort data=pt_data out=pt_data;
by patient descending test date;
run;

/* if first.date=1 and last.date=1 then only 1 test/date/result */
/* if first.date=1 and last.date=0 then more than 1 test/date/result */
/* if first.date=0 and last.date=1 then have the last test/date/result */
/* assuming there are only TWO results for any test on a single date */
/* then if first.date, need to hold the first result */
/* if last.date and first.date, create cresult from just result */
/* if last.date and NOT first.date, then create cresult by */
/* concatenating hold (first result) to result on last date */

data newpt;
length cresult $10;
retain hold;
set pt_data;
by patient descending test date ;
if first.date then hold=result;
if last.date then do;
if first.date then cresult = put(result,3.0);
else cresult = put(hold,3.0)||' ('||compress(put(result,3.0))||')';
output;
end;
run;

ods listing close;
ods rtf file='c:\temp\across.rtf';
proc report data=newpt nowd nocompletecols;
title 'Using ACROSS Variables';
by patient;
column patient test cresult,day,date dummy;
define patient / group order=data;
define test /group order=data;
define day / across order=data ' ';
define date / across order=data f=date9. ' ';
define cresult / display 'Results'
style(column)={just=c};
define dummy / noprint;
run;
ods rtf close;

It would be possible for me to create a dataset that includes all PATIENT / TEST / DAY and DATE observations even when RESULT is missing although this seems like a very arduous way of going about it as the full dataset I'm working with is massive.

data_null__ · Posted 02-09-2009 11:20 PM

> Instead of 'monocytes' and 'hemoglobin' I now have values 1.448E166 and
> 5.31E180 respectively ...

That is a strange result in indeed. Those are some VERY large numbers.

But to fix it you need to make the order= options consistant.

You had order=data but it will be all good with order=internal
[pre]
define date / across order=internal f=date9. ' ';
[/pre]

As for the other order problem you have order=data on all the group and across variables so you should expect that the groups be ordered as requested.

Cynthia_sas · Posted 02-10-2009 02:43 AM

Hi:
This has turned into quite a data cleaning problem.

Ah, the problem data .... maybe there's a patient with no "screening" test for any of the tests, but an empty screening column to appear anyway; or a patient will have 2 different tests, each with a different screening date; or a patient could have "bad" data and if there are two tests on the same day with exactly the same result #, then one of the tests needs to drop instead of putting it in parentheses.

You may need to write a program to see how "clean" or "usable" the data is -- in order to avoid the "just one more problem" approach to program development. If you understand all the possible permutations and problems spots ahead of time, you'll be able to write a better program. And, you may need to put a "fake" observation into the data so that every patient has a screening observation, even if result is missing for the fake observation, You can do this programmatically, but it may be the only way to achieve what you want -- at least with this version of the data.

And sometimes, you may have to go back to the folks who are suppying the data and get them to "standardize" how the data is sent. There may be some permutations of the data that are "deal breakers" -- that you can't program around. You need to know whether that's what you're dealing with. It's possible that concentrating on PROC REPORT at this point is premature.

At any rate, the missing option won't help you and the preloadfmt option won't help you -- because you because you need NOCOMPLETECOLS so you only get the dates in the data for any single patient and not a column for EVERY date in the data.

Honestly, I'm stumped over the scientific notation. The TEST variable seems to be character. I don't understand how SAS could show you scientific notation for a character variable in the LISTING window or in ODS output. Very bizarre. That may be an issue for Tech Support.

cynthia

deleted_user · Posted 02-10-2009 10:28 AM

Hiya,

Sorry for the 'each post reveals a new problem' series of queries, I have got there with help from you both, many thanks!! Keeping the DATA = options consistent got rid of the scientific notation problem.

Re the data cleaning, the data is a lab dataset in CDISC format (finalised after inhouse and client QC) so I don't think this is as much of a problem as the way I'd phrased prior queries. There is never the situation where there is a screening observation with no RESULT present, just that whenever a RESULT is blank for a certain DAY/DATE, that observation would not exist in the final dataset.

I'm glad I stuck with proc report for this purpose as even though it required much help from yourselves and internet searching to get the format I needed, I think banging your head against a problem is sometimes the best way to learn for someone in my position. Wading through the documentation and reading many online proc report examples doesn't reveal the small nuances of this proc with it's myriad of options available compared with extensive practical use.

Anyhows, after getting a better idea of the power and limitations of proc report, I'll be better placed to make a judgement call up front before getting bogged down on a specific programming methodology next time!

Cheers again!

Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Re: Proc Report Date Problem

Registration is open

SAS Training: Just a Click Away