I have the following input data :
SUBJECT CMENDTC ENDT
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
102 2011-11-21 21NOV2011
103
103
103
103 2011-10-29 29OCT2011
105
106 2012-01-09 29OCT2012
107 2011-09-23 23SEP2011
107 2011-09-25 25SEP2011
107 2011-10-15 15OCT2011
107 2011-09-26 26SEP2011
108 2011-10-15 15OCT2011
108 2011-11-14 14NOV2011
109
109 2011-09-20 20SEP2011
109 2011-09-25 25SEP2011
109 2011-09-13 13SEP2011
109
109 2011-09-25 25SEP2011
109 2011-09-25 25SEP2011
109
110
110 2011-10-12 12OCT2011
110 2011-10-23 23OCT2011
111 2011-10-21 21OCT2011
112
113
113
114 2011-10-05 05OCT2011
114 2011-10-02 02OCT2011
114 2011-10-02 02OCT2011
CMENDTC is character datatype and ENDT is numeric.
Need the following output: we need to check CMENDTC and if it is missing then ENDT is the latest date of CMENDTC for that subject.
If a subject has records with only missing CMEMDTC and ENDT is missing.
SUBJECT CMENDTC ENDT
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
102 2011-11-21 21NOV2011
103 29OCT2011
103 29OCT2011
103 29OCT2011
103 2011-10-29 29OCT2011
105
106 2012-01-09 29OCT2012
107 2011-09-23 23SEP2011
107 2011-09-25 25SEP2011
107 2011-10-15 15OCT2011
107 2011-09-26 26SEP2011
108 2011-10-15 15OCT2011
108 2011-11-14 14NOV2011
109 25SEP2011
109 2011-09-20 20SEP2011
109 2011-09-25 25SEP2011
109 2011-09-13 13SEP2011
109 25SEP2011
109 2011-09-25 25SEP2011
109 2011-09-25 25SEP2011
109 25SEP2011
110 23OCT2011
110 2011-10-12 12OCT2011
110 2011-10-23 23OCT2011
111 2011-10-21 21OCT2011
112
113
113
114 2011-10-05 05OCT2011
114 2011-10-02 02OCT2011
114 2011-10-02 02OCT2011
115 2011-08 31AUG2011
115 2011-06 30JUN2011
115 31DEC2011
115 2011 31DEC2011
115 31DEC2011
Please, help.
There was an error in my PROC SUMMARY. This is the correct code
proc summary data=have1 nway;
class subject;
var date;
output out=maxx max=max_endt;
run;
Please explain the logic by which 109 has values 25SEP2011 filled in (and not some other value) when CMENDTC is missing. Don't make us try to figure out what you want from an example — show us the example and TELL US the logic to get what you want.
Sorry, I missed this in the initial post and hence edited the post. But, I am still adding more details. Here, for 109 has the 25SEP2011 as it is the latest date value for that subject.
Thank you. Try this:
data have1;
set have;
date=input(cmendtc,yymmdd10.);
run;
proc summary data=have1 nway;
class subject;
var endt;
output out=maxx max=max_endt;
run;
data want;
merge have maxx;
by subject;
if missing(endt) then endt1=max_endt;
else endt1=endt;
format endt1 date9.;
run;
Thank you, but getting the following error:
proc summary data=have nway;
628 class subject;
629 var endt;
ERROR: Variable ENDT in list does not match type prescribed for this list.
630 output out=maxx max=max_endt;
631 run;
Actually, as date variable from have1 dataset is not being used anywhere and getting error as cmendtc has values of the form yyyy and yyyy-mm which requires imputation, hence I have modified the code as follows and ran but the getting the above error.
/*data have1;
set have;
date=input(cmendtc,yymmdd10.);
end;
run;*/
/*proc summary data=have1 nway;*/
proc summary data=have nway;
class subject;
var endt;
output out=maxx max=max_endt;
run;
data want;
/* merge have maxx;*/
set maxx;
by subject;
if missing(endt) then endt1=max_endt;
else endt1=endt;
format endt1 date9.;
run;
proc print data=want;run;
Can you please help.
Something is wrong here. You said that ENDDT was a DATE variable. So the PROC SUMMARY should work fine.
Did you somehow pass it a dataset where ENDDT was instead a character variable?
data need (keep=subject max_enddt) / view=need;
do until (last.subject);
set have (where=(enddt^=.));
by subject;
max_enddt=max(max_enddt,enddt);
end;
run;
data want (drop=max_enddt);
merge have need;
by subject;
cmenddt=coalesce(cmenddt,max_enddt);
run;
It's programmed as two steps, but because the first step is a data set view it's only activated when the view (named NEED) is called for in the second step. Reduces disk activity.
Thank you, but getting error. Since, have dataset has ENDT variable, I have modified the first part of the code as
follows and ran:
data need (keep=subject max_enddt) / view=need;
do until (last.subject);
/* set have (where=(enddt^=.));*/
set have (where=(endt^=.));
by subject;
/* max_enddt=max(max_enddt,enddt);*/
max_enddt=max(max_enddt,endt);
end;
run;
But, getting the following error:
1156 data need (keep=subject max_enddt) / view=need;
1157 do until (last.subject);
1158 /* set have (where=(enddt^=.));*/
1159 set have (where=(endt^=.));
ERROR: WHERE clause operator requires compatible variables.
1160 by subject;
1161 /* max_enddt=max(max_enddt,enddt);*/
1162 max_enddt=max(max_enddt,endt);
1163 end;
1164 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Please, help
Again, that error means that ENDT is NOT a date variable. SAS stores date variables as NUMERIC. But the error message is saying that the where clause is mixing types. Since . is the NUMERIC missing value that means that ENDT is in fact a CHARACTER variable and not a DATE at all.
If you have ENDT as character in the style you showed then you can make a new NUMERIC variable by using the INPUT() function with the DATE informat. Make sure to attach the DATE format to the new variable so it prints the dates in a style that humans can recognize.
Thank you. I have corrected the ENDT variable from character to numeric format:-
data have;
infile datalines truncover;
input SUBJECT : $3. CMENDTC : $20. ENDT : anydtdte.;
format ENDT date9.;
datalines;
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
102 2011-11-21 21NOV2011
103 . .
103 . .
103 . .
103 2011-10-29 29OCT2011
105 . .
106 2012-01-09 29OCT2012
107 2011-09-23 23SEP2011
107 2011-09-25 25SEP2011
107 2011-10-15 15OCT2011
107 2011-09-26 26SEP2011
108 2011-10-15 15OCT2011
108 2011-11-14 14NOV2011
109 . .
109 2011-09-20 20SEP2011
109 2011-09-25 25SEP2011
109 2011-09-13 13SEP2011
109 . .
109 2011-09-25 25SEP2011
109 2011-09-25 25SEP2011
109 . .
110 . .
110 2011-10-12 12OCT2011
110 2011-10-23 23OCT2011
111 2011-10-21 21OCT2011
112 . .
113 . .
113 . .
114 2011-10-05 05OCT2011
114 2011-10-02 02OCT2011
114 2011-10-02 02OCT2011
115 2011-08 31AUG2011
115 2011-06 30JUN2011
115 . 31DEC2011
115 2011 31DEC2011
115 . 31DEC2011
;
run;
But, the output is not as expected except for the subjects 112, 113 and 115: Attaching the screenshots:
Please, help.
There was an error in my PROC SUMMARY. This is the correct code
proc summary data=have1 nway;
class subject;
var date;
output out=maxx max=max_endt;
run;
Thank you very very much PaigeMiller. I have tried with this new proc summary and got the required output. Once again thank you very much and I thank Tom and mkeintz and entire community for extending their helping hands. This SAS community is a wonderful community.
Yes, Tom, by mistake I was passing ENDT as character variable. Now, with the following input dataset, I ran the code given by PaigeMiller but output is not as required.
Input dataset:
data have;
infile datalines truncover;
input SUBJECT : $3. CMENDTC : $20. ENDT : anydtdte.;
format ENDT date9.;
datalines;
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
101 2011-09-08 08SEP2011
102 2011-11-21 21NOV2011
103 . .
103 . .
103 . .
103 2011-10-29 29OCT2011
105 . .
106 2012-01-09 29OCT2012
107 2011-09-23 23SEP2011
107 2011-09-25 25SEP2011
107 2011-10-15 15OCT2011
107 2011-09-26 26SEP2011
108 2011-10-15 15OCT2011
108 2011-11-14 14NOV2011
109 . .
109 2011-09-20 20SEP2011
109 2011-09-25 25SEP2011
109 2011-09-13 13SEP2011
109 . .
109 2011-09-25 25SEP2011
109 2011-09-25 25SEP2011
109 . .
110 . .
110 2011-10-12 12OCT2011
110 2011-10-23 23OCT2011
111 2011-10-21 21OCT2011
112 . .
113 . .
113 . .
114 2011-10-05 05OCT2011
114 2011-10-02 02OCT2011
114 2011-10-02 02OCT2011
115 2011-08 31AUG2011
115 2011-06 30JUN2011
115 . 31DEC2011
115 2011 31DEC2011
115 . 31DEC2011
;
run;
data have1;
set have;
date=input(cmendtc,yymmdd10.);
end;
run;
proc summary data=have1 nway;
class subject;
var endt;
output out=maxx max=max_endt;
run;
data want;
merge have maxx;
by subject;
if missing(endt) then endt1=max_endt;
else endt1=endt;
format endt1 date9.;
run;
proc print data=want;run;
Output screenshots:
Please, help
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.