Happy New Year,
I would like to perform the following using sas code, not proc sql. I know using the max function for two variables and a group by statement would work but I'm looking for another solution. Thank You in advance:
data have;
infile cards dsd;
informat date mmddyy10.;
format date date9.;
input acct $ date default $;
cards;
00001,01/01/2000,pd npl
00001,01/01/2000,pd
00001,01/01/2001,pd
00002,02/01/2000,pd npl
00002,02/01/2000,pd
00002,02/01/2010,pd
;
run;
data want;
infile cards dsd;
informat date mmddyy10.;
format date date9.;
input acct $ date default $;
cards;
00001,01/01/2001,pd npl
00002,02/01/2010,pd npl
;
run;
The answer from @stat@sas isn't a true max, but it takes the first record. Simply switching the first two records in your sample data will break the logic.
If you're looking for the true max from each field you can do two sorts/passes of the data or use first/last and manually calculate one. See an example below, of breaking the solution as well as an alternative. If your data was pre-sorted by acct already you could manually calculate each max similarly.
data have;
infile cards dsd;
informat date mmddyy10.;
format date date9.;
input acct $ date default $;
cards;
00001,01/01/2000,pd
00001,01/01/2000,pd npl
00001,01/01/2001,pd
00002,02/01/2000,pd npl
00002,02/01/2000,pd
00002,02/01/2010,pd
;
run;
proc sort data=have;
by acct date;
run;
data want(drop=default rename=default1=default);
set have;
by acct date default notsorted;
retain default1;
if first.acct then default1=default;
if last.acct;
run;
*Manually calculating max assuming a sorted dataset, by acct date;
data want2;
set have;
by acct date;
retain max_default;
if first.acct then max_default=default;
else if default>max_default then max_default=default;
if last.acct;
drop default;
rename max_default=default;
run;
*Manually calculating max, assuming sorted only by acct;
*You'll need to do the rename/dropping of variables;
data want3;
set have;
by acct notsorted;
retain max_date max_default;
if first.acct then do;
max_default=default;
max_date=date;
end;
else do;
if default>max_default then max_default=default;
if date>max_date then max_date=date;
end;
if last.acct;
run;
Gave you looked at
PROC MEANS
I have not, I am not familiar with getting this result using proc means.
I don't think proc means will work because of the text variables.
Whats the logic, max of each field?
I want the most recent date for the date field and the longest length for the default field. I believe the sql max() performs the same way.
Thanks,
proc sort data=have;
by acct date;
run;
data want(drop=default rename=default1=default);
set have;
by acct date default notsorted;
retain default1;
if first.acct then default1=default;
if last.acct;
run;
stat@sas, very nice. Thank you very much. If you wouldn't mind can you explain why this works the way that it does?
Thank You,
Mark
Hi Mark,
As dataset is sorted by acct and date. Last date within each value of acct variable will be the maximum one. Introduced a new variable default1 assigned maximum value of default variable at the first occurrence of each value of acct and retained until the last occurrence. Finally, if last condition outputs the last occurrence of each acct value along with maximum date and maximum value of default variable.
Hoping to see Steelers in super bowl this year.
Regards,
Naeem
haha, me too!
I am having one issue. The default field is null in some cases. Where this happens the result is always null.
The answer from @stat@sas isn't a true max, but it takes the first record. Simply switching the first two records in your sample data will break the logic.
If you're looking for the true max from each field you can do two sorts/passes of the data or use first/last and manually calculate one. See an example below, of breaking the solution as well as an alternative. If your data was pre-sorted by acct already you could manually calculate each max similarly.
data have;
infile cards dsd;
informat date mmddyy10.;
format date date9.;
input acct $ date default $;
cards;
00001,01/01/2000,pd
00001,01/01/2000,pd npl
00001,01/01/2001,pd
00002,02/01/2000,pd npl
00002,02/01/2000,pd
00002,02/01/2010,pd
;
run;
proc sort data=have;
by acct date;
run;
data want(drop=default rename=default1=default);
set have;
by acct date default notsorted;
retain default1;
if first.acct then default1=default;
if last.acct;
run;
*Manually calculating max assuming a sorted dataset, by acct date;
data want2;
set have;
by acct date;
retain max_default;
if first.acct then max_default=default;
else if default>max_default then max_default=default;
if last.acct;
drop default;
rename max_default=default;
run;
*Manually calculating max, assuming sorted only by acct;
*You'll need to do the rename/dropping of variables;
data want3;
set have;
by acct notsorted;
retain max_date max_default;
if first.acct then do;
max_default=default;
max_date=date;
end;
else do;
if default>max_default then max_default=default;
if date>max_date then max_date=date;
end;
if last.acct;
run;
Very nice, thank you.
As suggested solution was based on the given dataset and problem description. Reeza has raised on interesting point but not sure whether this is true for all of the text comparisons. A small modification in the syntax will fix the problem raised by Reeza as well as if the default field contains null values.
Thanks,
Naeem
proc sort data=have;
by acct date;
run;
data want(drop=default rename=default1=default);
set have;
by acct date default notsorted;
length default1 $8.;
retain default1;
if first.acct then default1='';
if default='pd npl' then default1=default;
if last.acct;
run;
@Naeem your solution is too specific to the sample data, not to the actual programming problem. What would happen if the text had an upper cased letter, extra leading or trailing space? And assume this is a sample of the problem that is meant to illustrate the problem, not the exact data/problem the OP is presenting.
The problem seems to be:
I know using the max function for two variables and a group by statement would work but I'm looking for another solution.
Or restated:
How to find the max of two variables within a group NOT using proc sql.
Hope that makes sense!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.