Hi, I've got a dataset (called 'RC.DATA_CHECK') that has two columns: Quarter and Rate. Please can someone provide me with the required code that would only keep one version of the Quarter value, based on the highest value of Rate? Note that 2021Q1 has three records, two of which are the same, but I would just like to keep one of these rows for this quarter. I have included a screenshot below of what I would like my dataset to look like after the dataset has been de-duplicated.
Find the maximum value of rate, by quarter:
proc summary data=have nway;
class quarter;
var rate;
output out=want max=;
run;
Thanks for your reply. I've just realised that I've got another dataset that has a similar problem to what I posted, only that it has about an extra 100 columns along with the two (quarter and rate). If I still wanted to do the same thing (keep one record of quarter based on the highest value of rate, how would the code change (i.e. I want to have all of the c.100 columns in my dataset, with the only change being that only one unique quarter (based on highest rate) is shown with all of the information from the other c.100 columns for this row)?
Hi, can someone please tell me how the code would change if I wanted the same criteria (keep only one quarter based on the highest value of rate), but there are 100 variables/columns in the dataset (rather than just the two)?
Sort the data and use FIRST. processing.
proc sort data=have out=sorted;
by quarter descending rate;
run;
data want;
set sorted;
by quarter;
if first.quarter;
run;
If there are variables you can use to help decide which one of the ties to pick add those to the BY statement in the proc sort step.
Your data appear to be sorted by quarter. If so, then:
data want (drop=max_rate);
do until (last.quarter);
set have;
by quarter;
max_rate=max(max_rate,rate);
end;
do until (last.quarter);
set have;
by quarter;
if rate=max_rate then output;
end;
run;
As long as no quarter has multiple observations with the maximum rate, this will produce what you want.
It works by reading all obs for each quarter twice. The first time to determine the maximum rate. The second time to re-read and output the obs with that maximum.
Thanks for your post. My example had 2021Q1 with two observations with the maximum rate (see original post), so does that impact on the code that you've just posted?
@Justin9 wrote:
Thanks for your post. My example had 2021Q1 with two observations with the maximum rate (see original post), so does that impact on the code that you've just posted?
Then which record with the tied rates do you want?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.