Hello,
I have the following data table:
ID | Measure | Number | Date | Flag | Output |
1 | A | 5 | 11/1/2014 | 1 | 3 |
1 | A | 1 | 12/1/2014 | 1 | 3 |
1 | A | 3 | 1/1/2015 | 1 | 3 |
1 | A | 5 | 10/1/2015 | 1 | 5 |
1 | A | 6 | 11/1/2015 | 1 | 5 |
1 | A | 8 | 12/1/2015 | 1 | 5 |
1 | A | 3 | 1/1/2016 | 1 | 5 |
1 | A | 6 | 2/1/2016 | 1 | 5 |
1 | A | 1 | 8/1/2017 | 1 | 6 |
1 | A | 6 | 9/1/2017 | 1 | 6 |
1 | A | 10 | 10/1/2017 | 1 | 6 |
1 | A | 2 | 11/1/2017 | 1 | 6 |
1 | A | 14 | 12/1/2017 | 1 | 6 |
1 | A | 5 | 1/1/2018 | 1 | 6 |
I am trying to select the output variable with the greater output number, but earlier time period. For example, the output '5' is greater than '3', but the date is earlier than the output '6', so I would select that. However, I'm not sure how to write this coding. Eventually, I want to be able to select the following data only from the above table.
ID | Measure | Number | Date | Flag | Output |
1 | A | 5 | 10/1/2015 | 1 | 5 |
1 | A | 6 | 11/1/2015 | 1 | 5 |
1 | A | 8 | 12/1/2015 | 1 | 5 |
1 | A | 3 | 1/1/2016 | 1 | 5 |
1 | A | 6 | 2/1/2016 | 1 | 5 |
Thank you for any help in advance.
The only way I can interpret your request is: "Find the OUTPUT group which is not the smallest OUTPUT number and not the latest mean (or min, or max, etc) date." A solution would then be:
proc sql;
select id, measure, number, date, flag, output
from (select *, mean(date) as meanDate from have group by id, measure, output)
group by id, measure
having output > min(output) and meanDate < max(meanDate);
quit;
What role does the variable named NUMBER play in this process?
Why are none of the OUTPUT values of 6 selected? You seem to have an unstated rule other than larger value of output and date sequence.
I can get you desired data from that example with
data want;
set have;
where output=5;
run;
but that isn't exercising any rules. Perhaps you could provide example start and end data sets where the OUTPUT variable is a different value. Best is to provide examples as data step so we can be sure we are using data similar to yours. Paste it into a code box opened with the forum's {I} such as:
data have; input ID Measure $ Number Date :mmddyy10. Flag Output; format date mmddyy10.; datalines; 1 A 5 11/1/2014 1 3 1 A 1 12/1/2014 1 3 1 A 3 1/1/2015 1 3 1 A 5 10/1/2015 1 5 1 A 6 11/1/2015 1 5 1 A 8 12/1/2015 1 5 1 A 3 1/1/2016 1 5 1 A 6 2/1/2016 1 5 1 A 1 8/1/2017 1 6 1 A 6 9/1/2017 1 6 1 A 10 10/1/2017 1 6 1 A 2 11/1/2017 1 6 1 A 14 12/1/2017 1 6 1 A 5 1/1/2018 1 6 run;
By the way I am guessing that the first number in the date is the month.
The only way I can interpret your request is: "Find the OUTPUT group which is not the smallest OUTPUT number and not the latest mean (or min, or max, etc) date." A solution would then be:
proc sql;
select id, measure, number, date, flag, output
from (select *, mean(date) as meanDate from have group by id, measure, output)
group by id, measure
having output > min(output) and meanDate < max(meanDate);
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.