@Pandu2 wrote:
This code shouldn't have if last.num cuz I need only 1st appearance of the id, roll, num. And also a small change in the 2nd case where I said if the 1st case fails, which means if the 1st appearance of Id, roll, num doesn't meet the condition DATE1 >= DATE2 then the increase in the amount field should be checked with prior and next value and if there's increase instead of taking max amount we must take the immediate increase row values.
Required Output:
01FEB2022 30JAN2022 1411 69 115 2
Previously it takes the row values which has amount value i e "3". Instead it should take immediate increase in amount field i.e "2".
Why do you want the observation with the amount of 2? The maximum amount within the group 1411/69/115 is 3, in the last observation.
If you do not want the maximum, but the first increase, you should have said so.
Quote from an earlier post of yours:
when there's an increase in the amount variable values then the row which has max amount value should be picked
(emphasis by me)
Picking up the first increase is in fact simpler, so this does it with data previously posted:
data WORK.have;
input DATE1 :DATE9. DATE2 :DATE9. ID:4. ROLL:2. NUM:3. AMOUNT:2.;
format date1 date2 yymmdd10.;
datalines;
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
;
data want;
set have;
by id roll num;
retain _flag _amount;
if first.num
then do;
_flag = 1;
_amount = .;
if date1 >= date2
then do;
_flag = 0;
output;
end;
end;
if not first.num and amount > _amount and date1 >= date2 and _flag
then do;
_flag = 0;
output;
end;
if date1 >= date2 then _amount = amount;
drop _:;
run;
proc print data=want noobs;
run;
Result:
DATE1 DATE2 ID ROLL NUM AMOUNT 2022-02-01 2022-01-30 1411 69 115 2 2022-03-03 2022-02-21 1412 70 116 1
Please describe CLEARLY what you mean by "interlinked".
Also rephrase this sentence:
So can we resolve this and get only the 1st appearance of Id, roll, num though for every id, roll, num has different dates in DATE1 variable.
Right now this is not English, and it makes no sense at all.
Do you mean that the dates also play a role in forming groups?
I perfectly understood your example data, and my first code solved this:
(quote from the post where you included usable example data)
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
In the given example above, the DATE1 value is not >= DATE2 for the 1st appearance of Id, roll, num so, it fails then for the next two values the amount is increased i.e 1,2,3 so, for this case the max amount value which is 3 that row should be picked and also the base condition is met 01FEB2022 >= 30JAN2022.
(once again, emphasis by me)
Then you changed your mind and wanted the first increase, which is what my second code solves.
If that does not work for you, post example data that includes all the current cases (for regression testing) and other cases that need to be dealt with, and which observations need to be kept from that.
Review the last code I posted, and see if it works for you.
Why do you sort by DATE1 first, when your groups are built on id, roll, num?
With the data given, it does not change the order; if the date sort is there for a given reason, you should also inlcude observations in your example data which illustrate the reason.
Which observations out of this given dataset should make it into the output?
Both of your rules are about groups built on id, roll, num. In your rules (as stated up to now), the dates play no role (I take it you may want the records in date order for each group, so you can add date1 at the end of the BY statement in PROC SORT; in the data step it is not needed).
If you expect a different result out of the code I gave you, show which observations need to go into the output.
This is my final post; until you show the expected result, I will not answer any more questions.
Please learn to use the Insert Code and Insert SAS Code buttons on the forum editor so you can make your posts more readable. Show you data as data steps! It is EASIER than just typing in the data and then having to try and explain what it is in words.
So you are starting with this data:
data have;
input date1 :date. date2 :date. id roll num amount ;
format date1 date2 date9.;
cards;
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
;
You mention three concepts you need to help you analyze this data.
1) You want to group by ID ROLL NUM.
2) You want to check if DATE1>=DATE2.
3) You want to check for INCREASE in AMOUNT.
So here is code to create flags for those things from this data.
data want;
set have;
by id roll num ;
group+first.num;
date_flag = date1>=date2;
increase = amount > lag(amount);
if first.num then increase=0;
run;
The result is this data:
date_ Obs date1 date2 id roll num amount group flag increase 1 01JAN2022 20APR2022 1411 69 115 1 1 0 0 2 01FEB2022 30JAN2022 1411 69 115 2 1 1 1 3 01FEB2022 30JAN2022 1411 69 115 3 1 1 1 4 03MAR2022 21FEB2022 1412 70 116 1 2 1 0 5 04MAR2022 29JAN2022 1412 70 116 2 2 1 1
So assuming you want to select ONE observation per GROUP which two observations do you want to select?
Why?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.