BookmarkSubscribeRSS Feed
Kurt_Bremser
Super User

@Pandu2 wrote:
This code shouldn't have if last.num cuz I need only 1st appearance of the id, roll, num. And also a small change in the 2nd case where I said if the 1st case fails, which means if the 1st appearance of Id, roll, num doesn't meet the condition DATE1 >= DATE2 then the increase in the amount field should be checked with prior and next value and if there's increase instead of taking max amount we must take the immediate increase row values.
Required Output:
01FEB2022 30JAN2022 1411 69 115 2
Previously it takes the row values which has amount value i e "3". Instead it should take immediate increase in amount field i.e "2".

Why do you want the observation with the amount of 2? The maximum amount within the group 1411/69/115 is 3, in the last observation.

If you do not want the maximum, but the first increase, you should have said so.

 

Quote from an earlier post of yours:

when there's an increase in the amount variable values then the row which has max amount value should be picked

(emphasis by me)

Pandu2
Obsidian | Level 7
I strongly apologize for saying so, please pardon me. I need the first increase and also I'm not sure why your code isn't taking the first appearance of Id, roll, num. FYI, the data which I provided in the datalines is very huge and it is sorted by 5 variables naming id, roll, num, DATE1 , product and I've said this earlier. So according to your code when I use only id, roll, num in the by statement it gave me an error saying by variables are not sorted bcuz the by variables which I used to sort in proc sort isn't the same count of by variables in your code. Could you please help me on this and getting only the first appearance of Id, roll, num. Your code is picking only records which meets the condition DATE1 >= DATE2 but alongwith that condition it should pick the 1st appearance. Thankyou. I apologise for not giving the right info cuz I was also one of you who don't the exact things to do. Now got the info I'm just passing it. Thankyou.
Kurt_Bremser
Super User

Picking up the first increase is in fact simpler, so this does it with data previously posted:

data WORK.have;
input DATE1 :DATE9. DATE2 :DATE9. ID:4. ROLL:2. NUM:3. AMOUNT:2.;
format date1 date2 yymmdd10.;
datalines;
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
;

data want;
set have;
by id roll num;
retain _flag _amount;
if first.num
then do;
  _flag = 1;
  _amount = .;
  if date1 >= date2
  then do;
    _flag = 0;
    output;
  end;
end;
if not first.num and amount > _amount and date1 >= date2 and _flag
then do;
  _flag = 0;
  output;
end;
if date1 >= date2 then _amount = amount;
drop _:;
run;

proc print data=want noobs;
run;

Result:

DATE1	DATE2	ID	ROLL	NUM	AMOUNT
2022-02-01	2022-01-30	1411	69	115	2
2022-03-03	2022-02-21	1412	70	116	1
Pandu2
Obsidian | Level 7
The reason behind your code isn't picking the 1st appearance of Id, roll, num is bcuz of
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
As you see the above two rows, the id, roll, num is same for these two rows but when you see DATE1 values i.e 03MAR2022 and 04MAR2022 they're not same so, that's why it isn't picking the 1st appearance.like I told you before based on id, roll, num everything is interlinked with eachother. So can we resolve this and get only the 1st appearance of Id, roll, num though for every id, roll, num has different dates in DATE1 variable. In the end, the DATE1 >= DATE2.
Thankyou.
Kurt_Bremser
Super User

Please describe CLEARLY what you mean by "interlinked".

Also rephrase this sentence:

So can we resolve this and get only the 1st appearance of Id, roll, num though for every id, roll, num has different dates in DATE1 variable.

Right now this is not English, and it makes no sense at all.

 

Do you mean that the dates also play a role in forming groups?

 

Pandu2
Obsidian | Level 7
Interlinked in the sense, there's a relation between those 5 records and yes dates do play major role. For every id, roll, num there's a relation with DATE1, AMOUNT, PRODUCT. If you're familiar with DBMS, then you should know about interlinked with eachother or having relation with eachother.if you hadn't understood my english, you would have understood my example data. Language is just a barrier when we have a clear understanding of what's been going rather than what's been said. If you're willing to help, please don't criticize others. Thankyou.
Kurt_Bremser
Super User

I perfectly understood your example data, and my first code solved this:

(quote from the post where you included usable example data)

01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2

In the given example above, the DATE1 value is not >= DATE2 for the 1st appearance of Id, roll, num so, it fails then for the next two values the amount is increased i.e 1,2,3 so, for this case the max amount value which is 3 that row should be picked and also the base condition is met 01FEB2022 >= 30JAN2022.

(once again, emphasis by me)

Then you changed your mind and wanted the first increase, which is what my second code solves.

If that does not work for you, post example data that includes all the current cases (for regression testing) and other cases that need to be dealt with, and which observations need to be kept from that.

 

Pandu2
Obsidian | Level 7
Like I said, I sincerely apologise for changing my mind but, after doing some testing I came up with another thought which suits the blueprint of the output dataset. Please do pick the rows which has first increase of amount. I'm again re-typing the things which I require from this below data.
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
1st rule:
The 1st appearance of Id, roll, num should be picked only after it meets the condition DATE1 >= DATE2. In this id, roll, num are primary keys and DATE1, DATE2, AMOUNT, PRODUCT has relation with Id,roll, num.
Explanation for 1st rule:
In the above given data, the 1st appearance of Id, roll, num doesn't meet the condition DATE1>= DATE2. This is where 2nd rules comes in :
If the 1st appearance of Id, roll, num (1411 69 115)doesn't meet the condition DATE1 >= DATE2, then when there's a increase in amount value compared to previous value then the first increase of amount row should be picked and that should also match DATE1 >= DATE2.
Continuing 1st rule : For the next id, roll, num i.e 1412 70 116 this should be picked cuz this is the 1st appearance of Id, roll, num and it meets the condition DATE1>= DATE2. But when I use your code it didn't pick the record which has date 03MAR2022 instead it picked both 03MAR2022 and 04MAR2022.
2nd rule explanation:
for the 1st id, roll, num (1411 69 115) for this it should pick the first increase in amount. I.e 01FEB2022 30JAN2022 1411 69 115 2 instead of 01FEB2022 30JAN2022 1411 69 115 3. I hope this makes more sense.
Pandu2
Obsidian | Level 7
data WORK.have;
input DATE1 :DATE9. DATE2 :DATE9. ID:4. ROLL:2. NUM:3. PRODUCT :$5. AMOUNT:2.;
format date1 date2 yymmdd10.;
datalines;
01JAN2022 20APR2022 1411 69 115 1 ABC
01FEB2022 30JAN2022 1411 69 115 2 ABC
01FEB2022 30JAN2022 1411 69 115 3 ABC
03MAR2022 21FEB2022 1412 70 116 1 DEF
04MAR2022 29JAN2022 1412 70 116 2 DEF
;
It is still giving me the way it was given before. Still giving values only based on condition DATE1 >= DATE2 instead of both this condition and 1st appearance of Id,roll,num.I sorted the whole data based on DATE1,ID,ROLL,NUM,PRODUCT.
Kurt_Bremser
Super User

Why do you sort by DATE1 first, when your groups are built on id, roll, num?

With the data given, it does not change the order; if the date sort is there for a given reason, you should also inlcude observations in your example data which illustrate the reason.

Which observations out of this given dataset should make it into the output?

Pandu2
Obsidian | Level 7
The data which I provided to you is a sorted one. And I used DATE1 in the proc sort to get the dates in a sorted order ascending to make the job easier for both the RULES.
The unsorted data :
data WORK.have;
input DATE1 :DATE9. DATE2 :DATE9. ID:4. ROLL:2. NUM:3. PRODUCT :$5. AMOUNT:2.;
format date1 date2 yymmdd10.;
datalines;
04MAR2022 29JAN2022 1412 70 116 2 GHI
01FEB2022 30JAN2022 1411 69 115 2 ABC
01FEB2022 30JAN2022 1411 69 115 3 DEF
03MAR2022 21FEB2022 1412 70 116 1 JKL
01JAN2022 20APR2022 1411 69 115 1 ABC
;
Kurt_Bremser
Super User

Both of your rules are about groups built on id, roll, num. In your rules (as stated up to now), the dates play no role (I take it you may want the records in date order for each group, so you can add date1 at the end of the BY statement in PROC SORT; in the data step it is not needed).

If you expect a different result out of the code I gave you, show which observations need to go into the output.

 

This is my final post; until you show the expected result, I will not answer any more questions.

 

Pandu2
Obsidian | Level 7
Please do read this, if this doesn't help, you can leave as-is without answering. Thankyou.

Like I said, I sincerely apologise for changing my mind but, after doing some testing I came up with another thought which suits the blueprint of the output dataset. Please do pick the rows which has first increase of amount. I'm again re-typing the things which I require from this below data.
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
1st rule:
The 1st appearance of Id, roll, num should be picked only after it meets the condition DATE1 >= DATE2. In this id, roll, num are primary keys and DATE1, DATE2, AMOUNT, PRODUCT has relation with Id,roll, num.
Explanation for 1st rule:
In the above given data, the 1st appearance of Id, roll, num doesn't meet the condition DATE1>= DATE2. This is where 2nd rules comes in :
If the 1st appearance of Id, roll, num (1411 69 115)doesn't meet the condition DATE1 >= DATE2, then when there's a increase in amount value compared to previous value then the first increase of amount row should be picked and that should also match DATE1 >= DATE2.
Continuing 1st rule : For the next id, roll, num i.e 1412 70 116 this should be picked cuz this is the 1st appearance of Id, roll, num and it meets the condition DATE1>= DATE2. But when I use your code it didn't pick the record which has date 03MAR2022 instead it picked both 03MAR2022 and 04MAR2022.
2nd rule explanation:
for the 1st id, roll, num (1411 69 115) for this it should pick the first increase in amount. I.e 01FEB2022 30JAN2022 1411 69 115 2 instead of 01FEB2022 30JAN2022 1411 69 115 3. I hope this makes more sense.
Tom
Super User Tom
Super User

Please learn to use the Insert Code and Insert SAS Code buttons on the forum editor so you can make your posts more readable. Show you data as data steps! It is EASIER than just typing in the data and then having to try and explain what it is in words.

So you are starting with this data:

data have;
  input date1 :date. date2 :date. id roll num amount ;
  format date1 date2 date9.;
cards;
01JAN2022 20APR2022 1411 69 115 1
01FEB2022 30JAN2022 1411 69 115 2
01FEB2022 30JAN2022 1411 69 115 3
03MAR2022 21FEB2022 1412 70 116 1
04MAR2022 29JAN2022 1412 70 116 2
;

You mention three concepts you need to help you analyze this data.

1) You want to group by ID ROLL NUM.

2) You want to check if DATE1>=DATE2.

3) You want to check for INCREASE in AMOUNT.

So here is code to create flags for those things from this data.

data want;
  set have;
  by id roll num ;
  group+first.num;
  date_flag = date1>=date2;
  increase = amount > lag(amount);
  if first.num then increase=0;
run;

The result is this data:

                                                       date_
    Obs     date1     date2  id  roll num amount group  flag increase

     1  01JAN2022 20APR2022 1411  69  115    1     1     0       0
     2  01FEB2022 30JAN2022 1411  69  115    2     1     1       1
     3  01FEB2022 30JAN2022 1411  69  115    3     1     1       1
     4  03MAR2022 21FEB2022 1412  70  116    1     2     1       0
     5  04MAR2022 29JAN2022 1412  70  116    2     2     1       1

So assuming you want to select ONE observation per GROUP which two observations do you want to select?

Why?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 67 replies
  • 2418 views
  • 1 like
  • 6 in conversation