BookmarkSubscribeRSS Feed
Pandu2
Obsidian | Level 7
Hii Thanks for responding. I'm unable to find that option, that's why I was typing it as text. Would you mind walking me through that
Tom
Super User Tom
Super User

The formatting options are right above the text box where you enter your message.

Tom_0-1652279783863.png

The Insert Code icon looks like </> and the Insert SAS Code icon is just to its right.  They both will pop-up a window where you can type/paste your text/code.  If you need to edit the content make sure to place the cursor inside and hit the icon again so that you can edit in the pop-up window.  If you insert or delete line breaks without opening the pop-up window the formatting will get messed up.

Pandu2
Obsidian | Level 7
data have;
  input date1 :date. date2 :date. id:4. roll:2.  num: 3. amount:2. Product :$5.;
  format date1 date2 date9.;
cards;
01FEB2022 30JAN2022 1411 69 115 2 DEF
03MAR2022 21FEB2022 1412 70 116 1 JKL
01FEB2022 30JAN2022 1411 69 115 3 GHI
01JAN2022 20APR2022 1411 69 115 1 ABC 
04MAR2022 29JAN2022 1412 70 116 2  JKL
06MAR2022 06MAR2022 1412 70 117 3 XYZ
05JAN2022  03MAR2022 1413 91 213 1 WOP
07MAY2022 09FEB2022 1413 91 213  2 WXY
;
Required Output:

 
01FEB2022 30JAN2022 1411 69 115 2 DEF
03MAR2022 21FEB2022 1412 70 116 1 JKL
06MAR2022 06MAR2022 1412 70 117 3 XYZ
07MAY2022 09FEB2022 1413 91 213  2 WXY

Firstly this data is an unsorted one so, it must be sorted by id,roll,num, product,DATE1 variables prior to the two cases.


1st rule: The 1st appearance of Id, roll, num should be picked only after it meets the condition DATE1 >= DATE2. In this id, roll, num are primary keys and DATE1, DATE2, AMOUNT, PRODUCT has relation with Id,roll, num.

Explanation for 1st rule: In the above given data, If the 1st appearance of Id, roll, num doesn't meet the condition DATE1>= DATE2.

 

This is where 2nd rules comes in : If the 1st appearance of Id, roll, num doesn't meet the condition DATE1 >= DATE2, then when there's a increase in amount value compared to previous value then the first increase of amount row should be picked and that should also match DATE1 >= DATE2..

 

2nd rule explanation: for the 1st id, roll, num it should pick the first increase in amount.

Tom
Super User Tom
Super User

Why did you post the data not sorted?  It is trivial to sort 8 observations by hand.

Why make it more confusing by posting data with the variables not ordered so the primary sort keys are first?

 

One easy way to show in your example data which observations you want to select by adding a flag variable to your example data.  Then it is clearer which observations you want to find.

data have;
  length id roll num 8 product $5. date1 date2 amount want 8;
  input id -- want;
  informat date1 date2 date9.;
  format date1 date2 date9.;
cards;
1411 69 115 ABC 01JAN2022 20APR2022 1 0
1411 69 115 DEF 01FEB2022 30JAN2022 2 1
1411 69 115 GHI 01FEB2022 30JAN2022 3 0 
1412 70 116 JKL 03MAR2022 21FEB2022 1 1
1412 70 116 JKL 04MAR2022 29JAN2022 2 0
1412 70 117 XYZ 06MAR2022 06MAR2022 3 1
1413 91 213 WOP 05JAN2022 03MAR2022 1 0
1413 91 213 WXY 07MAY2022 09FEB2022 2 1
;

It also makes it easier to see whether the test data covers all contingencies.  For example your current data does not have any cases where the desired observation is NOT one of the first two observations in the group.  Nor any where none of the observations meet either criteria.

 

It really sounds like your rules are pretty simple. 

You will need to retain a flag variable to tell if you have found a case for the second rule.

data want ;
   set have ;
   by id roll num product date1 ;
   retain found;
   if first.num then found=0;
* Rule one ;
   if first.num and (date1 >=date2) then do;
      found=1;
      output;
   end;
* Rule two ;
   if not found and (amount > lag(amount)) and (date1 >=date2) then do; 
      found=1;
      output;
   end;       
   drop found;
run;

Results:

Obs    id    roll   num   product       date1       date2   amount   want

 1    1411    69    115     DEF     01FEB2022   30JAN2022      2       1
 2    1412    70    116     JKL     03MAR2022   21FEB2022      1       1
 3    1412    70    117     XYZ     06MAR2022   06MAR2022      3       1
 4    1413    91    213     WXY     07MAY2022   09FEB2022      2       1

 

Quentin
Super User

@Tom Quick thought (untested):

Maybe rule 2 should be:

  if (first.num=0)  and (not found) and (amount > lag(amount)) and (date1 >=date2) then do; 

to avoid the lag comparison crossing groups?  

 

I can't remember if my adding first.num=0 could potentially short-circuit the evaluation of the expression, causing the lag function accidentally conditional.  Usually when there is any risk of a lag becoming accidentally conditional, I just move the lag to an obviously non-conditional place in the step.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Pandu2
Obsidian | Level 7
This 2nd case is working much better than TOM's. Thankyou very much both of you. Now I got the accurate output which I require. Your generosity overwhelmed me.
Pandu2
Obsidian | Level 7
It because of that's how my data is, previously the data which I sent is a sorted one, which I did this to be everything in sorted order to make the job easier for both the two rules. But I didn't get the required output which I posted that's why I just wanted to post the data as is.
Pandu2
Obsidian | Level 7
FYI, the data which I posted here was just a small piece of massive data. Approximately 90K records. May I know why did you use first.num only, at the back of my mind I thought concatenating id,roll, num as a single value and using that value in first. may works well. Doesn't that concatenated value helpful?. I request you to Please let me know. Thank you very much for providing the code
Tom
Super User Tom
Super User

Sample data is a GOOD thing.  But remember to think about all of the possible things that might break your algorithm when constructing your test data.

 

There is no need pre-process the data to create a single variable to group by multiple variables. You need to check out the meaning of the FIRST. and LAST. flags.   

If you use 

BY A B ;

Then the values are sorted by B within a given value of A.  So the meaning of FIRST.B is that this is the first observation of the group that has the current values of B and A.

When it is first.A then it is by definition also first.B since the value of B that is on the first observation for a given value of A is by definition the first observation for this value of B within this value of A since if it wasn't then it could NOT be the first observation for this value of A.

Kurt_Bremser
Super User

When you have a BY statement like this:

by id roll num;

Any change in id or roll will automatically imply a change in num (even if the actual value in num does not change from the previous row).

Quick example:

data have;
input id roll num;
datalines;
1 1 1
1 1 1
1 1 2
1 2 2
2 2 2
;
run;

data test;
set have;
by id roll num;
first_id = first.id;
first_roll = first.roll;
first_num = first.num;
run;

You will see that, although num does not change in the last 3 observations, first.num will always be true there. That's why testing for first.num is sufficient.

Pandu2
Obsidian | Level 7
Hii Tom, I've a small change in your code. Could you please add product as well as first.product in your code.
I sincerely apologise for coming back to you again. In the 1st rule, the data should be picked on 1st appearance of Id,roll,num as well 1st appearance of product too. Could you please make this small change. Thankyou.
Pandu2
Obsidian | Level 7
cards;
1411 69 115 ABC 01JAN2022 20APR2022 1
1411 69 115 ABC 01FEB2022 30JAN2022 2
1411 69 115 DEF 01FEB2022 30JAN2022 3
1411 69 115 DEF 09MAR2022 11APR2022 1
1411 69 117 DEF 24FEB2022 24FEB2022 2
1412 70 116 JKL 03JUN2022 21FEB2022 1
1412 70 116 JKL 04MAR2022 29JAN2022 2
1412 70 117 XYZ 01MAR2022 06MAR2022 3
1412 70 117 XYZ 03MAY2022 01JAN2022 4
1413 91 213 WOP 05JAN2022 03MAR2022 1
1413 91 213 WOP 05MAY2022 03MAR2022 3
1413 91 213 WXY 07MAY2022 09FEB2022 2
;
here's the sample data which I added some records to be suitable for adding first.product alongside first.num.
Req output :


1411 69 115 ABC 01FEB2022 30JAN2022 2
1411 69 115 DEF 01FEB2022 30JAN2022 3
1411 69 117 DEF 24FEB2022 24FEB2022 2
1412 70 116 JKL 03JUN2022 21FEB2022 1
1412 70 117 XYZ 03MAY2022 01JAN2022 4
1413 91 213 WOP 05MAY2022 03MAR2022 3
1413 91 213 WXY 07MAY2022 09FEB2022 2
;
After picking 1st appearance of Id, roll, num and product. Thankyou.

 
 
 
Tom
Super User Tom
Super User

Now that you understand how BY processing and FIRST. flags work you should be able to work it out yourself.

But one of your answers does not seem to be right.

Remember your rule implies the data is sorted by the key variables.

So the input should be in this order:

data have;
  length id roll num 8 product $5. date1 date2 amount want 8;
  input id -- want;
  informat date1 date2 date9.;
  format date1 date2 date9.;
cards;
1411 69 115 ABC 01JAN2022 20APR2022 1 0
1411 69 115 ABC 01FEB2022 30JAN2022 2 1
1411 69 115 DEF 01FEB2022 30JAN2022 3 1
1411 69 115 DEF 09MAR2022 11APR2022 1 0
1411 69 117 DEF 24FEB2022 24FEB2022 2 1
1412 70 116 JKL 04MAR2022 29JAN2022 2 0
1412 70 116 JKL 03JUN2022 21FEB2022 1 0
1412 70 117 XYZ 01MAR2022 06MAR2022 3 0
1412 70 117 XYZ 03MAY2022 01JAN2022 4 1
1413 91 213 WOP 05JAN2022 03MAR2022 1 0
1413 91 213 WOP 05MAY2022 03MAR2022 3 1
1413 91 213 WXY 07MAY2022 09FEB2022 2 1
;

Now the code to select the records will look like this. (Note I also changed the logic to properly calculate whether the AMOUNT has increased without getting confused by crossing group boundries.)

data want ;
   set have ;
   by id roll num product date1 ;
   retain found;
   increase = (amount > lag(amount));
   if first.product then found=0;
   if first.product then increase=0;
* Rule one ;
   if first.product and (date1 >=date2) then do;
      found=1;
      output;
   end;
* Rule two ;
   if not found and  increase and (date1 >=date2) then do; 
      found=1;
      output;
   end;       
   drop found;
run;

Results:

Tom_0-1652416002914.png

Why did you think that for the group defined by id=1412, roll=70, num=116 and product='JKL' that it should select the other record?

Pandu2
Obsidian | Level 7
Thanks for providing the code. You used want and increase for the sake of testing but when comes my data I can't use them bcuz the data which I provided was already in a table, from that I picked some data and posted here. Could you please ask your question again, I didn't get it and I didn't say it should select other record.
Pandu2
Obsidian | Level 7
Cards;
1414 67 114 RIG 08MAR2022 01FEB2022 1
1414 67 114 GIR 01FEB2022 30JAN2022 2
;

Is that possible not  to pick the 1st appearance of Id,roll, num but product name changes and DATE1 >= DATE2  meets but when a new product comes for same id, roll, num and if the amount for new product is > the amount of old product for s

Required output

1414 67 114 RIG 01FEB2022 30JAN2022 1

ame id, roll, num  that new product row shouldn't be picked. Could you please add this new condition to your code.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 67 replies
  • 1193 views
  • 1 like
  • 6 in conversation