BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

## Delete observations in a by variable group on condition

Hi,

My data looks like this,

 Customer_Id Transaction_No Transaction_Date Amount Sales_Credit 4699 80902288 22-Sep-15 55.3 2 4699 80686478 12-Sep-15 62.3 1 4699 80687203 12-Sep-15 55.3 1 4919 81735223 12-Oct-15 15.99 2 4919 80917293 22-Sep-15 15.99 1 4919 80917294 22-Sep-15 15.99 1 4919 80938352 22-Sep-15 23.99 1 6320 80926085 23-Sep-15 51.75 2 6320 80517713 09-Sep-15 51.75 1 6320 80512633 09-Sep-15 51.75 1

I need to find only the data with actual sales. Like in customer_id "4699" only the second observation remains. However with customer_id= "4919" I need to randomly delete either of the two observations having amount 15.99. Could you help me how to proceed with this? If statements with first.variable concept or proc sql?

1 ACCEPTED SOLUTION

Accepted Solutions
Onyx | Level 15

## Re: Delete observations in a by variable group on condition

Here could be solution using Hash:

```data have;
infile cards dlm='09'x truncover;
input (Customer_Id	Transaction_No)	(:\$20.) Transaction_Date :anydtdte20.	Amount	Sales_Credit ;
format Transaction_Date date9.;
cards;
4699	80902288	22-Sep-15	55.3	2
4699	80686478	12-Sep-15	62.3	1
4699	80687203	12-Sep-15	55.3	1
4919	81735223	12-Oct-15	15.99	2
4919	80917293	22-Sep-15	15.99	1
4919	80917294	22-Sep-15	15.99	1
4919	80938352	22-Sep-15	23.99	1
6320	80926085	23-Sep-15	51.75	2
6320	80517713	09-Sep-15	51.75	1
6320	80512633	09-Sep-15	51.75	1
;

data want;if _n_=1 then do;dcl hash h(dataset:'have(drop=Transaction_No rename=(Transaction_Date=_date Sales_Credit=_credit) where=(_Credit=2))', multidata&colon;'y');h.definekey('customer_id', 'amount');h.definedata(all: 'y');h.definedone();call missing (_credit, _date);end;set have(where=(Sales_Credit=1));if h.find()=0 and _date >= Transaction_Date then do;h.removedup();delete;end;drop _:;run;```

11 REPLIES 11
Tourmaline | Level 20

## Re: Delete observations in a by variable group on condition

Can you describe why you wish to perform those actions?

And define "actual sale", it isn't self explanatory.

Reducing records be randomly deletes, AND keep details (in your case transaction number) - why? It's "dangerous, since the details it's not a fact any more.

I would simply combine a filter with select distinct.

Data never sleeps
Obsidian | Level 7

## Re: Delete observations in a by variable group on condition

I am sorry to miss that information. When sales_credit= 2, it refers to a "Product return". Therefore by removing that observation and the observation with similar amount within the customer_id I am trying to achieve only those observations that show actual sales.

And yes, I agree the transaction_id is not required once I randomly select data.

Super User

## Re: Delete observations in a by variable group on condition

Can you guarantee that there will always be an exact match when it's a 2?

And do you need sale dates?

Obsidian | Level 7

## Re: Delete observations in a by variable group on condition

Yes, I can. Although there can be multiple 2's for one customer_id with its respective purchase transaction.

Like this:

 16759 80902332 22-Sep-15 19.99 2 16759 80902327 22-Sep-15 19.99 2 16759 80902328 22-Sep-15 19.99 2 16759 80902326 22-Sep-15 19.99 2 16759 80902335 22-Sep-15 19.99 2 16759 80159162 05-Sep-15 19.99 1 16759 80161144 05-Sep-15 19.99 1 16759 80160655 05-Sep-15 19.99 1 16759 80191135 06-Sep-15 19.99 1 16759 80191307 06-Sep-15 19.99 1

Would it be easier if I skip the the transaction_date?

Super User

## Re: Delete observations in a by variable group on condition

Sort of. I would consider simply marking any purchase that was 2 a negative (mulitply by negative 1) and then sum by ID. THat would give total purchase by user...not sure that's what you're after though.
Onyx | Level 15

## Re: Delete observations in a by variable group on condition

Here could be solution using Hash:

```data have;
infile cards dlm='09'x truncover;
input (Customer_Id	Transaction_No)	(:\$20.) Transaction_Date :anydtdte20.	Amount	Sales_Credit ;
format Transaction_Date date9.;
cards;
4699	80902288	22-Sep-15	55.3	2
4699	80686478	12-Sep-15	62.3	1
4699	80687203	12-Sep-15	55.3	1
4919	81735223	12-Oct-15	15.99	2
4919	80917293	22-Sep-15	15.99	1
4919	80917294	22-Sep-15	15.99	1
4919	80938352	22-Sep-15	23.99	1
6320	80926085	23-Sep-15	51.75	2
6320	80517713	09-Sep-15	51.75	1
6320	80512633	09-Sep-15	51.75	1
;

data want;if _n_=1 then do;dcl hash h(dataset:'have(drop=Transaction_No rename=(Transaction_Date=_date Sales_Credit=_credit) where=(_Credit=2))', multidata&colon;'y');h.definekey('customer_id', 'amount');h.definedata(all: 'y');h.definedone();call missing (_credit, _date);end;set have(where=(Sales_Credit=1));if h.find()=0 and _date >= Transaction_Date then do;h.removedup();delete;end;drop _:;run;```

Obsidian | Level 7

## Re: Delete observations in a by variable group on condition

This works great!! Thank you so much.

Opal | Level 21

## Re: Delete observations in a by variable group on condition

@Haikuo

The two things I would change in your code:

- A keep statement for the data set loaded into the hash so that you're sure not to load variables you don't want to load

- Loop over the "duplicates" in the hash and chose the oldest date where "_date >= Transaction_Date" is true. You can't be sure that the first match is the correct one, and it could also be possible that the condition is true for more than one return.

Onyx | Level 15

## Re: Delete observations in a by variable group on condition

@Patrick wrote:

@Haikuo

The two things I would change in your code:

- A keep statement for the data set loaded into the hash so that you're sure not to load variables you don't want to load

- Loop over the "duplicates" in the hash and chose the oldest date where "_date >= Transaction_Date" is true. You can't be sure that the first match is the correct one, and it could also be possible that the condition is true for more than one return.

@Patrick

1. Absolutely Agreed.

2. Not so sure about that. Customer behaves randomly, your suggestion is based on the assumption that 'first in - first out', which is not always necessarily the case. Without more information  to identify the paired transactions, this is as far as I would go. But I do believe  you have raised a valid point to OP.

Haikuo

Opal | Level 21

## Re: Delete observations in a by variable group on condition

@Haikuo

I believe if you don't select the earliest return date which satifies condition "_date >= Transaction_Date" then you are at risk to miss some returns. Consider the following:

 Rec_No Customer_Id Transaction_Date Amount Sales_Credit 1 1 1-Nov-15 5 1 2 1 2-Nov-15 5 2 3 1 3-Nov-15 5 1 4 1 4-Nov-15 5 2

If you use record 4 to remove record 1 then the pair record 2/record 3 won't satisfy condition "_date >= Transaction_Date" and you'll miss the removal of record 3.

Onyx | Level 15

## Re: Delete observations in a by variable group on condition

@Patrick wrote:

@Haikuo

I believe if you don't select the earliest return date which satifies condition "_date >= Transaction_Date" then you are at risk to miss some returns. Consider the following:

 Rec_No Customer_Id Transaction_Date Amount Sales_Credit 1 1 1-Nov-15 5 1 2 1 2-Nov-15 5 2 3 1 3-Nov-15 5 1 4 1 4-Nov-15 5 2

If you use record 4 to remove record 1 then the pair record 2/record 3 won't satisfy condition "_date >= Transaction_Date" and you'll miss the removal of record 3.

Thank you for your following up and your point is well taken. However, if OP unfortunately has data like that, the problem is beyond a simple code fix. Surely you will have your 'balance sheet', but you will never know if paired transations are the actual pairs without more information. If the purpose is just to get the net 'balance', then the condition of '_date >= Transaction_Date' can be completely ignored. Given the data as is, my approach is to get as far as the 'known' can stand, and stop right at the point where 'unknown' starts.

Just my 2cents,

Haikuo

Discussion stats
• 11 replies
• 1739 views
• 2 likes
• 5 in conversation