SASEnthusiast
Obsidian | Level 7

Hi, 

My data looks like this,

Customer_Id  Transaction_No  Transaction_Date  Amount  Sales_Credit
4699         80902288        22-Sep-15         55.3    2
4699         80686478        12-Sep-15         62.3    1
4699         80687203        12-Sep-15         55.3    1
4919         81735223        12-Oct-15         15.99   2
4919         80917293        22-Sep-15         15.99   1
4919         80917294        22-Sep-15         15.99   1
4919         80938352        22-Sep-15         23.99   1
6320         80926085        23-Sep-15         51.75   2
6320         80517713        09-Sep-15         51.75   1
6320         80512633        09-Sep-15         51.75   1

 

I need to find only the data with actual sales. For example, in customer_id "4699" only the second observation should remain. However, with customer_id = "4919" I need to randomly delete either of the two observations with amount 15.99. Could you help me with how to proceed? IF statements with the first.variable concept, or PROC SQL?


11 Replies
LinusH
Tourmaline | Level 20

Can you describe why you wish to perform those actions?

And define "actual sale"; it isn't self-explanatory.

Reducing records by randomly deleting them AND keeping details (in your case the transaction number): why? It's dangerous, since the detail is no longer a fact.

 

I would simply combine a filter with select distinct.
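
As a rough sketch of that idea (assuming "actual sales" means rows with Sales_Credit = 1 and that exact duplicates should collapse to one row; note this does not yet handle pairing sales against returns):

```sas
proc sql;
   /* keep only sale rows, collapsing exact duplicates */
   create table want as
   select distinct Customer_Id, Transaction_Date, Amount
   from have
   where Sales_Credit = 1;
quit;
```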

Data never sleeps
SASEnthusiast
Obsidian | Level 7

I am sorry for leaving out that information. When sales_credit = 2, it refers to a "product return". Therefore, by removing that observation and the observation with the same amount within the customer_id, I am trying to keep only those observations that represent actual sales.

 

And yes, I agree the transaction_id is not required once I randomly select data.

Reeza
Super User

Can you guarantee that there will always be an exact match when it's a 2?

And do you need sale dates?

SASEnthusiast
Obsidian | Level 7

Yes, I can. Although there can be multiple 2's for one customer_id with its respective purchase transaction.

Like this:

16759 80902332 22-Sep-15 19.99 2
16759 80902327 22-Sep-15 19.99 2
16759 80902328 22-Sep-15 19.99 2
16759 80902326 22-Sep-15 19.99 2
16759 80902335 22-Sep-15 19.99 2
16759 80159162 05-Sep-15 19.99 1
16759 80161144 05-Sep-15 19.99 1
16759 80160655 05-Sep-15 19.99 1
16759 80191135 06-Sep-15 19.99 1
16759 80191307 06-Sep-15 19.99 1

 

Would it be easier if I skip the transaction_date?

Reeza
Super User
Sort of. I would consider simply marking any purchase that was a 2 as negative (multiply by negative 1) and then summing by ID. That would give the total purchase by user... not sure that's what you're after, though.
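
A minimal sketch of that netting idea (assuming returns should simply offset sales for the same customer):

```sas
proc sql;
   /* returns (Sales_Credit = 2) count as negative amounts */
   create table net_by_customer as
   select Customer_Id,
          sum(case when Sales_Credit = 2 then -Amount
                   else Amount end) as Net_Amount
   from have
   group by Customer_Id;
quit;
```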
Haikuo
Onyx | Level 15

Here could be a solution using a hash object:

 

data have;
   infile cards dlm='09'x truncover;
   input (Customer_Id Transaction_No) (:$20.)
         Transaction_Date :anydtdte20. Amount Sales_Credit;
   format Transaction_Date date9.;
cards;
4699	80902288	22-Sep-15	55.3	2
4699	80686478	12-Sep-15	62.3	1
4699	80687203	12-Sep-15	55.3	1
4919	81735223	12-Oct-15	15.99	2
4919	80917293	22-Sep-15	15.99	1
4919	80917294	22-Sep-15	15.99	1
4919	80938352	22-Sep-15	23.99	1
6320	80926085	23-Sep-15	51.75	2
6320	80517713	09-Sep-15	51.75	1
6320	80512633	09-Sep-15	51.75	1
;

data want;
   if _n_ = 1 then do;
      /* Load all return transactions (Sales_Credit=2) into a hash,     */
      /* keyed by customer and amount; MULTIDATA allows duplicate keys. */
      dcl hash h(dataset: 'have(drop=Transaction_No rename=(Transaction_Date=_date Sales_Credit=_credit) where=(_credit=2))', multidata: 'y');
      h.definekey('customer_id', 'amount');
      h.definedata(all: 'y');
      h.definedone();
      call missing(_credit, _date);
   end;
   /* Read only the sales. Drop a sale when a matching return exists on */
   /* or after its date, and remove that return entry from the hash so  */
   /* each return cancels only one sale.                                */
   set have(where=(Sales_Credit=1));
   if h.find() = 0 and _date >= Transaction_Date then do;
      h.removedup();
      delete;
   end;
   drop _:;
run;

 

SASEnthusiast
Obsidian | Level 7

This works great!! Thank you so much.

Patrick
Opal | Level 21

@Haikuo

The two things I would change in your code:

- A KEEP= option for the data set loaded into the hash, so that you're sure not to load variables you don't want to load

- Loop over the "duplicates" in the hash and choose the oldest date for which "_date >= Transaction_Date" is true. You can't be sure that the first match is the correct one, and the condition could also be true for more than one return.
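
The first point could look like this in the hash declaration (a sketch; KEEP= lists only the variables the lookup actually needs before RENAME= and WHERE= are applied):

```sas
dcl hash h(dataset: 'have(keep=Customer_Id Transaction_Date Amount Sales_Credit rename=(Transaction_Date=_date Sales_Credit=_credit) where=(_credit=2))', multidata: 'y');
```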

 

 

Haikuo
Onyx | Level 15

@Patrick wrote:

@Haikuo

The two things I would change in your code:

- A KEEP= option for the data set loaded into the hash, so that you're sure not to load variables you don't want to load

- Loop over the "duplicates" in the hash and choose the oldest date for which "_date >= Transaction_Date" is true. You can't be sure that the first match is the correct one, and the condition could also be true for more than one return.

 

 


@Patrick

1. Absolutely agreed.

2. Not so sure about that. Customers behave randomly; your suggestion is based on an assumption of 'first in, first out', which is not necessarily always the case. Without more information to identify the paired transactions, this is as far as I would go. But I do believe you have raised a valid point for the OP.

 

Haikuo

Patrick
Opal | Level 21

@Haikuo

I believe if you don't select the earliest return date which satisfies the condition "_date >= Transaction_Date" then you risk missing some returns. Consider the following:

Rec_No Customer_Id Transaction_Date Amount Sales_Credit
1 1 1-Nov-15 5 1
2 1 2-Nov-15 5 2
3 1 3-Nov-15 5 1
4 1 4-Nov-15 5 2

 

If you use record 4 to remove record 1 then the pair record 2/record 3 won't satisfy condition "_date >= Transaction_Date" and you'll miss the removal of record 3.

Haikuo
Onyx | Level 15

 

@Patrick wrote:

@Haikuo

I believe if you don't select the earliest return date which satisfies the condition "_date >= Transaction_Date" then you risk missing some returns. Consider the following:

Rec_No  Customer_Id  Transaction_Date  Amount  Sales_Credit
1       1            1-Nov-15          5       1
2       1            2-Nov-15          5       2
3       1            3-Nov-15          5       1
4       1            4-Nov-15          5       2

 

If you use record 4 to remove record 1 then the pair record 2/record 3 won't satisfy condition "_date >= Transaction_Date" and you'll miss the removal of record 3.


@Patrick,

Thank you for following up; your point is well taken. However, if the OP unfortunately has data like that, the problem is beyond a simple code fix. Surely you will have your 'balance sheet', but you will never know whether the paired transactions are the actual pairs without more information. If the purpose is just to get the net 'balance', then the condition '_date >= Transaction_Date' can be ignored completely. Given the data as is, my approach is to go as far as the 'known' can stand, and stop right at the point where the 'unknown' starts.

 

Just my 2 cents,

Haikuo

 

 


Discussion stats
  • 11 replies
  • 1693 views
  • 2 likes
  • 5 in conversation