Desktop productivity for business analysts and programmers

How to filter out all instances of a value but one?

Reply
N/A
Posts: 0

How to filter out all instances of a value but one?

I am working with a dataset containing customer invoice information. Some customers are represented mutiple times in the dataset on different dates/products. Every customer is identified with af unique numerical value. Now the problem is, I only need the latest customer information.

Lets say cutomer xxxx is represented 8 times in the dataset on different dates. I only need one instance of this particular customer, namely the latest.

I might add, that some times however, a customer will appear more than once on a given date. And in some cases, customers are even mistakenly registered more than once (duplicates).

So basically I am looking for some kind of logic that will remove 7 out of 8 rows containing a particular value (customer ID) preferable keeping the latest. Is that possible to do in EG?

Message was edited by: EGuser Message was edited by: EGuser
Super User
Posts: 5,385

Re: How to filter out all instances of a value but one?

This can be accomplished by using to sort steps. The first order you data by id and date (descending). The second will remove duplicate rows by id, which will keep the first row per id (with highest date value).

/Linus
Data never sleeps
Respected Advisor
Posts: 4,132

Re: How to filter out all instances of a value but one?

That's an example how to do it in a code window.
HTH
Patrick

/* Create some test data */
data have;
format InvoiceDate ddmmyy10.;
do ID=1 to 400;
CustomerId=ceil(ranuni(1)*50);
InvoiceDate=ceil(ranuni(2)*50)+17000;
InvoiceNr=put(ID,z8.);
output;
end;
run;

/* Create a sorted view */
proc sql;
create view Vhave as
select *
from have
order by CustomerId,InvoiceDate;
quit;

/* Pick the obs with the highest (latest) InvoiceDate per customer */
/* It will be only one - even if the obs with the highest InvoiceDate is a duplicate */
data have;
set Vhave;
by CustomerId InvoiceDate;
if last.CustomerId;
run; Message was edited by: Patrick
N/A
Posts: 0

Re: How to filter out all instances of a value but one?

Thanks! This was just what I needed. I guess this can't be done so easily in EG.
Respected Advisor
Posts: 4,132

Re: How to filter out all instances of a value but one?

Hi EGuser
O.K: I took this "point and click" challenge. It wasn't too difficult in the end - but understanding the generated code when I got it wrong first helped a lot....

What I did:
1. I used the code example I've sent you already to generate the test data.
2. I applied "Filter and Query" (EG 4.1):
Base window, tab "Select Data":
- Check the box "Select distinct rows only"
- Click button "edit groups"
- Choose "CustomerId" as group by variable
Base window, tab "Filter Data":
- In the lower part of the window "Filter the summarized data" click the "advanced expression filter" and add the rule: "max(HAVE.InvoiceDate)=HAVE.InvoiceDate "

This "clicking" resulted in the following SQL code:
PROC SQL;
CREATE TABLE WORK.Query_for_HAVE AS SELECT DISTINCT HAVE.InvoiceDate FORMAT=DDMMYY10.,
HAVE.ID,
HAVE.CustomerId,
HAVE.InvoiceNr
FROM WORK.HAVE AS HAVE
GROUP BY HAVE.CustomerId
HAVING max(HAVE.InvoiceDate)=HAVE.InvoiceDate
ORDER BY HAVE.CustomerId;
QUIT;

This does the job! - I still don't know how I would have been able to "click" the right things without understanding the generated code - may be I have to accept that I'm a bit of an old fashioned coder.... :-)

HTH
Patrick
Ask a Question
Discussion stats
  • 4 replies
  • 138 views
  • 0 likes
  • 3 in conversation