DATA Step, Macro, Functions and more

how to find duplicated observations by amount

Reply
Occasional Contributor
Posts: 18

how to find duplicated observations by amount

Hi, Pros:

Here is my senario:

Data structure as follows:
*******************************************
id date service type amt
1 12/1/2009 station1 A +5
1 12/1/2009 station1 A +15
1 12/1/2009 station1 A -15

2 12/1/2009 station1 B +5
2 12/1/2009 station1 B -5
2 12/1/2009 station1 B +25
2 12/1/2009 station1 B -25
2 12/1/2009 station1 B 20
*****************************************

Identify unique observation by sort by id date service type;

The following result is expected. Many thanks. Tulip
***************************************
id date service type amt
1 12/1/2009 station1 A +5
2 12/1/2009 station1 B 20
***************************************
Super Contributor
Super Contributor
Posts: 3,174

Re: how to find duplicated observations by amount

Please provide specific expectation so one will not make their own interpretation. Specifically, I see where you chose the "first" occurence of variables "id date service type" with the first grouping but you chose the "last" occurences of the 2nd group.

For this objective, regardless, you should explore using PROC SORT and a DATA step approach using a BY statement and IF FIRST. and/or IF LAST. to perform your data selection.

Suggested Google advanced search argument for related DOC:

data step by group processing site:sas.com


Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: how to find duplicated observations by amount

Hi tulip

I'm not sure if I understand the problem exactly, however this code perform expected results.

DATA station;
INPUT id date:$10. service $ type $ amt;
CARDS;
1 12/1/2009 station1 A +5
1 12/1/2009 station1 A +15
1 12/1/2009 station1 A -15
2 12/1/2009 station1 B +5
2 12/1/2009 station1 B -5
2 12/1/2009 station1 B +25
2 12/1/2009 station1 B -25
2 12/1/2009 station1 B 20
;
RUN;

proc sql;
select id, date, service, type, amt
from station
group by id, date, service, type, abs(amt)
having count(*) < 2
;
quit;

best regards

Tomaž
Occasional Contributor
Posts: 18

Re: how to find duplicated observations by amount

Hi, guys:

It is so nice of you. I appreciate your help very much.

The issue in this case can not be taken care of by using sort and by first....

The final observation needs to contain the correct amount.

Thanks Tom. Could you please try again the following case please?

DATA station;
INPUT id date:$10. service $ type $ amt;
CARDS;
1 12/1/2009 station1 A +5
1 12/1/2009 station1 A +15
1 12/1/2009 station1 A -15

2 12/1/2009 station1 B +5
2 12/1/2009 station1 B -5

2 12/1/2009 station1 B 20
2 12/1/2009 station1 B +25
2 12/1/2009 station1 B -25

3 12/1/2009 station1 B +55
3 12/1/2009 station1 B -55
3 12/1/2009 station1 B 55
;
RUN;

proc sql;
select id, date, service, type, amt
from station
group by id, date, service, type, abs(amt)
having count(*) < 2
;
quit;

final expects the following.
1 12/1/2009 station1 A +5

2 12/1/2009 station1 B 20

3 12/1/2009 station1 B 55 Message was edited by: tulip
Occasional Contributor
Posts: 18

Re: how to find duplicated observations by amount

Hi, Tomaz:

Could you please try again the following case additionally please?
there is length limit so I do not show all.
DATA station;
INPUT id date:$10. service $ type $ amt;
CARDS;
3 12/1/2009 station1 B +55
3 12/1/2009 station1 B -55
3 12/1/2009 station1 B 55
;
RUN;

proc sql;
select id, date, service, type, amt
from station
group by id, date, service, type, abs(amt)
having count(*) < 2
;
quit;

final expects the following.
1 12/1/2009 station1 A +5

2 12/1/2009 station1 B 20

3 12/1/2009 station1 B 55
Occasional Contributor
Posts: 18

Re: how to find duplicated observations by amount

final show is 3 12/1/2009 station1 B +55
N/A
Posts: 0

Re: how to find duplicated observations by amount

I see
maybe using sum instead abs is more appropriate.

Tomaž
Occasional Contributor
Posts: 18

Re: how to find duplicated observations by amount

Belated thanks, Tomaz. Further help is needed.

when I replace abs to sum function. Error message came up. Please advise. Thanks.

DATA station;
INPUT id date:$10. service $ type $ amt;
CARDS;
3 12/1/2009 station1 B +55
3 12/1/2009 station1 B -55
3 12/1/2009 station1 B +55
;
RUN;
proc sql;
select id, date, service, type, amt
from station
group by id, date, service, type, abs(amt)
having count(*) < 2 ;
;
quit; Message was edited by: tulip
Super Contributor
Super Contributor
Posts: 3,174

Re: how to find duplicated observations by amount

As I stated previously, you can use IF LAST. THEN OUTPUT; with a sorted file using a BY statement and being the last variable in the BY. From your explanation, I believe the BY variables are ID, DATE (suggest inputting as a SAS NUMERIC date, not character), SERVICE and TYPE. Use EQUALS on the SORT to ensure the incoming order is maintained for your AMT values.

Again, see the recommended SAS support site DOC references for details, and also the EQUALS parameter discussion for PROC SORT.

Scott Barry
SBBWorks, Inc.
Ask a Question
Discussion stats
  • 8 replies
  • 189 views
  • 0 likes
  • 3 in conversation