SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

revoming duplicate using proc sql when there is a certain rule.

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

revoming duplicate using proc sql when there is a certain rule.

Hi everyone, 

 

If I have a data like this. 

 

data t;INPUT id dis;cards;
1 0
1 1
2 0
3 1
3 0
;

 

There is a duplicate when ID = 1 and 3. 

 

I want to remove them but keep the dis status is 1. 

 

I can do to by sorting the data by ID and descending the variable of dis. Then I can use the proc sort with nodupkey statement to keep the first id in the data set. 

 

However, I want to learn how to do it by Proc SQL. 

 

I try use the following syntax. 

 

proc sql; select distinct ID, dis from t HAVING dis=1; quit;

 

But, it turns out that the observation 2 is not included because its dis status= 0. 

 

Further, how to do that when there are more complex relationships and more variables?

 

For example, if there are two rule. 

 

data t;INPUT id dis exp;cards;
1 0 1
1 1 2
2 0 3
3 1 4
3 0 5
;

 

I want to pic up the distinct ID when Dis stautus = 1 and Exp status = 2,3,4. 

 

Can anyone tell me how to do?

 

thx. 

 


Accepted Solutions
Solution
‎11-29-2015 10:18 PM
Frequent Contributor
Posts: 85

Re: revoming duplicate using proc sql when there is a certain rule.

In your simple case your sql should like this:

proc sql;
  create table want as
  select distinct id, dis
  from t
  group by id
  having dis = max(dis);

 

But for the more complicated examples you describe, the sql would get pretty ugly and, and I understand you're keen to explore sql, its really not the best option.

 

Sort and Data step is the simplest: 

proc sort data=t ;
  by id descending dis descending exp;
run; 
data want;
  set t;
  by id;
  if first.id;
run;

View solution in original post


All Replies
Solution
‎11-29-2015 10:18 PM
Frequent Contributor
Posts: 85

Re: revoming duplicate using proc sql when there is a certain rule.

In your simple case your sql should like this:

proc sql;
  create table want as
  select distinct id, dis
  from t
  group by id
  having dis = max(dis);

 

But for the more complicated examples you describe, the sql would get pretty ugly and, and I understand you're keen to explore sql, its really not the best option.

 

Sort and Data step is the simplest: 

proc sort data=t ;
  by id descending dis descending exp;
run; 
data want;
  set t;
  by id;
  if first.id;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 423 views
  • 0 likes
  • 2 in conversation