BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SASuserlot
Barite | Level 11

Hi, I am new to Proc SQL, I would like to know how we can remove the duplicates from dataset and achieve the exactly same thing that I can achieve through the PROC SORT. In my example I want achieve the output exactly  looks like CLASS1 dataset using PROC SQL.  I am trying with sql  but having difficulty to achieve the same., I am able to get age, and sex but not the remaining variable. Can you please let me know what I am doing wrong. Thanks in advance.

data class;
set sashelp.class;
run;

proc sort data= class out= class1 nodupkey; by sex age ; run;

proc sql;
create table clsql as select distinct age,sex
from class  group by age,sex order by age,sex;
quit;

SASuserlot_0-1645127590849.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Here is an SQL approach for the rule I previously described:

proc sql;
  create table want as
  select A.*
  from sashelp.class as A
  inner join
  (select  age
          ,sex 
          ,max(height) as Max_Height
   from sashelp.class
   group by age
           ,sex
  ) as B 
  on A.age = B.age
  and A.sex = B.sex
  and A.height = B.Max_Height
  order by sex, age
  ;
quit;

View solution in original post

7 REPLIES 7
ballardw
Super User

Since Proc Sort will create different datasets given a different order of the data I think you need to consider and describe in much more excruciating detail what your real use case may be.

 

The initial order of the data set in Proc sort with NODUPKEY will affect the result. Here is an example.

data class;
set sashelp.class;
run;

proc sort data= class out= class1 nodupkey; 
   by sex age ; 
run;

proc sort data=class;
   by height;
run;

proc sort data= class out= class2 nodupkey; 
   by sex age ; 
run;

If Class1 and Class2 are the same when you run this code I would be very surprised.

 

SQL has a very similar data order issue as it is not designed to process data in any given sequence. Sometimes with large data sets processing the same code on unchanged data sets will result in different results if order is considered.

SASuserlot
Barite | Level 11

Hi @ballardw . Thank you for response. I am not pro with SQL,  So I am learning how I can I achieve the same that we can do by proc sort to know  how we can avoid duplicates. 

For your question, When I ran your code, I see  Class1 and Class2 are not same. 

Is it possible for you provide and example where we can remove the duplicates and achieve same using proc sort and proc sql, That will  be greatly appreciated. I think class dataset may not be good .Thanks again.

SASKiwi
PROC Star

You need to define a deduplication rule that precisely selects the rows you want so that it will work the same in PROC SORT and PROC SQL.

 

For example with the CLASS dataset you could say: create a table that contains the tallest student for each age and sex value combination. As long as the data has unique values for height (which I think it has), you now have a precise definition that you can code in PROC SORT and PROC SQL and you will get the same result.

SASuserlot
Barite | Level 11

Got it thank you. can you provide an example how to remove duplicates using the Proc sql, may at two variable level. I do have idea using 'distinct' for single variable level duplication removal. Thanks

Kurt_Bremser
Super User

If you want the same result in SORT and SQL, you need to design a rule for which duplicate to select, and then you can force both to implement that rule. Without forcing a specific rule, the results will be indeterminate (even in SORT, depending on storage engines).

So we first need that rule.

SASKiwi
PROC Star

Here is an SQL approach for the rule I previously described:

proc sql;
  create table want as
  select A.*
  from sashelp.class as A
  inner join
  (select  age
          ,sex 
          ,max(height) as Max_Height
   from sashelp.class
   group by age
           ,sex
  ) as B 
  on A.age = B.age
  and A.sex = B.sex
  and A.height = B.Max_Height
  order by sex, age
  ;
quit;
SASuserlot
Barite | Level 11

Thanks

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 4799 views
  • 4 likes
  • 4 in conversation