BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

I ran a quick test to see what combination of BY variables would lead to 6 distinct combinations.

proc summary data=have chartype;
  class id--department / missing;
  output out=groups;
run;

proc freq order=freq; 
  tables _type_ / noprint out=counts;
run;

proc print data=counts;
  where count=6;
run;

And the result is:

Obs    _TYPE_    COUNT    PERCENT

 25    100010      6      1.76471
 26    100110      6      1.76471
 27    101010      6      1.76471
 28    101110      6      1.76471

So the minimum set is the first one, 100010, which means the first and fifth variables, ID AND REASON.

 

So sort by ID and REASON then filter by FIRST.REASON.  (Or sort by REASON and ID and filter by first.ID) and you will get 6 observations out.

 

To get the exact observations you picked manually you will also need to sort by some of the other variables so that the one you want to select is the first one for that combination of ID and REASON.

 

This list of BY variables works for your example.

proc sort data=have ;
  by id reason name ;
run;

proc sort data=expect; 
  by id reason name ;
run;

data want;
  set have;
  by id reason name ;
  if first.reason;
run;

proc compare data=want compare=expect;
run;
mkeintz
PROC Star

Let's assume you have data sorted by MAJOR_KEY, INTERMEDIATE_KEY, MINOR_KEY.   And for some of those triplets, there are duplicates you want to eliminate.   If you do this

 

data want;
   set mydata;
   by major_key intermediate_key minor_key;
   if first.minor_key;
run;

You will get one record per unique triplet.  It that your goal?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

@Akshaya_1397 wrote:

Hi,

I have a datset final in which there are columns like

 

Id,name,function,status,reason, department and has values

123,flowers-gift,nw,onboard,blank SLS,human resource

123,flowers-gifts,nw,onboard,missing country,Human resource

123,onboard gift,nw,onboard,missing country, 

 

I need to take unique rows so I performed first. In the dataset

Data new;

Set final;

If first.id and first.name and first.department then output;

Run;

 

The output I'm getting is only the first row that is with the reason blank SSL but I also need the second row with missing country.

 

Could anyone please help me on this

 

 


Assuming your comment in Red is true, SQL is a good option. 

You can also use SORT (see the examples below).

 

*make test data;

data class;
    set sashelp.class sashelp.class (obs=8);
run;

*select unique records;

proc sql;
    create table distinct_records as 
    select distinct * 
    from class;
quit;

proc sort data=class;
    by _all_;
run;

proc sort data=class noduprecs out=distinct_via_sort;
    by _all_;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 4370 views
  • 2 likes
  • 6 in conversation