BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

I ran a quick test to see what combination of BY variables would lead to 6 distinct combinations.

proc summary data=have chartype;
  class id--department / missing;
  output out=groups;
run;

proc freq order=freq; 
  tables _type_ / noprint out=counts;
run;

proc print data=counts;
  where count=6;
run;

And the result is:

Obs    _TYPE_    COUNT    PERCENT

 25    100010      6      1.76471
 26    100110      6      1.76471
 27    101010      6      1.76471
 28    101110      6      1.76471

So the minimum set is the first one, 100010, which means the first and fifth variables, ID AND REASON.

 

So sort by ID and REASON then filter by FIRST.REASON.  (Or sort by REASON and ID and filter by first.ID) and you will get 6 observations out.

 

To get the exact observations you picked manually you will also need to sort by some of the other variables so that the one you want to select is the first one for that combination of ID and REASON.

 

This list of BY variables works for your example.

proc sort data=have ;
  by id reason name ;
run;

proc sort data=expect; 
  by id reason name ;
run;

data want;
  set have;
  by id reason name ;
  if first.reason;
run;

proc compare data=want compare=expect;
run;
mkeintz
PROC Star

Let's assume you have data sorted by MAJOR_KEY, INTERMEDIATE_KEY, MINOR_KEY.   And for some of those triplets, there are duplicates you want to eliminate.   If you do this

 

data want;
   set mydata;
   by major_key intermediate_key minor_key;
   if first.minor_key;
run;

You will get one record per unique triplet.  It that your goal?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

@Akshaya_1397 wrote:

Hi,

I have a datset final in which there are columns like

 

Id,name,function,status,reason, department and has values

123,flowers-gift,nw,onboard,blank SLS,human resource

123,flowers-gifts,nw,onboard,missing country,Human resource

123,onboard gift,nw,onboard,missing country, 

 

I need to take unique rows so I performed first. In the dataset

Data new;

Set final;

If first.id and first.name and first.department then output;

Run;

 

The output I'm getting is only the first row that is with the reason blank SSL but I also need the second row with missing country.

 

Could anyone please help me on this

 

 


Assuming your comment in Red is true, SQL is a good option. 

You can also use SORT (see the examples below).

 

*make test data;

data class;
    set sashelp.class sashelp.class (obs=8);
run;

*select unique records;

proc sql;
    create table distinct_records as 
    select distinct * 
    from class;
quit;

proc sort data=class;
    by _all_;
run;

proc sort data=class noduprecs out=distinct_via_sort;
    by _all_;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1490 views
  • 2 likes
  • 6 in conversation