Re: First. And last .

Tom · Posted 06-30-2023 11:12 AM

I ran a quick test to see what combination of BY variables would lead to 6 distinct combinations.

proc summary data=have chartype;
  class id--department / missing;
  output out=groups;
run;

proc freq order=freq; 
  tables _type_ / noprint out=counts;
run;

proc print data=counts;
  where count=6;
run;

And the result is:

Obs    _TYPE_    COUNT    PERCENT

 25    100010      6      1.76471
 26    100110      6      1.76471
 27    101010      6      1.76471
 28    101110      6      1.76471

So the minimum set is the first one, 100010, which means the first and fifth variables, ID AND REASON.

So sort by ID and REASON then filter by FIRST.REASON. (Or sort by REASON and ID and filter by first.ID) and you will get 6 observations out.

To get the exact observations you picked manually you will also need to sort by some of the other variables so that the one you want to select is the first one for that combination of ID and REASON.

This list of BY variables works for your example.

proc sort data=have ;
  by id reason name ;
run;

proc sort data=expect; 
  by id reason name ;
run;

data want;
  set have;
  by id reason name ;
  if first.reason;
run;

proc compare data=want compare=expect;
run;

mkeintz · Posted 06-30-2023 08:00 PM

Let's assume you have data sorted by MAJOR_KEY, INTERMEDIATE_KEY, MINOR_KEY. And for some of those triplets, there are duplicates you want to eliminate. If you do this

data want;
   set mydata;
   by major_key intermediate_key minor_key;
   if first.minor_key;
run;

You will get one record per unique triplet. It that your goal?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Reeza · Posted 06-30-2023 11:27 AM

@Akshaya_1397 wrote:

Hi,

I have a datset final in which there are columns like

Id,name,function,status,reason, department and has values

123,flowers-gift,nw,onboard,blank SLS,human resource

123,flowers-gifts,nw,onboard,missing country,Human resource

123,onboard gift,nw,onboard,missing country,

I need to take unique rows so I performed first. In the dataset

Data new;

Set final;

If first.id and first.name and first.department then output;

Run;

The output I'm getting is only the first row that is with the reason blank SSL but I also need the second row with missing country.

Could anyone please help me on this

Assuming your comment in Red is true, SQL is a good option.

You can also use SORT (see the examples below).

*make test data;

data class;
    set sashelp.class sashelp.class (obs=8);
run;

*select unique records;

proc sql;
    create table distinct_records as 
    select distinct * 
    from class;
quit;

proc sort data=class;
    by _all_;
run;

proc sort data=class noduprecs out=distinct_via_sort;
    by _all_;
run;

Re: First. And last .

Re: First. And last .

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away