I ran a quick test to see what combination of BY variables would lead to 6 distinct combinations.
proc summary data=have chartype;
class id--department / missing;
output out=groups;
run;
proc freq order=freq;
tables _type_ / noprint out=counts;
run;
proc print data=counts;
where count=6;
run;
And the result is:
Obs _TYPE_ COUNT PERCENT 25 100010 6 1.76471 26 100110 6 1.76471 27 101010 6 1.76471 28 101110 6 1.76471
So the minimum set is the first one, 100010, which means the first and fifth variables, ID AND REASON.
So sort by ID and REASON then filter by FIRST.REASON. (Or sort by REASON and ID and filter by first.ID) and you will get 6 observations out.
To get the exact observations you picked manually you will also need to sort by some of the other variables so that the one you want to select is the first one for that combination of ID and REASON.
This list of BY variables works for your example.
proc sort data=have ;
by id reason name ;
run;
proc sort data=expect;
by id reason name ;
run;
data want;
set have;
by id reason name ;
if first.reason;
run;
proc compare data=want compare=expect;
run;
Let's assume you have data sorted by MAJOR_KEY, INTERMEDIATE_KEY, MINOR_KEY. And for some of those triplets, there are duplicates you want to eliminate. If you do this
data want;
set mydata;
by major_key intermediate_key minor_key;
if first.minor_key;
run;
You will get one record per unique triplet. It that your goal?
@Akshaya_1397 wrote:
Hi,
I have a datset final in which there are columns like
Id,name,function,status,reason, department and has values
123,flowers-gift,nw,onboard,blank SLS,human resource
123,flowers-gifts,nw,onboard,missing country,Human resource
123,onboard gift,nw,onboard,missing country,
I need to take unique rows so I performed first. In the dataset
Data new;
Set final;
If first.id and first.name and first.department then output;
Run;
The output I'm getting is only the first row that is with the reason blank SSL but I also need the second row with missing country.
Could anyone please help me on this
Assuming your comment in Red is true, SQL is a good option.
You can also use SORT (see the examples below).
*make test data;
data class;
set sashelp.class sashelp.class (obs=8);
run;
*select unique records;
proc sql;
create table distinct_records as
select distinct *
from class;
quit;
proc sort data=class;
by _all_;
run;
proc sort data=class noduprecs out=distinct_via_sort;
by _all_;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.