BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
danielchoi626
Calcite | Level 5

I am currently working on a dataset that has duplicates in the primary key as shown in the example table I have given below.

PROC SORT DATA = SAMPLE_TABLE NODUPKEY;
BY ID_NO;
RUN;

Dropping duplicates using the above snippet resulted in values being lost however. I would very much appreciate some advice on how to drop duplicates and preserve non-null values.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14

Hello @danielchoi626 
This can be done as shown below. Modify as needed.
I have used your excel as the basis. In your  excel had amount as character. So I have added code to convert it to number.

/*Import your excel sheet */

PROC IMPORT DATAFILE='/home/yourfolder/SAMPLE DATA.xlsx'
	DBMS=XLSX
	OUT=WORK.IMPORT replace;
	GETNAMES=YES;
RUN;
/* Your amount was character.
I am converting  the amount to number. If in your original dataset amount is numeric, then this step not needed */
data import;
set import (Rename=(Amount=Amount_));
Amount=input(Amount_,best12.);
drop amount_;
run;

/* Remove duplicates*/
proc sql;
select distinct a.id_no, b.appl_no,b.Amount from import as a
left join import as b
on a.ID_no=b.ID_no
where b.amount is not missing
order by ID_no;
quit;

The  output will be as you wanted. The not null values from amount have been retained with no duplicates

Sajid01_0-1625588364899.png

Let me know if there are questions.

 

 

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

In a PROC SORT that does change more than just the order, it is strongly recommended to use the OUT= option to create a new dataset and leave the source dataset untouched.

Additionally, you can use the DUPOUT= option to store the filtered observations.

andreas_lds
Jade | Level 19

Sorry, but from your description it is not clear what you want to keep and what to drop. If you want to drop all obs having a missing value in id_no, then use a where statement.

Astounding
PROC Star

It's relatively easy:

proc sort data=sample_table;
   by id_no;
run;

data want;
   update sample_table (obs=0) sample_table;
   by id_no;
run;

For each ID_NO, this technique will use the last non-missing value for each variable.  Note, however, that you may still lose data.  It is conceivable that a variable has two different non-missing values in different observations.

Sajid01
Meteorite | Level 14

Hello @danielchoi626 
This can be done as shown below. Modify as needed.
I have used your excel as the basis. In your  excel had amount as character. So I have added code to convert it to number.

/*Import your excel sheet */

PROC IMPORT DATAFILE='/home/yourfolder/SAMPLE DATA.xlsx'
	DBMS=XLSX
	OUT=WORK.IMPORT replace;
	GETNAMES=YES;
RUN;
/* Your amount was character.
I am converting  the amount to number. If in your original dataset amount is numeric, then this step not needed */
data import;
set import (Rename=(Amount=Amount_));
Amount=input(Amount_,best12.);
drop amount_;
run;

/* Remove duplicates*/
proc sql;
select distinct a.id_no, b.appl_no,b.Amount from import as a
left join import as b
on a.ID_no=b.ID_no
where b.amount is not missing
order by ID_no;
quit;

The  output will be as you wanted. The not null values from amount have been retained with no duplicates

Sajid01_0-1625588364899.png

Let me know if there are questions.

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1613 views
  • 1 like
  • 5 in conversation