osi814
Obsidian | Level 7

Hello, I need to remove duplicates in a SAS dataset but don't know how to go about it. The point of the dataset is to have all CBSAs in the nation listed in the first column. Additional columns contain every CBSA which borders the first one. The problem is that the original CBSA is also always listed as a bordering one, and I need to remove it. The other problem is that the duplicate doesn't always appear in the same column. For example, I have:

 

CBSA     CBSA_BORD1    CBSA_BORD2    CBSA_BORD3
10140    10140         16500         36500
10180    10180         15220         45020
28300    14180         27860         28300
28500    23240         28500         41700

 

The duplicates are the bordering values that repeat the CBSA in the first column (10140 and 10180 in CBSA_BORD1, 28500 in CBSA_BORD2, 28300 in CBSA_BORD3); those are the ones I want to remove from the dataset. The problem is that I don't know which column the duplicate will be in, since it changes from row to row. Is there a way to search for duplicate values within each row and remove them? What I want the final dataset to look like is:

 

CBSA     CBSA_BORD1    CBSA_BORD2
10140    16500         36500
10180    15220         45020
28300    14180         27860
28500    23240         41700

 

Thank you!

1 ACCEPTED SOLUTION

RW9
Diamond | Level 26

Hi,

It is generally better to post test data in the form of a data step. You can use an array for this:

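data want;
  /* one test row in the layout from the question */
  cbsa=10140; cbsa_bord1=10140; cbsa_bord2=16500; cbsa_bord3=36500;
  /* array elements map to cbsa_bord1-cbsa_bord3 */
  array cbsa_bord{3};
  do i=1 to 3;
    /* set to missing any bordering value that repeats the row's own CBSA */
    if cbsa_bord{i}=cbsa then cbsa_bord{i}=.;
  end;
  drop i;  /* loop counter is not needed in the output */
run;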
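
The step above leaves a missing value wherever the duplicate was. If the goal is the two-column layout shown in the question, one possible follow-up is to shift the remaining values left and drop the last column. The following is only a sketch under assumptions: the dataset name `have` and the counter `n` are made up for illustration, and dropping `cbsa_bord3` assumes every row contains its own CBSA exactly once among the bordering columns, as in the sample rows.

```sas
/* Sample rows from the question, entered as a data step (dataset name HAVE is an assumption) */
data have;
  input cbsa cbsa_bord1 cbsa_bord2 cbsa_bord3;
  datalines;
10140 10140 16500 36500
10180 10180 15220 45020
28300 14180 27860 28300
28500 23240 28500 41700
;
run;

data want;
  set have;
  array bord{3} cbsa_bord1-cbsa_bord3;

  /* n counts the values kept so far; since n never exceeds i, */
  /* writing bord{n} cannot overwrite a value not yet examined */
  n = 0;
  do i = 1 to dim(bord);
    if not missing(bord{i}) and bord{i} ne cbsa then do;
      n = n + 1;
      bord{n} = bord{i};
    end;
  end;

  /* blank out the slots left over after compaction */
  do i = n + 1 to dim(bord);
    bord{i} = .;
  end;

  /* assumption: one self-reference per row, so the last column is always empty */
  drop i n cbsa_bord3;
run;
```

For the four sample rows, this should produce the CBSA, CBSA_BORD1, CBSA_BORD2 layout requested in the question. If some rows might not contain their own CBSA, keep `cbsa_bord3` rather than dropping it.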

osi814
Obsidian | Level 7

Perfect, don't know why I didn't think of that. Thanks so much!
