BrahmanandaRao
Lapis Lazuli | Level 10
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;

In an interview I was asked how to remove duplicates without sorting.

Using the dataset above, the interviewer said not to change the order of the Empname values, and to remove the duplicates using only a DATA step.

 

Accepted Solution

Tom
Super User

@BrahmanandaRao wrote:
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;

In an interview I was asked how to remove duplicates without sorting.

Using the dataset above, the interviewer said not to change the order of the Empname values, and to remove the duplicates using only a DATA step.

 


In general a HASH (or some other method of remembering what values you have seen before) will do this.

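data want;
  if _n_=1 then do;
    /* Hash table keyed on EMPNAME remembers the names already seen */
    declare hash h();
    h.definekey('empname');
    h.definedone();
  end;
  set test ;
  /* FIND returns a nonzero code when the key is not yet in the hash */
  if h.find() then do;
    output;
    h.add();
  end;
run;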

But if the data is too large to fit in memory, a hash will not work, and neither will any other DATA-step-only method that has to remember every distinct value. In that case sorting is probably your best option, either directly with PROC SORT or implicitly with PROC SQL. Just add a new variable that records the original order so that it can be restored afterwards.

data temp;
  row+1;
  set test;
run;
proc sql ;
create table want as
  select empname
  from temp
  group by empname
  having row=min(row)
  order by row
;
quit;
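
For comparison, the PROC SORT route mentioned above could look roughly like the sketch below. It assumes the same TEST dataset; the intermediate dataset names SORTED and FIRSTS are made up for illustration and are not from the original post.

```sas
/* Record the original position of every row */
data temp;
  set test;
  row = _n_;
run;

/* Group the names together, earliest occurrence first within each name */
proc sort data=temp out=sorted;
  by empname row;
run;

/* Keep only the earliest occurrence of each name */
data firsts;
  set sorted;
  by empname;
  if first.empname;
run;

/* Restore the original order and drop the helper variable */
proc sort data=firsts out=want(drop=row);
  by row;
run;
```

This needs two sorts instead of one SQL step, but each step is easy to inspect on its own.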

 


16 REPLIES
PeterClemmensen
Tourmaline | Level 20

You can do that in a single pass using the hash object like this

 

data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;

data want;
   if _N_ = 1 then do;
      dcl hash h();
      h.definekey("Empname");
      h.definedone();
   end;

   set test;

   if h.add() = 0;
run;
BrahmanandaRao
Lapis Lazuli | Level 10
Hi PeterClemmensen
Thank you for your solution
tarheel13
Rhodochrosite | Level 12

You can just get a count of each name and only output where count is 1. Don't have to use hash object if you don't want to.

data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;

proc sort data=test;
	by empname;
run;

*get a count of each name;
data nodups;
	set test;
	by empname;
	if first.empname then count=0;
	count+1;
	*only output where count=1. if count >1 then it's a duplicate;
	if count=1 then output;
run;

proc print data=nodups;
run;
PeterClemmensen
Tourmaline | Level 20

That would require sorting?

Ksharp
Super User
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
data want;
 set test;
 array x{999} $ _temporary_;
 if Empname not in x then do;
  n+1;
  x{n}=Empname;
  output;
 end;
 drop n;
run;
BrahmanandaRao
Lapis Lazuli | Level 10
Hi Ksharp
Thank you for your solution
Quentin
Super User

That's an interesting question for an interview.  Any other good questions they asked you?

The Boston Area SAS Users Group is hosting free webinars!
Next up: Lisa Mendez & Richann Watson present Get Tipsy with Debugging Tips for SAS® Code: The After Party on Wednesday Jul 16.
Register now at https://www.basug.org/events.
tarheel13
Rhodochrosite | Level 12

A lot of times interviewers have asked what is the difference between nodup and nodupkey.

Ksharp
Super User

It is so interesting.

 

 

1    proc sort data=sashelp.class out=x nodup;
2    by sex;
3    run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.X has 19 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.22 seconds
      cpu time            0.03 seconds


4
5    proc sort data=sashelp.class out=y nodupkey;
6    by sex;
7    run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.Y has 2 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds



It seems that "nodup" is the same as "noduprecs".
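
A smaller sketch may make the difference easier to see. The PETS dataset and the output names BY_REC and BY_KEY below are invented for illustration. NODUP drops an observation only when every variable matches the previous observation, while NODUPKEY drops it when just the BY variables match.

```sas
data pets;
  input owner $ pet $;
  datalines;
ann cat
ann cat
ann dog
bob dog
;
run;

/* Compares whole records: only the second "ann cat" should be removed */
proc sort data=pets out=by_rec nodup;
  by owner;
run;

/* Compares the BY variable only: one row per owner should remain */
proc sort data=pets out=by_key nodupkey;
  by owner;
run;
```

With this input, BY_REC should end up with 3 observations and BY_KEY with 2. Note that NODUP only compares adjacent observations after the sort, so identical records that are not next to each other can survive unless all variables are listed in the BY statement.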

 

Are there any other interesting questions?

 

Quentin
Super User

@Ksharp wrote:

 

Are there any other interesting questions?

 


Many years ago, I was asked in an interview "In the macro language, what are the differences between a keyword parameter vs positional parameter, and when/why would you use one rather than the other." I thought it was a good open-ended question. It can reveal not only someone's understanding of the rules of the macro language, but also how they think about its use, design issues, users, etc.

 

Sometimes I ask a similar question about SQL vs DATA step.   Also ask "Do you prefer long narrow datasets or short wide datasets, why?"

 

When I interview, I tend to be less interested in knowledge of specific SAS features, and more interested in how they think about / understand the SAS language(s). Of course it varies by role.

 

I had a mentor who told me once that when he interviews people for entry-level SAS roles, the most important criterion he uses to judge them is not whether they can answer his tougher questions, it's whether they get excited when they hear the answers, and ask follow-up questions during the interview so that they can learn more.

Ksharp
Super User
Quentin,
"In the macro language, what are the differences between a keyword parameter vs positional parameter, and when/why would you use one rather than the other."

That question is tough. I have no answer. But personally, I prefer keyword parameters to positional ones.
Quentin
Super User

Agree, it's a hard one.  I got it wrong during the interview, but they still hired me. : )

 

I think I said "keyword parameters are clearer, because in the call you can see the parameter name and the value."  My future boss pointed out that keyword parameters allow default values, which is a big design difference for macro developers.  I don't think I knew about default values when he interviewed me.

 

So it's an open-ended question which allowed him to assess my understanding of the macro language, and also started a discussion about macro programming, which included chatting about positional vs keyword parameters from both the perspective of macro developer and macro user.

 

As a developer, I use keyword parameters about 99% of the time.  One thing I like about positional parameters is that as a user, I can still choose to pass values to a positional parameter in a keyword style.  I often wish SAS functions had that flexibility.  I have to look up the parameter order for tranwrd just about every time I use it.  : )
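
To make the points above concrete, here is a minimal sketch. The macro GREET and its parameters NAME and GREETING are hypothetical and exist only for illustration. It shows a keyword parameter with a default value, and a positional parameter being passed in keyword style.

```sas
/* NAME is positional; GREETING is a keyword parameter with a default */
%macro greet(name, greeting=Hello);
  %put &greeting, &name!;
%mend greet;

/* Positional value only; GREETING falls back to its default */
%greet(Tom)

/* Override the keyword default */
%greet(Tom, greeting=Welcome)

/* Pass the positional parameter in keyword style */
%greet(name=Tom, greeting=Welcome)
```

The first call writes "Hello, Tom!" to the log; the other two write "Welcome, Tom!".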

Ksharp
Super User

Your boss is lucky. Now you are absolutely a SAS expert. I think your boss made the right decision in picking you.

 

 

P.S. One thing I like more about keyword parameters is that you can change the order of the macro parameters without worrying about passing a value to the wrong one, and you can also leave out keyword parameters you don't need when you invoke the macro.

 

But with positional parameters, you have to follow the order of the macro parameters and cannot simply leave one of them out.

Quentin
Super User

Thanks for your kind words.  This was a boss about 20 years ago.  I was very lucky to work for him.  Your boss is lucky too! (assuming you have a boss). : )

 

Agree, if there are more than one or two parameters, I find trying to remember the order of positional parameters too hard.

