data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
An interviewer asked me how to remove duplicates without sorting.
Using the dataset above, he said not to change the order of the Empname values, only to remove the duplicates, and to use a DATA step method only.
In general a HASH (or some other method of remembering what values you have seen before) will do this.
data want;
if _n_=1 then do;
declare hash h();
h.definekey('empname');
h.definedone();
end;
set test ;
if h.find() then do;
output;
h.add();
end;
run;
But if the data is too large, the HASH approach will not work, because the hash object has to fit in memory; the same is true of any other DATA step only method that remembers the values already seen. In that case sorting is probably your best method, either directly using PROC SORT or implicitly using PROC SQL code. Just add a new variable to record the original order so it can be recreated.
data temp;
row+1;
set test;
run;
proc sql ;
create table want as
select empname
from temp
group by empname
having row=min(row)
order by row
;
quit;
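For comparison, the PROC SORT route mentioned above could look something like the sketch below. It assumes the TEMP dataset with the ROW variable created in the previous step; the intermediate dataset names SORTED and FIRSTS are made up for illustration only.

```sas
/* Sketch only: assumes WORK.TEMP (with ROW) from the step above. */
/* SORTED and FIRSTS are hypothetical intermediate dataset names. */

/* Group the names together, earliest occurrence first within each name */
proc sort data=temp out=sorted;
  by empname row;
run;

/* Keep only the first (earliest) occurrence of each name */
data firsts;
  set sorted;
  by empname;
  if first.empname;
run;

/* Restore the original order and drop the helper variable */
proc sort data=firsts out=want(drop=row);
  by row;
run;
```

This takes two sorts instead of one SQL step, so it is not necessarily faster; it just makes each stage explicit.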
You can do that in a single pass using the hash object like this
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
data want;
if _N_ = 1 then do;
dcl hash h();
h.definekey("Empname");
h.definedone();
end;
set test;
if h.add() = 0;
run;
You can just get a count of each name and only output where count is 1. Don't have to use hash object if you don't want to.
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
proc sort data=test;
by empname;
run;
*get a count of each name;
data nodups;
set test;
by empname;
if first.empname then count=0;
count+1;
*only output where count=1. if count >1 then it's a duplicate;
if count=1 then output;
run;
proc print data=nodups;
run;
That would require sorting?
data test;
input Empname $ ;
datalines;
ram
sita
ram
arjun
ram
sita
;
run;
data want;
set test;
array x{999} $ _temporary_;
if Empname not in x then do;
  n+1;
  x{n}=Empname;
  output;
end;
drop n;
run;
That's an interesting question for an interview. Did they ask you any other good questions?
A lot of times interviewers have asked what is the difference between nodup and nodupkey.
It is so interesting.
1    proc sort data=sashelp.class out=x nodup;
2    by sex;
3    run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 0 duplicate observations were deleted.
NOTE: The data set WORK.X has 19 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.22 seconds
      cpu time            0.03 seconds
4
5    proc sort data=sashelp.class out=y nodupkey;
6    by sex;
7    run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.Y has 2 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
It seems that "nodup" equals "noduprecs".
Do you have any other interesting questions?
Many years ago, I was asked in an interview "In the macro language, what are the differences between a keyword parameter vs positional parameter, and when/why would you use one rather than the other." I thought it was a good open-ended question. It can reveal not only someone's understanding of the rules of the macro language, but also how they think about its use, design issues, users, etc.
Sometimes I ask a similar question about SQL vs DATA step. Also ask "Do you prefer long narrow datasets or short wide datasets, why?"
When I interview, I tend to be less interested in knowledge of specific SAS features, and more interested in how they think about / understand the SAS language(s). Of course it varies by role.
I had a mentor who told me once that when he interviews people for entry-level SAS roles, the most important criterion he uses to judge them is not whether they can answer his tougher questions, it's whether they get excited when they hear the answers, and ask follow-up questions during the interview so that they can learn more.
Agree, it's a hard one. I got it wrong during the interview, but they still hired me. : )
I think I said "keyword parameters are clearer, because in the call you can see the parameter name and the value." My future boss pointed out that keyword parameters allow default values, which is a big design difference for macro developers. I don't think I knew about default values when he interviewed me.
So it's an open-ended question which allowed him to assess my understanding of the macro language, and also started a discussion about macro programming, which included chatting about positional vs keyword parameters from both the perspective of macro developer and macro user.
As a developer, I use keyword parameters about 99% of the time. One thing I like about positional parameters is that as a user, I can still choose to pass values to a positional parameter in a keyword style. I often wish SAS functions had that flexibility. I have to look up the parameter order for tranwrd just about every time I use it. : )
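To make the difference concrete, here is a minimal sketch. The macro name %show and its parameters are made up for illustration: dsn is a positional parameter and obs= is a keyword parameter with a default value of 5.

```sas
/* Hypothetical macro for illustration only:              */
/* dsn is positional, obs= is a keyword parameter         */
/* whose default value is 5.                              */
%macro show(dsn, obs=5);
  proc print data=&dsn (obs=&obs);
  run;
%mend show;

/* Positional call; obs falls back to its default of 5 */
%show(sashelp.class)

/* Override the default of the keyword parameter */
%show(sashelp.class, obs=3)

/* The positional parameter passed in keyword style */
%show(dsn=sashelp.class, obs=3)
```

The last call is the flexibility described above: the developer declared dsn as positional, but the user can still name it in the call.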
Your boss was lucky. Now you are definitely a SAS expert. I think your boss made the right decision in hiring you.
P.S. One more thing I like about keyword parameters is that you can change the order of the parameters in the call without worrying about passing a value to the wrong parameter, and you can also leave out keyword parameters you don't need when you invoke a macro.
But with positional parameters, you have to follow the order in which they were defined, and you cannot skip one in the middle without leaving a placeholder comma.
Thanks for your kind words. This was a boss about 20 years ago. I was very lucky to work for him. Your boss is lucky too! (Assuming you have a boss. : )
Agree, if there are more than one or two parameters, I find trying to remember the order of positional parameters too hard.