- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The unique records can be identified with the any of the following
Approach 1
if first.var and last.var
Approach 2
if first.var=1 and last.var=1
Is there any difference between them in terms of efficiency and time?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
in my opinion both does the same evaluation so no.
other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
in my opinion both does the same evaluation so no.
other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try PROC SORT with the NODUPKEY option as well, since you often have to sort before using FIRST/LAST you can sometimes avoid extra steps which is more efficient.
@thesasuser wrote:
The unique records can be identified with the any of the following
Approach 1
if first.var and last.var
Approach 2
if first.var=1 and last.var=1
Is there any difference between them in terms of efficiency and time?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
This was an interview question. The interviewer informed that one of them is more efficient.
To me both looked identically efficient.
Wanted to know form the community if I am correct
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well well well, does the interviewer think processing 2 more tokens makes it less efficient? Jeez!!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Interesting footnote to this question. Did you know that you can specify _null_ as an output file to proc sort? Before today, I didn't!
However, in trying to test the idea that the nouniquekey and uniqueout options might be faster than doing a sort and then using first. and last. in a datastep, I was bothered that I (thought I) had to create a dummy output file. Not so!
proc sort data=have out=_null_ nouniquekey noequals UNIQUEOUT=want;
by id;
run;
On a 1.9 million record file, with 19 unique ids, the above only took 0.49 seconds.
The sort alone of the initial file (needed, of course in order to use first. and last. processing, took 0.68 seconds.
Not that I've ever needed a file of records that don't have duplicates, but nice to know which would be the most efficient process.
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I’m not fond of this, but sometimes people will ask obviously wrong questions to see how a candidate deals with this. People who get super defensive about a mistake or flustered are what they’re looking for in this case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@novinosrin already answered your question, but there are still other ways of expressing the same thing. Other than using sort, they all run as efficiently as the others.
Like @Reeza mentioned, proc sort can also be used but, for what you want, in a slightly different manner than @Reeza mentioned:
data have;
set sashelp.class;
do i=1 to 100000;
if i eq 1 then id=_n_*100000;
else id=_n_*100000+1;
output;
end;
run;
data want;
set have;
by id;
if first.id and last.id;
run;
data want;
set have;
by id;
if first.id eq 1 and last.id eq 1;
run;
data want;
set have;
by id;
if first.id +last.id eq 2;
run;
data want;
set have;
by id;
if first.id *last.id;
run;
data want;
set have;
by id;
if first.id *last.id eq 1;
run;
proc sort data=have out=dontwant nouniquekey noequals UNIQUEOUT=want;
by id;
run;
And, as @Reeza mentioned, if the file has to be sorted first, it may actually end up being the fastest method.
Art, CEO, AnalystFinder.com