SAS Programming

thesasuser · Posted 05-20-2018 03:05 PM

The unique records can be identified with the any of the following

Approach 1

if first.var and last.var

Approach 2

if first.var=1 and last.var=1

Is there any difference between them in terms of efficiency and time?

novinosrin · Posted 05-20-2018 03:52 PM

in my opinion both does the same evaluation so no.

other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant

View solution in original post

novinosrin · Posted 05-20-2018 03:52 PM

in my opinion both does the same evaluation so no.

other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant

Reeza · Posted 05-20-2018 05:18 PM

Try PROC SORT with the NODUPKEY option as well, since you often have to sort before using FIRST/LAST you can sometimes avoid extra steps which is more efficient.

@thesasuser wrote:

The unique records can be identified with the any of the following

Approach 1

if first.var and last.var

Approach 2

if first.var=1 and last.var=1

Is there any difference between them in terms of efficiency and time?

thesasuser · Posted 05-20-2018 05:23 PM

Thanks.

This was an interview question. The interviewer informed that one of them is more efficient.

To me both looked identically efficient.
Wanted to know form the community if I am correct

novinosrin · Posted 05-20-2018 05:25 PM

Well well well, does the interviewer think processing 2 more tokens makes it less efficient? Jeez!!!

art297 · Posted 05-20-2018 06:19 PM

Interesting footnote to this question. Did you know that you can specify _null_ as an output file to proc sort? Before today, I didn't!

However, in trying to test the idea that the nouniquekey and uniqueout options might be faster than doing a sort and then using first. and last. in a datastep, I was bothered that I (thought I) had to create a dummy output file. Not so!

proc sort data=have out=_null_ nouniquekey noequals UNIQUEOUT=want;
  by id;
run;

On a 1.9 million record file, with 19 unique ids, the above only took 0.49 seconds.

The sort alone of the initial file (needed, of course in order to use first. and last. processing, took 0.68 seconds.

Not that I've ever needed a file of records that don't have duplicates, but nice to know which would be the most efficient process.

Art, CEO, AnalystFinder.com

Reeza · Posted 05-20-2018 06:19 PM

I’m not fond of this, but sometimes people will ask obviously wrong questions to see how a candidate deals with this. People who get super defensive about a mistake or flustered are what they’re looking for in this case.

art297 · Posted 05-20-2018 05:22 PM

@novinosrin already answered your question, but there are still other ways of expressing the same thing. Other than using sort, they all run as efficiently as the others.

Like @Reeza mentioned, proc sort can also be used but, for what you want, in a slightly different manner than @Reeza mentioned:

data have;
  set sashelp.class;
  do i=1 to 100000;
    if i eq 1 then id=_n_*100000;
    else id=_n_*100000+1;
    output;
  end;
run;
data want;
  set have;
  by id;
  if first.id and last.id;
run;
data want;
  set have;
  by id;
  if first.id eq 1 and last.id eq 1;
run;

data want;
  set have;
  by id;
  if first.id +last.id eq 2;
run;

data want;
  set have;
  by id;
  if first.id *last.id;
run;

data want;
  set have;
  by id;
  if first.id *last.id eq 1;
run;

proc sort data=have out=dontwant nouniquekey noequals UNIQUEOUT=want;
  by id;
run;

And, as @Reeza mentioned, if the file has to be sorted first, it may actually end up being the fastest method.

Art, CEO, AnalystFinder.com

SAS Programming

Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Re: Efficient way to identify unique records

Identifying if a record changed during a certain time period

Let Copilot be your guide you to efficient coding

Keep only unique rows with unique column entries

Best way to transpose data with duplicate unique identifiers

ModelOps: Why Data Matters – Building Efficient Foundations Q&A, Slide...

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...