BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
thesasuser
Pyrite | Level 9

The unique records can be identified with the any of the following

Approach 1

if first.var and last.var

Approach 2

if first.var=1 and last.var=1

Is there any difference between them in terms of efficiency and time?

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

in my opinion both does the same evaluation so no.

 

other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

in my opinion both does the same evaluation so no.

 

other than that the only thing i can think of is during tokenisation sas processes 2 more tokens i.e =1 before it sends to the compiler for execution, which is very insignificant

Reeza
Super User

Try PROC SORT with the NODUPKEY option as well, since you often have to sort before using FIRST/LAST you can sometimes avoid extra steps which is more efficient. 

 


@thesasuser wrote:

The unique records can be identified with the any of the following

Approach 1

if first.var and last.var

Approach 2

if first.var=1 and last.var=1

Is there any difference between them in terms of efficiency and time?


 

thesasuser
Pyrite | Level 9

Thanks.

This was an interview question. The interviewer informed that one of them is more efficient.

To me both looked identically efficient. 
Wanted to know form the community if I am correct

novinosrin
Tourmaline | Level 20

Well well well, does the interviewer think processing 2 more tokens makes it less efficient? Jeez!!!

art297
Opal | Level 21

Interesting footnote to this question. Did you know that you can specify _null_ as an output file to proc sort? Before today, I didn't!

 

However, in trying to test the idea that the nouniquekey and uniqueout options might be faster than doing a sort and then using first. and last. in a datastep, I was bothered that I (thought I) had to create a dummy output file. Not so!

 

proc sort data=have out=_null_ nouniquekey noequals UNIQUEOUT=want;
  by id;
run;

On a 1.9 million record file, with 19 unique ids, the above only took 0.49 seconds.

 

The sort alone of the initial file (needed, of course in order to use first. and last. processing, took 0.68 seconds.

 

Not that I've ever needed a file of records that don't have duplicates, but nice to know which would be the most efficient process.

 

Art, CEO, AnalystFinder.com

 

 

Reeza
Super User

I’m not fond of this, but sometimes people will ask obviously wrong questions to see how a candidate deals with this. People who get super defensive about a mistake or flustered are what they’re looking for in this case. 

 

art297
Opal | Level 21

@novinosrin already answered your question, but there are still other ways of expressing the same thing. Other than using sort, they all run as efficiently as the others.

 

Like @Reeza mentioned, proc sort can also be used but, for what you want, in a slightly different manner than @Reeza mentioned:

data have;
  set sashelp.class;
  do i=1 to 100000;
    if i eq 1 then id=_n_*100000;
    else id=_n_*100000+1;
    output;
  end;
run;
data want;
  set have;
  by id;
  if first.id and last.id;
run;
data want;
  set have;
  by id;
  if first.id eq 1 and last.id eq 1;
run;

data want;
  set have;
  by id;
  if first.id +last.id eq 2;
run;

data want;
  set have;
  by id;
  if first.id *last.id;
run;

data want;
  set have;
  by id;
  if first.id *last.id eq 1;
run;

proc sort data=have out=dontwant nouniquekey noequals UNIQUEOUT=want;
  by id;
run;

And, as @Reeza mentioned, if the file has to be sorted first, it may actually end up being the fastest method.

 

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 938 views
  • 2 likes
  • 4 in conversation