Hi ... I use it as shorthand for "if I had given that a few more moments, I would have understood."
From Google (the source of all knowledge in the known universe): "the moment you realize that you don't know something or are just learning something most everyone knows."
P.S. Is "known universe" redundant?
I really liked your approach to solving this problem, but I just read a 2007 SGF paper by Jane Stroupe (http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf) that taught me something I didn't know before tonight. Not only is it applicable to your suggested code, it also makes the code run 16 percent faster. Since the OP is dealing with 40 million records, I thought he would be interested.
NOTE: Based on subsequent feedback from Ksharp, it turned out that the elements of a temporary array are retained: they are not reset to missing at each iteration of the DATA step's implicit loop. As such, the code only works correctly as modified below, and the performance difference is reversed. However, I've elected not to delete the post, as I now find the new information even more important to know.
However, I thought everyone else would be equally interested, as it is a useful bit of information and, for me, took all the guesswork out of what your code was doing.
When you declare an array for variables that don't yet exist in the PDV, you don't have to assign any dummy variable names. I took that one step further and declared the array as _temporary_, so that it doesn't even have to be dropped.
Thus the code I ended up with was:
/*Create some test data*/
data have;
infile datalines dsd truncover;
input person $ phone1-phone3;
datalines;
A,111,222,111
B,,444,444
C,111,,222,
;
run;
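data want(drop=i j);
  set have;
  array _p{*} phone1-phone3;
  array _pp{3} _temporary_;  /* holds the distinct values; retained across observations */
  j=0;
  /* copy each value not yet seen into the next free slot of _pp */
  do i=1 to dim(_p);
    if _p{i} not in _pp then do;
      j+1;
      _pp{j}=_p{i};
    end;
  end;
  /* write the distinct values back, left-aligned */
  do i=1 to dim(_p);
    _p{i}=_pp{i};
  end;
  /* reset the temporary array, since it is not reinitialized for the next observation */
  call missing(of _pp{*});
run;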
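For anyone who wants to see the retention for themselves, here is a minimal sketch. The data set demo and the variables x and before are made-up names for illustration and are not part of the code above. It copies the temporary array element into an ordinary variable before overwriting it, so the log should show the value carried over from the previous observation.

```sas
/* Hypothetical three-row test data set */
data demo;
  input x;
  datalines;
1
2
3
;
run;

data _null_;
  set demo;
  array t{1} _temporary_;  /* elements are retained across iterations */
  before = t{1};           /* value left over from the previous observation */
  put x= before=;          /* expect before=. only on the first observation */
  t{1} = x;                /* overwrite; no call missing, so the value persists */
run;
```

If the array were reset on every iteration, before would be missing on all three observations. Adding call missing(of t{*}); as the last statement of the step should give exactly that.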
Arthur,
Yes, you are right. Using a temporary array is faster and better.
But I think you need to call missing on its elements at the end of the data step, because they are retained throughout the data step.
Ksharp
Hi to all
I'm really impressed how many quality answers you've given me.
Ksharp: You nailed it! That's exactly what I need.
Like Hai Kuo, I wasn't aware that an array can be addressed in the way you've done it. So besides solving my problem, you've also taught me something very valuable.
I couldn't mark all posts as helpful, so I ended up selecting different approaches. I consider every single answer in this thread really interesting and helpful, and I'm a bit disappointed that I can't express this through my marking.
Thanks to all of you.
Patrick
