BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

Sir @PGStats   True. This is for that very purpose indeed. The simple HAVE and WANT below demo works exactly as intended, but when reading company data, it isn't. 😞

 

For example:


data have;
 input acctno;
 cards;
 8
 3
 8
 3
 10
 3
 6
 5
 3
 ;

data want;
 do until(z);
  set have end=z;
  array t(9) _temporary_;
  if acctno in t then continue;
  n+1;
  t(n)=acctno;
  output;
 end;
 drop n;
run;

proc print noobs;run;
acctno
8
3
10
6
5

 

Astounding
PROC Star

OK, here's an approach that might be feasible, might not.  The issues .....

 

It treats account number as numeric.  While you can easily work around that, the cost for that is not clear.

 

It utilizes the direct array lookup that @mkeintz mentioned.

 

It needs a lot more memory.  Whether you can get that much is not clear ... on the order of 400 times what you are using now for the temporary array.  The array expands to 11 digits, since the largest account number you illustrated is 11 digits long.  The memory requirements might be shrinkable, since it appears that you don't really need a length of $30 for PORT_NAME.

 

I took out the DO loop, since I didn't see a good purpose for keeping it.  Perhaps its purpose is to maintain a top-to-bottom structure for the DATA step logic.

 

If all this is acceptable, it should be lightning fast at loading the values, and should take a bit more time to unload them.

 


data w;
  array t {99999999999} $ 30 _temporary_;
  if z then do account_number = 1 to 99999999999;
     if t{account_number} > ' ' then do;
        port_name = t{account_number};
        output;
     end;
  end;   
  set aqr.rp_portfolio_&min-aqr.rp_portfolio_&max 
  (keep=account_number portfolio cra_flag period id_investor)
  end=z;
  where period>= 1439 and id_investor not in ('014', '015', '091', '092') or
  period<1439;
  if t{account_number} > ' ' then return;
  select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise; 
end; keep account_number port_name; run;
Tom
Super User Tom
Super User

Is there some reason for using a cryptic test for missing character values?

charvar > ' '

Why not use something that is clearer?

not missing(charvar)
charvar ne ' '

Note that the cyptic test will fail if the string is not empty but starts with a character that is before space in the ASCII sequence. Like a TAB.

Astounding
PROC Star

No reason, that's just how I've always done it.  As a general rule, calling a function takes longer, but I haven't tested for this case.

 

However, this logic looks a little too complex:

 

if t{account_number} > ' ' then return;
  select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise;
   end;

It should probably be simplified to remove the RETURN statement:

 

if t{account_number} = ' ' then select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise;  
end;

Is that the spot in the program you are referring to?

Tom
Super User Tom
Super User

I was just asking if there was some special reason why that test would be better than the normal check for non blank value. Note using > will fail if the string starts with a character that is before space in the ASCII coding sequence.

 

Note I have also seen other posts were someone used 

numvar > .

As if it meant not missing.  Again that test will fail for special missing since .A to .Z are all large than normal missing value.  .Z is the largest missing value.

Astounding
PROC Star

I guess whatever is fasted is best ... in this particular case.  The program itself assigns all those character values, so I'm not worried about strange values that might appear in the data.  But I certainly have been guilty of that in some programs that I have written. 

 

It reminds me of discussions about the range low - high for user-defined character formats.  There are definitely cases where you need to be aware of values lower than a blank.

novinosrin
Tourmaline | Level 20

@Astounding  Elegance personified with amazing presence of mind. Touch class thinking. Salute!!

 

Merci*1e6

novinosrin
Tourmaline | Level 20

To all who participated in this thread, here are the conclusions I drew from this, please correct me if I am wrong:

 

1. A one pass look up in a list within the specifics would only be possible if we choose rather a linear approach than a peek in the list. In other words, Pick, park, scan across and pick again. 

2. Hash will do, however within the specifics there is a danger if memory happened to become insufficient, can be dangerous

3. WHICH group, CHOOSE group functions are all nice should the list be small, and won't be of great use  if the list gets tooooo wide.

4. The tendency in wanting  to get stuff done in one pass when source is NOT sorted/indexed is generally a challenge. 

5. Proc Sort is boring, nonethless boring can't be an excuse and sometimes multiple passes are the way forward??(Sighs!)

 

PS By no means the last or least. My sincere thank you to all of you for your time. That really means a lot more than anything. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1369 views
  • 4 likes
  • 7 in conversation