Ksharp
Super User

Did you check my code?

options compress=yes;
data want(keep=pan1 pan2 pan3 add1 household);
  /* ha: every non-missing observation, keyed by a running counter */
  declare hash ha(hashexp: 20, ordered: 'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('count','pan1','pan2','pan3','add1');
  ha.definedone();
  /* _ha: every value that has already been assigned to a household */
  declare hash _ha(hashexp: 20, ordered: 'a');
  _ha.definekey('key');
  _ha.definedata('_household');
  _ha.definedone();

  /* Load the table into memory, dropping rows where all four values are missing */
  do until(last);
    set test end=last;
    if cmiss(pan1,pan2,pan3,add1) lt 4 then do;
      count+1;
      ha.add();
    end;
  end;

  length key $ 40;
  array h{4} $ 40 pan1 pan2 pan3 add1;
  _rc=hi.first();
  do while(_rc eq 0);
    /* Start a new household from the first observation still left in ha */
    household+1; _household=household;
    do i=1 to 4;
      if not missing(h{i}) then do; key=h{i}; _ha.replace(); end;
    end;
    /* Rescan ha until a full pass adds no more observations to this household */
    do until(x=1);
      x=1;
      rc=hi.first();
      do while(rc=0);
        found=0;
        do j=1 to 4;
          key=h{j}; rcc=_ha.check();
          if rcc=0 then found=1;
        end;
        if found then do;
          do k=1 to 4;
            if not missing(h{k}) then do; key=h{k}; _ha.replace(); end;
          end;
          output; x=0; _count=count;
        end;
        /* Move the iterator on before removing the matched observation */
        rc=hi.next();
        if found then rx=ha.remove(key: _count);
      end;
    end;
    _rc=hi.first();
  end;
run;

Ksharp

sas_Forum
Calcite | Level 5

Ksharp

I have checked this code on 40,000 observations. It was still running after 7 minutes, so I stopped it.

art297
Opal | Level 21

You never did let me know if my code ran on your system.  Plus I don't recall your ever mentioning which version of SAS you're using, on which operating system, with which processor, and with how much RAM.

sas_Forum
Calcite | Level 5

Art, I have checked your code as well. It ran for 7 minutes on 80,000 observations, and then I stopped it.

I am using SAS 9.2 on Windows, running on a server, and my system has 2 GB. Any job we run executes directly on the server.

FriedEgg
SAS Employee

You probably want to pursue DLing's method, since your system seems poorly suited to hash objects of the size you are attempting to use.  The tests I posted earlier used only 100 MB of RAM while executing, so unless you are storing very large text strings, you may be using too much swap space, which can hinder performance.  Try running with options fullstimer and post the statistics from your data step.  Also check your options for the memory and performance groups.
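
For reference, turning on the extra statistics might look like the sketch below. The DATA step is only a placeholder that reads the thread's `test` dataset; substitute the step that is running slowly.

```sas
options fullstimer;    /* write detailed timing and memory statistics to the log */

data _null_;           /* placeholder: replace with the slow DATA step */
  set test;            /* assumes the input dataset is WORK.TEST, as in this thread */
run;

options nofullstimer;  /* switch back to the shorter statistics */
```

The statistics appear in the log directly after the step's notes.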

DLing
Obsidian | Level 7

Ksharp stores the entire dataset in a hash table, everything is in memory.

My approach stores only the unique values in a hash table, but will read the input dataset multiple times.  It should use less memory, but will take longer.
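
To put a number on "everything is in memory", one option is to load the same hash once and multiply its NUM_ITEMS and ITEM_SIZE attributes. This is a minimal sketch, assuming the input is WORK.TEST with the four character variables used in this thread. ITEM_SIZE does not include the hash object's initial overhead, so the result is an approximation rather than an exact figure.

```sas
/* Sketch: approximate the memory taken by a hash loaded like Ksharp's ha. */
/* Assumes WORK.TEST with character variables pan1 pan2 pan3 add1.         */
data _null_;
  declare hash ha(hashexp: 20);
  ha.definekey('count');
  ha.definedata('count','pan1','pan2','pan3','add1');
  ha.definedone();

  /* Load every observation, keyed by a running counter */
  do until(last);
    set test end=last;
    count+1;
    ha.add();
  end;

  items = ha.num_items;                 /* number of entries stored   */
  bytes = ha.item_size;                 /* size of one entry in bytes */
  total_mb = items * bytes / 1024**2;   /* approximate total in MB    */
  put 'Items: ' items 'Bytes per item: ' bytes 'Approx. MB: ' total_mb 8.1;
  stop;
run;
```

Comparing the reported figure with the memory available to the SAS session should show whether the full-table hash is a plausible cause of swapping.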

FriedEgg
SAS Employee

Yes, my assumption is that sas_Forum is experiencing the poor performance because he is either not allowing or does not have enough memory to store the full hash table in memory and is instead performing a high number of page swaps and context switches.

Ksharp
Super User

DLing,

The OP's problem is speed, not whether the code runs at all.

Reading the observations from the dataset on disk again and again takes a lot of time, so I think pushing the whole table into a hash table is the best way to improve speed.

I am also puzzled, because on my old laptop it takes only five seconds for one hundred thousand observations.

I do not know why it takes so long on the OP's data.

Ksharp

art297
Opal | Level 21

Given that the OP only has 2 gig of memory, I can easily see how anything competing for that memory could quickly use it all up.  Of course, there is always the chance that he is also importing the data across a slow network.

sas_Forum
Calcite | Level 5

But Art, you once gave me code that I ran on 8 lakh (800,000) observations, and it took only 30-50 seconds. However, that code works correctly only for pan1: observations that share pan1 do not go into another household ID, but the remaining ones do go into another ID. This is the code you gave me:

data want(keep=pan1 add1 pan2 pan3  household);
  if _n_ eq 1 then do;
    declare hash ha(hashexp: 16);
    ha.definekey('key');
    ha.definedata('hhold');
    ha.definedone();
  end;
  set test;
  array _house{*} $ 40 pan1--pan3;
  do i=1 to dim(_house);
    key=_house{i};
    call missing(hhold);
    rc=ha.find();
    if rc=0 then do;
      found=1;
      household=hhold;
      leave;
    end;
  end;
  if not found then do;
    n+1;
    household=n;
  end;
  do j=1 to dim(_house);
    if not missing(_house{j}) then do;
      key=_house{j};
      hhold=household;
      ha.replace();
    end;
  end;
run;

art297
Opal | Level 21

SAS Forum,  I never did understand why you indicated that only one variable was being accounted for by that code.  The code, which was very close to code that KSharp had posted in a different thread, addresses all of the variables AS LONG AS THE VARIABLES REALLY ARE IN THE ORDER INDICATED IN THE ARRAY STATEMENT.

That particular version didn't account for reassigning households discovered via an iterative analysis of the data, but any such analysis would be totally irrelevant if the wrong data is being analyzed.  Would you please post the output of a proc contents on your data file?
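
For the reassignment case, where a later row links two households that were already numbered separately, one possible approach is a union-find (disjoint-set) structure over the values. The sketch below is not code posted in this thread. It assumes the input is WORK.TEST with character variables pan1, pan2, pan3 and add1 of length 40 or less, and the names `uf`, `ids` and `find_root` are illustrative only. It reads the data twice: once to link values, once to number the resulting groups.

```sas
/* Sketch only: union-find over the values, under the assumptions stated above. */
data want(keep=pan1 pan2 pan3 add1 household);
  if 0 then set test;                  /* pick up variable attributes at compile time */
  array v{4} pan1 pan2 pan3 add1;
  length key parent root first_root $ 40;

  declare hash uf();                   /* value -> parent value */
  uf.definekey('key');
  uf.definedata('parent');
  uf.definedone();
  declare hash ids();                  /* root value -> household number */
  ids.definekey('root');
  ids.definedata('household');
  ids.definedone();

  /* Pass 1: put all non-missing values of a row into the same set */
  do until(end1);
    set test end=end1;
    first_root = ' ';
    do i = 1 to 4;
      if not missing(v{i}) then do;
        key = v{i};
        link find_root;
        if missing(first_root) then first_root = root;
        else if root ne first_root then do;
          key = root; parent = first_root;   /* merge the two sets */
          rc = uf.replace();
        end;
      end;
    end;
  end;

  /* Pass 2: number the sets in order of first appearance */
  do until(end2);
    set test end=end2;
    do i = 1 to 4;
      if not missing(v{i}) then do;
        key = v{i};
        link find_root;
        if ids.find() ne 0 then do;    /* first row seen for this set */
          n + 1;
          household = n;
          rc = ids.add();
        end;
        output;
        leave;                         /* one non-missing value identifies the set */
      end;
    end;
  end;
  stop;

find_root:                             /* input: key; output: root */
  if uf.find() ne 0 then do;           /* unseen value becomes its own root */
    parent = key;
    rc = uf.add();
  end;
  root = key;
  do while(parent ne root);            /* follow parents up to the root */
    root = parent;
    key = root;
    rc = uf.find();
  end;
return;
run;
```

Rows in which all four values are missing are not written out, matching the filter in Ksharp's code. Like the other code in this thread, the sketch treats a pan value and an add1 value with the same text as the same key.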

sas_Forum
Calcite | Level 5

Hi Art,

I am sending the proc contents of my dataset.


The CONTENTS Procedure

Data Set Name        WORK.HAVE                                Observations          16
Member Type          DATA                                     Variables             4
Engine               V9                                       Indexes               0
Created              Monday, September 19, 2011 02:46:09 PM   Observation Length    160
Last Modified        Monday, September 19, 2011 02:46:09 PM   Deleted Observations  0
Protection                                                    Compressed            NO
Data Set Type                                                 Sorted                NO
Label
Data Representation  HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding             latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           24576

Alphabetic List of Variables and Attributes

#    Variable    Type    Len

2    add1        Char     40
1    pan1        Char     40
3    pan2        Char     40
4    pan3        Char     40

FriedEgg
SAS Employee

This does not appear to be the contents of your actual data, but rather a very small example, probably taken from the data provided in this topic.  Can you please run proc contents on your real data, or on one of the sample sets you have been using for testing?  Also, it appears that your execution environment is 64-bit Solaris, not Windows.  You say the server has at least 200GB of space; I assume you mean disk space, but I am more concerned with memory.  I will also ask again that you provide the output of

proc options group=memory; run;

proc options group=performance; run; 

If you do not know about the memory space try running the following:

filename ram pipe "/usr/sbin/prtconf | grep Memory | awk '{print $3, $4}'";

data _null_;
  infile ram;
  input size unit $;
  put 'Memory Size: ' size unit;
run;

art297
Opal | Level 21

I agree with FriedEgg!  If you only have 16 records, your comments about 70+ thousand don't make sense. The variables were in the order I had hoped for, but that doesn't mean that they are in that order in your real dataset, which is the one I was hoping to see the proc contents run against.

sas_Forum
Calcite | Level 5

The CONTENTS Procedure

Data Set Name        WORK.HAVE                                   Observations          79000
Member Type          DATA                                        Variables             4
Engine               V9                                          Indexes               0
Created              Wednesday, September 21, 2011 09:48:10 AM   Observation Length    160
Last Modified        Wednesday, September 21, 2011 09:48:10 AM   Deleted Observations  0
Protection                                                       Compressed            NO
Data Set Type                                                    Sorted                NO
Label
Data Representation  HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding             latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           1224576

Alphabetic List of Variables and Attributes

#    Variable    Type    Len

2    add1        Char     40
1    pan1        Char     40
3    pan2        Char     40
4    pan3        Char     40
