Did you check my code?
options compress=yes;

data want(keep=pan1 pan2 pan3 add1 household);
   /* ha holds every non-empty observation, keyed by a row counter */
   declare hash ha(hashexp : 20, ordered : 'a');
   declare hiter hi('ha');
   ha.definekey('count');
   ha.definedata('count','pan1','pan2','pan3','add1');
   ha.definedone();
   /* _ha maps each key value to its assigned household */
   declare hash _ha(hashexp : 20, ordered : 'a');
   _ha.definekey('key');
   _ha.definedata('_household');
   _ha.definedone();

   /* load phase: skip observations where all four variables are missing */
   do until(last);
      set test end=last;
      if cmiss(pan1,pan2,pan3,add1) lt 4 then do;
         count+1;
         ha.add();
      end;
   end;

   length key $ 40;
   array h{4} $ 40 pan1 pan2 pan3 add1;

   _rc=hi.first();
   do while(_rc eq 0);
      /* seed a new household with the first remaining observation */
      household+1;
      _household=household;
      do i=1 to 4;
         if not missing(h{i}) then do;
            key=h{i};
            _ha.replace();
         end;
      end;
      /* sweep repeatedly, pulling in every observation that shares a key
         with the household, until a full pass finds no new members */
      do until(x=1);
         x=1;
         rc=hi.first();
         do while(rc=0);
            found=0;
            do j=1 to 4;
               key=h{j};
               rcc=_ha.check();
               if rcc=0 then found=1;
            end;
            if found then do;
               do k=1 to 4;
                  if not missing(h{k}) then do;
                     key=h{k};
                     _ha.replace();
                  end;
               end;
               output;
               x=0;
               _count=count;
            end;
            rc=hi.next();
            if found then rx=ha.remove(key : _count);
         end;
      end;
      _rc=hi.first();
   end;
run;
Ksharp
I have checked this code on 40,000 observations; it was still running after 7 minutes, so I stopped it...
You never did let me know if my code ran on your system. Plus I don't recall your ever mentioning which version of SAS you're using, on which operating system, with which processor, and with how much RAM.
Art, I checked your code as well; it also ran for 7 minutes on 80,000 records before I stopped it.
I am using SAS 9.2 on Windows, running against a server, and my system has 2 GB of RAM. Any job we run executes directly on the server.
You probably want to pursue DLing's method, since your system seems under-resourced for hash objects of the size you are attempting to use. The tests I posted earlier used only 100 MB of RAM while executing, so unless you are storing very large text strings you may be relying on swap space, which can badly hinder performance. Try running with OPTIONS FULLSTIMER and post the statistics from your data step. Also check your options for the MEMORY and PERFORMANCE groups.
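For reference, a minimal sketch of the checks being suggested here (FULLSTIMER and the MEMORY/PERFORMANCE option groups are standard SAS; run your own step between them):

options fullstimer;   /* log memory and timing statistics for each step */
/* ... run the data step in question here, then inspect the log ... */
proc options group=memory; run;
proc options group=performance; run;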
Ksharp's code stores the entire dataset in a hash table; everything is in memory.
My approach stores only the unique values in a hash table, but will read the input dataset multiple times. It should use less memory, but will take longer.
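To make the contrast concrete, here is a minimal sketch of the unique-values idea; it is not DLing's actual code, and apart from the dataset and variable names used in this thread, everything in it is assumed:

data _null_;
   length key $ 40;
   if _n_=1 then do;
      declare hash uniq();   /* holds one entry per distinct value */
      uniq.definekey('key');
      uniq.definedone();
   end;
   set test end=last;
   array h{*} pan1 pan2 pan3 add1;
   do i=1 to dim(h);
      if not missing(h{i}) then do;
         key=h{i};
         rc=uniq.add();   /* rc ne 0 simply means the value was already stored */
      end;
   end;
   if last then do;
      n=uniq.num_items;
      put 'Distinct values held in memory: ' n;
   end;
run;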
Yes, my assumption is that sas_Forum is experiencing the poor performance because he either is not allowing, or does not have, enough memory to store the full hash table, so the system is instead performing a high number of page swaps and context switches.
DLing.
The OP's problem is speed, not whether the code runs at all.
Reading observations back from the real dataset repeatedly costs a lot of time, so pushing the whole table into a hash table is, I think, the best way to improve speed.
And I am curious: it takes only five seconds on my old laptop for one hundred thousand observations.
I do not know why it wastes so much time on the OP's data.
Ksharp
Given that the OP only has 2 gig of memory, I can easily see how anything competing for that memory could quickly use it all up. Of course, there is always the chance that he is also importing the data across a slow network.
But Art, I ran the code you gave earlier on 8 lakh (800,000) records, and it took only 30-50 seconds. However, that code satisfies the condition only for pan1: only the pan1 observations are prevented from going into another household ID, while the remaining variables still end up in another ID. This is the code you gave me:
data want(keep=pan1 add1 pan2 pan3 household);
   if _n_ eq 1 then do;
      declare hash ha(hashexp: 16);   /* maps each key value to its household id */
      ha.definekey('key');
      ha.definedata('hhold');
      ha.definedone();
   end;
   set test;
   array _house{*} $ 40 pan1--pan3;
   /* look for any value already assigned to a household */
   do i=1 to dim(_house);
      key=_house{i};
      call missing(hhold);
      rc=ha.find();
      if rc=0 then do;
         found=1;
         household=hhold;
         leave;
      end;
   end;
   /* no match: start a new household */
   if not found then do;
      n+1;
      household=n;
   end;
   /* register every non-missing value under this household */
   do j=1 to dim(_house);
      if not missing(_house{j}) then do;
         key=_house{j};
         hhold=household;
         ha.replace();
      end;
   end;
run;
SAS Forum, I never did understand why you indicated that only one variable was being accounted for by that code. The code, which was very close to code that KSharp had posted in a different thread, addresses all of the variables AS LONG AS THE VARIABLES REALLY ARE IN THE ORDER INDICATED IN THE ARRAY STATEMENT.
That particular version didn't account for reassigning households discovered via an iterative analysis of the data, but any such analysis would be totally irrelevant if the wrong data is being analyzed. Would you please post the output of a proc contents on your data file?
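A side note on that order dependence: pan1--pan3 in the posted code is a positional range, so the variables it picks up depend entirely on their physical order in the dataset. A defensive variant, just a sketch using the four variable names from this thread, lists them by name:

array _house{*} pan1 pan2 pan3 add1;   /* name-based list, immune to column order */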
Hi Art,
I am sending the PROC CONTENTS of my dataset.
The CONTENTS Procedure

Data Set Name         WORK.HAVE                 Observations           16
Member Type           DATA                      Variables              4
Engine                V9                        Indexes                0
Created               Monday, September 19, 2011 02:46:09 PM
Last Modified         Monday, September 19, 2011 02:46:09 PM
Observation Length    160                       Deleted Observations   0
Protection                                      Compressed             NO
Data Set Type                                   Sorted                 NO
Label
Data Representation   HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding              latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           24576

Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  add1      Char  40
1  pan1      Char  40
3  pan2      Char  40
4  pan3      Char  40
This does not appear to be a contents listing of your actual data, but rather of a very small example, probably taken from the data provided in this topic. Can you please run a contents on your real data, or on one of the sample sets you have been using for testing? Also, it appears that your execution environment is Solaris 64-bit, not Windows. You say the server has at least 200 GB of space; I assume you mean disk space, but I am more concerned with memory. I will also ask again that you provide the output of
proc options group=memory; run;
proc options group=performance; run;
If you do not know how much memory the server has, try running the following:
filename ram pipe "/usr/sbin/prtconf | grep Memory | awk '{print $3, $4}'";
data _null_;
   infile ram;
   input size unit $;
   put 'Memory Size: ' size unit;
run;
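On Solaris, prtconf normally reports a line such as "Memory size: 2048 Megabytes", so the step above should print the installed RAM; if nothing comes back through the pipe, check the prtconf path and your permission to run it.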
I agree with FriedEgg! If you only have 16 records, your comments about 70+ thousand don't make sense. The variables were in the order I had hoped for, but that doesn't mean they are in that order in your real dataset, which is the one I was hoping to see the proc contents run against.
The CONTENTS Procedure

Data Set Name         WORK.HAVE                 Observations           79000
Member Type           DATA                      Variables              4
Engine                V9                        Indexes                0
Created               Wednesday, September 21, 2011 09:48:10 AM
Last Modified         Wednesday, September 21, 2011 09:48:10 AM
Observation Length    160                       Deleted Observations   0
Protection                                      Compressed             NO
Data Set Type                                   Sorted                 NO
Label
Data Representation   HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding              latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           1224576

Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  add1      Char  40
1  pan1      Char  40
3  pan2      Char  40
4  pan3      Char  40