Did you check my code?
options compress=yes;

data want(keep=pan1 pan2 pan3 add1 household);
   /* ha holds every non-empty observation, keyed by a row counter */
   declare hash ha(hashexp : 20, ordered : 'a');
   declare hiter hi('ha');
   ha.definekey('count');
   ha.definedata('count','pan1','pan2','pan3','add1');
   ha.definedone();
   /* _ha maps each key value to its assigned household */
   declare hash _ha(hashexp : 20, ordered : 'a');
   _ha.definekey('key');
   _ha.definedata('_household');
   _ha.definedone();

   /* load phase: skip observations where all four variables are missing */
   do until(last);
      set test end=last;
      if cmiss(pan1,pan2,pan3,add1) lt 4 then do;
         count+1;
         ha.add();
      end;
   end;

   length key $ 40;
   array h{4} $ 40 pan1 pan2 pan3 add1;

   _rc=hi.first();
   do while(_rc eq 0);
      /* seed a new household with the first remaining observation */
      household+1;
      _household=household;
      do i=1 to 4;
         if not missing(h{i}) then do;
            key=h{i};
            _ha.replace();
         end;
      end;
      /* sweep repeatedly, pulling in every observation that shares a key
         with the household, until a full pass finds no new members */
      do until(x=1);
         x=1;
         rc=hi.first();
         do while(rc=0);
            found=0;
            do j=1 to 4;
               key=h{j};
               rcc=_ha.check();
               if rcc=0 then found=1;
            end;
            if found then do;
               do k=1 to 4;
                  if not missing(h{k}) then do;
                     key=h{k};
                     _ha.replace();
                  end;
               end;
               output;
               x=0;
               _count=count;
            end;
            rc=hi.next();
            if found then rx=ha.remove(key : _count);
         end;
      end;
      _rc=hi.first();
   end;
run;
Ksharp
I have checked this code on 40,000 observations; it was still running after 7 minutes, so I stopped it...
You never did let me know if my code ran on your system. Plus I don't recall your ever mentioning which version of SAS you're using, on which operating system, with which processor, and with how much RAM.
Art, I checked your code as well; it also ran for 7 minutes on 80,000 records before I stopped it.
I am using SAS 9.2 on Windows, running against a server, and my system has 2 GB of RAM. Any job we run executes directly on the server.
You probably want to pursue DLing's method, since your system seems under-resourced for hash objects of the size you are attempting to use. The tests I posted earlier used only 100 MB of RAM while executing, so unless you are storing very large text strings you may be relying on swap space, which can badly hinder performance. Try running with OPTIONS FULLSTIMER and post the statistics from your data step. Also check your options for the MEMORY and PERFORMANCE groups.
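For reference, a minimal sketch of the checks being suggested here (FULLSTIMER and the MEMORY/PERFORMANCE option groups are standard SAS; run your own step between them):

options fullstimer;   /* log memory and timing statistics for each step */
/* ... run the data step in question here, then inspect the log ... */
proc options group=memory; run;
proc options group=performance; run;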
Ksharp's code stores the entire dataset in a hash table; everything is in memory.
My approach stores only the unique values in a hash table, but will read the input dataset multiple times. It should use less memory, but will take longer.
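To make the contrast concrete, here is a minimal sketch of the unique-values idea; it is not DLing's actual code, and apart from the dataset and variable names used in this thread, everything in it is assumed:

data _null_;
   length key $ 40;
   if _n_=1 then do;
      declare hash uniq();   /* holds one entry per distinct value */
      uniq.definekey('key');
      uniq.definedone();
   end;
   set test end=last;
   array h{*} pan1 pan2 pan3 add1;
   do i=1 to dim(h);
      if not missing(h{i}) then do;
         key=h{i};
         rc=uniq.add();   /* rc ne 0 simply means the value was already stored */
      end;
   end;
   if last then do;
      n=uniq.num_items;
      put 'Distinct values held in memory: ' n;
   end;
run;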
Yes, my assumption is that sas_Forum is experiencing the poor performance because he either is not allowing, or does not have, enough memory to store the full hash table, so the system is instead performing a high number of page swaps and context switches.
DLing.
The OP's problem is speed, not whether the code runs at all.
Reading observations back from the real dataset repeatedly costs a lot of time, so pushing the whole table into a hash table is, I think, the best way to improve speed.
And I am curious: it takes only five seconds on my old laptop for one hundred thousand observations.
I do not know why it wastes so much time on the OP's data.
Ksharp
Given that the OP only has 2 gig of memory, I can easily see how anything competing for that memory could quickly use it all up. Of course, there is always the chance that he is also importing the data across a slow network.
But Art, I ran the code you gave earlier on 8 lakh (800,000) records, and it took only 30-50 seconds. However, that code satisfies the condition only for pan1: only the pan1 observations are prevented from going into another household ID, while the remaining variables still end up in another ID. This is the code you gave me:
data want(keep=pan1 add1 pan2 pan3 household);
   if _n_ eq 1 then do;
      declare hash ha(hashexp: 16);   /* maps each key value to its household id */
      ha.definekey('key');
      ha.definedata('hhold');
      ha.definedone();
   end;
   set test;
   array _house{*} $ 40 pan1--pan3;
   /* look for any value already assigned to a household */
   do i=1 to dim(_house);
      key=_house{i};
      call missing(hhold);
      rc=ha.find();
      if rc=0 then do;
         found=1;
         household=hhold;
         leave;
      end;
   end;
   /* no match: start a new household */
   if not found then do;
      n+1;
      household=n;
   end;
   /* register every non-missing value under this household */
   do j=1 to dim(_house);
      if not missing(_house{j}) then do;
         key=_house{j};
         hhold=household;
         ha.replace();
      end;
   end;
run;
SAS Forum, I never did understand why you indicated that only one variable was being accounted for by that code. The code, which was very close to code that KSharp had posted in a different thread, addresses all of the variables AS LONG AS THE VARIABLES REALLY ARE IN THE ORDER INDICATED IN THE ARRAY STATEMENT.
That particular version didn't account for reassigning households discovered via an iterative analysis of the data, but any such analysis would be totally irrelevant if the wrong data is being analyzed. Would you please post the output of a proc contents on your data file?
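A side note on that order dependence: pan1--pan3 in the posted code is a positional range, so the variables it picks up depend entirely on their physical order in the dataset. A defensive variant, just a sketch using the four variable names from this thread, lists them by name:

array _house{*} pan1 pan2 pan3 add1;   /* name-based list, immune to column order */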
Hi Art,
I am sending the PROC CONTENTS of my dataset.
The CONTENTS Procedure

Data Set Name         WORK.HAVE                 Observations           16
Member Type           DATA                      Variables              4
Engine                V9                        Indexes                0
Created               Monday, September 19, 2011 02:46:09 PM
Last Modified         Monday, September 19, 2011 02:46:09 PM
Observation Length    160                       Deleted Observations   0
Protection                                      Compressed             NO
Data Set Type                                   Sorted                 NO
Label
Data Representation   HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding              latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           24576

Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  add1      Char  40
1  pan1      Char  40
3  pan2      Char  40
4  pan3      Char  40
This does not appear to be a contents listing of your actual data, but rather of a very small example, probably taken from the data provided in this topic. Can you please run a contents on your real data, or on one of the sample sets you have been using for testing? Also, it appears that your execution environment is Solaris 64-bit, not Windows. You say the server has at least 200 GB of space; I assume you mean disk space, but I am more concerned with memory. I will also ask again that you provide the output of
proc options group=memory; run;
proc options group=performance; run;
If you do not know how much memory the server has, try running the following:
filename ram pipe "/usr/sbin/prtconf | grep Memory | awk '{print $3, $4}'";
data _null_;
   infile ram;
   input size unit $;
   put 'Memory Size: ' size unit;
run;
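On Solaris, prtconf normally reports a line such as "Memory size: 2048 Megabytes", so the step above should print the installed RAM; if nothing comes back through the pipe, check the prtconf path and your permission to run it.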
I agree with FriedEgg! If you only have 16 records, your comments about 70+ thousand don't make sense. The variables were in the order I had hoped for, but that doesn't mean they are in that order in your real dataset, which is the one I was hoping to see the proc contents run against.
The CONTENTS Procedure

Data Set Name         WORK.HAVE                 Observations           79000
Member Type           DATA                      Variables              4
Engine                V9                        Indexes                0
Created               Wednesday, September 21, 2011 09:48:10 AM
Last Modified         Wednesday, September 21, 2011 09:48:10 AM
Observation Length    160                       Deleted Observations   0
Protection                                      Compressed             NO
Data Set Type                                   Sorted                 NO
Label
Data Representation   HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
Encoding              latin1  Western (ISO)

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            102
Obs in First Data Page      16
Number of Data Set Repairs  0
Filename                    /sas/saswork/SAS_workF33E000/have.sas7bdat
Release Created             9.0202M3
Host Created                SunOS
Inode Number                94223
Access Permission           rw-r--r--
Owner Name                  sasuser
File Size (bytes)           1224576

Alphabetic List of Variables and Attributes

#  Variable  Type  Len
2  add1      Char  40
1  pan1      Char  40
3  pan2      Char  40
4  pan3      Char  40