08-26-2016 11:31 AM
Hi, I just tried using the code from: https://communities.sas.com/t5/SAS-Procedures/Fuzzy-match-using-a-string-variable-between-two-large-.... The code is:
/*reduce dcandh dataset to one record per BankName*/
proc sort data=dcandh(keep=bankname) out=banks nodupkey;
by BankName;
run;
/* get number of records in bank info dataset */
data _null_;
  if 0 then set bankinfo nobs=nobs;
  call symputx('numrec',nobs); /* symputx strips blanks and avoids the numeric-to-character conversion note */
  stop;
run;
/* Build format dataset: read bank names into array and use Compged */
/* function to find closest matches for the customer bank names */
data fmtDataset (keep=fmtname start label type);
  retain fmtname '$banks' type 'C';
  array bank(&numrec) $57;
  do i=1 to &numrec; /*load the array with bank names*/
    set bankinfo;
    bank(i)=BankName;
  end;
  do until (eof); /* read bank names from summarized customer file*/
    set banks (rename=(BankName=start)) end=eof;
    if length(start) le 4 then label=start; /*ignore if acronym*/
    else do;
      lowscore=5000;
      do i=1 to &numrec; /*find lowest generalized edit distance*/
        score=compged(start,bank(i));
        if score le lowscore then do;
          lowscore=score;
          closest=i; /*keep index with lowest value*/
        end;
      end;
      label=bank(closest);
    end;
    output;
  end;
run;
proc format cntlin=fmtDataset;
run;
I'm running it with dataset dcandh (2,000 records) against dataset bankinfo (6,000,000 records), but after 8 hours of processing I got the message: An error occurred executing the workspace job "Ejecución". The server is disconnected. The log was empty. I'm using SAS EG 5.1 on Windows 7 Professional.
Does anybody have a solution for this?
Thanks in advance.
P.S.: I got the same message using a Cartesian product in PROC SQL and using the SPEDIS function.
08-26-2016 08:32 PM
The most likely cause of your disconnection is a timeout limit on connections to your SAS server. Check with your SAS server administrator to confirm what limits may be set.
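For a sense of scale, the job compares every customer bank name with every bankinfo record, so the number of COMPGED calls is at most the product of the two record counts. A quick sketch of that arithmetic, using the 2,000 and 6,000,000 figures from the original post (dcandh may hold fewer distinct names after the NODUPKEY sort, and names of four characters or fewer are skipped, so this is an upper bound):

```sas
/* Upper bound on the number of COMPGED calls in the matching step */
data _null_;
  n_customer  = 2000;       /* records in dcandh, per the original post */
  n_bankinfo  = 6000000;    /* records in bankinfo, per the original post */
  comparisons = n_customer * n_bankinfo;
  put 'COMPGED calls (upper bound): ' comparisons comma20.;
run;
```

A run of that size could plausibly outlast a server-side session limit, which would be consistent with the disconnection described.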
08-29-2016 04:52 PM
Thank you for your reply. The SAS administrator is checking that.
One more question: what is the maximum size of an array? Can I create an array with 6 million elements?
08-29-2016 04:57 PM
From online documentation:
Starting with SAS 9.1, the maximum number of variables can be greater than 32,767. The maximum number depends on your environment and the file's attributes. For example, the maximum number of variables depends on the total length of all the variables and cannot exceed the maximum page size.
So you need to know the maximum page size your system supports and the sizes of the variables you are attempting to create.
Likely you are running out of memory.
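As a rough check on the memory point, the array in the original code holds one $57 element per bankinfo record, so its size can be estimated directly. A minimal sketch, assuming the 6,000,000 records and the $57 length quoted earlier in the thread; the MEMSIZE option reports how much memory the SAS session is allowed to use:

```sas
/* Estimate the memory needed for the bank name array */
data _null_;
  n_elements = 6000000;    /* records in bankinfo, per the original post */
  elem_len   = 57;         /* length of each array element ($57) */
  bytes      = n_elements * elem_len;
  put 'Approximate array size in bytes: ' bytes comma15.;
run;

/* Show the memory limit for the current SAS session */
proc options option=memsize;
run;
```

That comes to about 342 million bytes for the character data alone. If the array is only a lookup table, declaring it as `array bank(&numrec) $57 _temporary_;` avoids creating six million named variables in the program data vector, which may reduce overhead; whether that is enough depends on the MEMSIZE setting.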
08-29-2016 06:39 PM
Hi ballardw, thank you for your reply.
Where can I find the page size on my system?
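One way to look this up, as a sketch: PROC CONTENTS reports "Data Set Page Size" in its engine/host dependent information, and the BUFSIZE system option shows the session default for new data sets. The data set name bankinfo is taken from the original post and is assumed here to live in the WORK library:

```sas
/* Report data set attributes, including "Data Set Page Size" */
proc contents data=work.bankinfo;
run;

/* Show the default page (buffer) size for new data sets in this session */
proc options option=bufsize;
run;
```

A BUFSIZE value of 0 means SAS chooses a default page size for the operating environment.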