- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
My raw data set have 30,000,000 rows (30 million).
I run this simple code and get error:
ERROR: Index TEMPINDX cannot be created on file WORK.'SASTMP-000000417'n because the length of the index value (26448 bytes) is too
large. The index page size (currently 32767) must be large enough to store three index values plus a small overhead
structure. Reduce the length of the index value or use the IBUFSIZE= option to increase the index page size (up to a maximum
of 32,767 bytes).
ERROR: :Unable to create temporary index while processing summary functions.
what is the way to solve it?
proc sql;
create table want as
select Shop_name,
count(distinct shop_ID) as nr_Dist
from tbl
group by Shop_name
having nr_Dist>=2
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please go back and change your subject to something meaningful that actually describes the problem. "Error" could apply to virtually any coding post.
Please provide the ENTIRE log for the step that has the error; we do not want to see partial logs. Do not make us ask for this, provide the ENTIRE log for the step with the error, without us asking in the future.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show the length of variable shop_name.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So the error is saying that PROC SQL cannot use an INDEX to figure out how many distinct values of SHOP_ID there are because the LENGTH of the variable SHOP_IP is larger than one third of 32,767 (which is 10,922).
The most obvious solution is to define SHOP_ID with a length smaller than 10,922.
What the heck are you going to do with SHOP_ID values longer than that anyway?
The work around if there is no way to actually reduce the length of the variable is to just count the distinct values yourself instead of asking PROC SQL to do it.
proc sort data=tbl(keep=shop_name shop_id) out=want nodupkey;
by shop_name shop_id;
run;
data want;
do nr_dist=1 by 1 until(last.shop_name);
set want;
by shop_name;
end;
if nr_dist > 1;
drop shop_id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Run this step on your input dataset and post the results:
proc contents data = tbl;
run;