Anything wrong with this simple proc sql code?

fengyuwuzu — Thu, 01 Sep 2016 16:07:44 GMT

proc sql;
create table new_sample as
    select a.*
	    from total_sample as a,
		     selected_3000
	    where a.ID in (select ID from selected_3000 );
quit;

Total_sample has 79 million data rows, from 79,000 unique IDs. File size 90G.

selected_3000 only has 3000 rows with 3000 unique IDs.

Now I want to select the rows from total_sample whose IDs are in selected_3000, using the proc sql code above.

However, it generated a huge file (>200G) and I had to terminate the procedure. When I checked the partial output, I found that the same row was repeated many times.

What could be the problem in this proc sql code?

Re: Anything wrong with this simple proc sql code?

set_all__ — Thu, 01 Sep 2016 16:23:50 GMT

Try this:

proc sql;
create table new_sample as
    select *
	    from total_sample
	    where ID in (select ID from selected_3000 );
quit;

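You don't need selected_3000 in both the main query and the subquery. Because it was also listed in the FROM clause with no join condition, it created a Cartesian product: every matching row of total_sample was paired with each of the 3000 rows of selected_3000, so the query returned 3000 times as many rows as you needed.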
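To see the multiplication on a small scale, here is a minimal sketch with made-up toy data. The table names total_sample and selected_3000 are reused from the question, but the rows, the value column, and the dup_sample table are hypothetical stand-ins. With 3 matching rows and 2 selected IDs, the original form should give 3 x 2 = 6 rows and the corrected form 3 rows.

```sas
/* Toy stand-ins for the real tables (hypothetical data) */
data total_sample;
    input ID value;
    datalines;
1 10
1 11
2 20
3 30
;
run;

data selected_3000;
    input ID;
    datalines;
1
3
;
run;

proc sql;
    /* Original form: selected_3000 in FROM with no join condition,
       so each kept row is paired with every row of selected_3000 */
    create table dup_sample as
        select a.*
            from total_sample as a,
                 selected_3000
            where a.ID in (select ID from selected_3000);

    /* Corrected form: selected_3000 appears only in the subquery */
    create table new_sample as
        select *
            from total_sample
            where ID in (select ID from selected_3000);

    /* Compare the row counts of the two results */
    select count(*) as n_rows from dup_sample;
    select count(*) as n_rows from new_sample;
quit;
```

Running the same count(*) check on a small subset of the real data before launching the full 90G job is a cheap way to confirm the query returns the expected number of rows. For the original form, SAS usually also writes a note about a Cartesian product join to the log, which is worth looking for.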
