fengyuwuzu
Pyrite | Level 9
proc sql;
create table new_sample as
    select a.*
	    from total_sample as a,
		     selected_3000
	    where a.ID in (select ID from selected_3000 );
quit;

 

Total_sample has 79 million data rows, from 79,000 unique IDs. File size 90G.

selected_3000 only has 3000 rows with 3000 unique IDs.

 

Now I want to select the rows of total_sample whose IDs are in selected_3000, using the proc sql code above.

However, it generated a huge file (>200G) and I had to terminate the procedure. When I checked the oversized output file, I found that the same rows were repeated many times.

 

What could be the problem in this proc sql code?

1 ACCEPTED SOLUTION

set_all__
Fluorite | Level 6

 

Try this:

proc sql;
create table new_sample as
    select *
	    from total_sample
	    where ID in (select ID from selected_3000);
quit;

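You don't need selected_3000 in both the main query and the subquery. Because it was listed in the FROM clause of the main query with no join condition, proc sql built a Cartesian product: each matching row of total_sample was paired with every one of the 3000 rows in selected_3000, so the output had 3000 times as many rows as you needed.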
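To see the effect on a small scale, here is a minimal sketch using made-up toy tables. The names big, picked, with_cross and without_cross are hypothetical and only stand in for total_sample, selected_3000 and the two versions of the query.

```sas
/* Hypothetical toy data: 4 rows in big, 3 of which have an ID listed in picked */
data big;
    input ID value;
    datalines;
1 10
1 11
2 20
3 30
;
run;

/* Hypothetical lookup table with 2 unique IDs */
data picked;
    input ID;
    datalines;
1
3
;
run;

proc sql;
    /* Original pattern: picked is in the FROM clause with no join condition, */
    /* so every matching row of big is paired with every row of picked        */
    create table with_cross as
        select a.*
        from big as a, picked
        where a.ID in (select ID from picked);

    /* Corrected pattern: picked appears only in the subquery */
    create table without_cross as
        select *
        from big
        where ID in (select ID from picked);
quit;
```

With this toy data, with_cross should end up with 6 rows (3 matching rows times 2 rows in picked), while without_cross should have the 3 rows that were actually wanted. The same multiplication by 3000 is what inflated the original output.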

 
