Thank you very much for your thoughts in advance!
I use SAS 9.4. I have a data set with 17170 rows, and I want to create an array based on one of variables in this data set. Then, I want to use that array in another data set to filter its observations (i.e., more than 1 million observations). So, if any observation in the new data set is not among those 17170 different values, then that specific row has to be removed.
Here is the code I used (where libsas refers to my local directory):
data libsas.new_data;
set libsas.data1; /* data1 has 17170 rows */
array schedule(*) X1; /* I want to form array "schedule" based on variable X1 in data1 */
set libsas.data2; /* data2 is the second data set in which I want to filter observations */
if X2 in schedule; /* X2 is the variable in data2 where I want to filter observations */
run;
However, what I get is a data with 17170 rows (i.e., the same size as my first data), while it should be much more that. Would you please help me to understand what's wrong? Thanks!
You got that result because that is the way SAS works unless told otherwise. If you had read both datasets within separate DO UNTIL loops, you'd have gotten what you wanted. e.g.,
data data1; input x1; cards; 3 4 5 7 ; data data2; input x2 z; cards; 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 3 3 4 4 5 5 6 6 7 7 3 3 4 4 5 5 6 6 7 7 ; data new_data (drop=i x1); array schedule(20000) _temporary_; do until (eof1); set data1 end=eof1; i+1; schedule(i)=x1; end; do until (eof2); set data2 end=eof2; if X2 in schedule then output; end; run;
Art, CEO, AnalystFinder.com
Your array contains one element named X1. It does not contain thousands of elements, just one.
SAS contains many ways to select matches. Here is one:
proc sql;
create table libsas.new_data as select * from libsas.data2
where x2 in (select distinct x1 from libsas.data1);
quit;
@Astounding Thank you for your feedback!
You got that result because that is the way SAS works unless told otherwise. If you had read both datasets within separate DO UNTIL loops, you'd have gotten what you wanted. e.g.,
data data1; input x1; cards; 3 4 5 7 ; data data2; input x2 z; cards; 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 3 3 4 4 5 5 6 6 7 7 3 3 4 4 5 5 6 6 7 7 ; data new_data (drop=i x1); array schedule(20000) _temporary_; do until (eof1); set data1 end=eof1; i+1; schedule(i)=x1; end; do until (eof2); set data2 end=eof2; if X2 in schedule then output; end; run;
Art, CEO, AnalystFinder.com
@art297 Thank you very much!
@Reeza: @Alireza_Boloori's subject line indicated NO COMBINE of MERGE.
And I don't know if I'd say the SQL solution is easier but, in this case, should run about 100 times faster than the datastep approach.
Art, CEO, AnalystFinder.com
@Reeza Thank you for your feedback! I was thinking the same about using SQL. I guess I have to start learning it.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.