Write and run SAS programs in your web browser

Using two data sets to form another data set (no combine or merge)

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Using two data sets to form another data set (no combine or merge)

[ Edited ]

Thank you very much for your thoughts in advance!

 

I use SAS 9.4. I have a data set with 17170 rows, and I want to create an array based on one of variables in this data set. Then, I want to use that array in another data set to filter its observations (i.e., more than 1 million observations). So, if any observation in the new data set is not among those 17170 different values, then that specific row has to be removed. 

 

Here is the code I used (where libsas refers to my local directory):

 

data libsas.new_data;
     set libsas.data1; /* data1 has 17170 rows */
     array schedule(*) X1; /* I want to form array "schedule" based on variable X1 in data1 */

     set libsas.data2; /* data2 is the second data set in which I want to filter observations */
     if X2 in schedule; /* X2 is the variable in data2 where I want to filter observations */
run;

 

However, what I get is a data with 17170 rows (i.e., the same size as my first data), while it should be much more that. Would you please help me to understand what's wrong? Thanks!


Accepted Solutions
Solution
‎07-13-2017 09:40 PM
PROC Star
Posts: 7,356

Re: Using two data sets to form another data set (no combine or merge)

You got that result because that is the way SAS works unless told otherwise. If you had read both datasets within separate DO UNTIL loops, you'd have gotten what you wanted. e.g., 

data data1;
  input x1;
  cards;
3
4
5
7
;

data data2;
  input x2 z;
  cards;
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
3 3
4 4
5 5
6 6
7 7
3 3
4 4
5 5
6 6
7 7
;

data new_data (drop=i x1);
  array schedule(20000) _temporary_; 
  do until (eof1);
    set data1 end=eof1;
    i+1;
    schedule(i)=x1;
  end;
  do until (eof2);
    set data2 end=eof2;
    if X2 in schedule then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Super User
Posts: 5,072

Re: Using two data sets to form another data set (no combine or merge)

Your array contains one element named X1.  It does not contain thousands of elements, just one.

 

SAS contains many ways to select matches.  Here is one:

 

proc sql;

create table libsas.new_data as select * from libsas.data2

where x2 in (select distinct x1 from libsas.data1);

quit;

Occasional Contributor
Posts: 10

Re: Using two data sets to form another data set (no combine or merge)

@Astounding Thank you for your feedback!

Solution
‎07-13-2017 09:40 PM
PROC Star
Posts: 7,356

Re: Using two data sets to form another data set (no combine or merge)

You got that result because that is the way SAS works unless told otherwise. If you had read both datasets within separate DO UNTIL loops, you'd have gotten what you wanted. e.g., 

data data1;
  input x1;
  cards;
3
4
5
7
;

data data2;
  input x2 z;
  cards;
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
3 3
4 4
5 5
6 6
7 7
3 3
4 4
5 5
6 6
7 7
;

data new_data (drop=i x1);
  array schedule(20000) _temporary_; 
  do until (eof1);
    set data1 end=eof1;
    i+1;
    schedule(i)=x1;
  end;
  do until (eof2);
    set data2 end=eof2;
    if X2 in schedule then output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 10

Re: Using two data sets to form another data set (no combine or merge)

@art297 Thank you very much!

Super User
Posts: 17,750

Re: Using two data sets to form another data set (no combine or merge)

SQL is the easiest way to solve this. See @Astounding solution. A SAS data step doesn't process data all at once, it reads a line at a time.But this means it doesn't have information on the records either ahead or below, though there are ways around that. But in general, in SAS, an array is simply a way to reference variables in the same row.
PROC Star
Posts: 7,356

Re: Using two data sets to form another data set (no combine or merge)

@Reeza@Alireza_Boloori's subject line indicated NO COMBINE of MERGE.

 

And I don't know if I'd say the SQL solution is easier but, in this case, should run about 100 times faster than the datastep approach.

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 10

Re: Using two data sets to form another data set (no combine or merge)

@Reeza Thank you for your feedback! I was thinking the same about using SQL. I guess I have to start learning it.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 310 views
  • 1 like
  • 4 in conversation