DATA Step, Macro, Functions and more

Looping over a dataset to check values in another dataset

Accepted Solution Solved
Reply
Regular Contributor
Posts: 188
Accepted Solution

Looping over a dataset to check values in another dataset

[ Edited ]

I have a dataset like this:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2    

I want to loop over this dataset for each rowxcol combination and count all cases e.g. where (nex1>=1 and nex2>=1) in another dataset(data_y) which contains these nex1-nex3000 variables. Then move to second case(data_x) where (nex1>=1 and nex3>=1) until the end of row. Now go to the second row of and repeat, til the end of row. So data_x is just to get all the possible cases. I want to get the count for these cases from another dataset data_y

Data_y is like this:

nex1 nex2 nex3.....nex3000
1     0    1     3
0     0    0     0
3     1    0     1
1     2    1     0
0     0    1     0

So final dataset will be having counts for these cases(each case from data_x for rowxcol combination):

 col1   col2...
  2      2
  2      1
  2      1...        

so pseudo code is like this:

for each row in data_x:
     for each col in data_x:
          if &row_val>=1 and &col_val>=1 in data_y:
              select count(*)

Accepted Solutions
Solution
‎09-07-2016 04:55 AM
Super User
Posts: 9,681

Re: Looping over a dataset to check values in another dataset

Here is IML code. I believe it would be very fast.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
proc iml;
use data_x nobs nobs_x;
read all var _all_;
close;
use data_y;
read all var _all_;
close;

want=j(nobs_x,2,.);
do i=1 to nobs_x;
 p=value(Primary[i]);
 temp_p=(p >= 1);
 
 c1=value(col1[i]);
 temp_c1=(c1 >= 1);
 want[i,1]=sum((temp_p+temp_c1)=2);

 c2=value(col2[i]);
 temp_c2=(c2 >= 1);
 want[i,2]=sum((temp_p+temp_c2)=2);
 
end;

create want from want;
append from want;
close;  
quit;


View solution in original post


All Replies
Super User
Posts: 10,500

Re: Looping over a dataset to check values in another dataset

To clarify: The Values you are showing are Varaible Names in another data set?

Will ANY of the values in your example data set every occur in different columns?

 

What will the interpretation of the total in Col1 be at the end? You say it is a crosstab. So Cross tab of what?

 

I think you better walk us through a much smaller example, say with 5 or 6 variables, show the result desired for each step. You kind of jumped from  a "row" result to a cross tab with explaining how to get there. And please provide example data of the result at each step that matches the example data provided for the small example.

Regular Contributor
Posts: 188

Re: Looping over a dataset to check values in another dataset

1. yes
2. Could not understand your question

3. its basically getting all the counts(from data_y) for primaryxcol: cases(looping over each row and col combination in data_x

I updated the question. Let me know, if you need further clarification
Super User
Posts: 17,829

Re: Looping over a dataset to check values in another dataset

Please provide a small worked example as indicated by @ballardw

We do not understand your requirements. 

Regular Contributor
Posts: 188

Re: Looping over a dataset to check values in another dataset

I updated the question, now the counts can be produced for final result using data_x and data_y
Super User
Posts: 17,829

Re: Looping over a dataset to check values in another dataset

How big are your datasets?

Super User
Posts: 17,829

Re: Looping over a dataset to check values in another dataset

If you want code post a worked example, specifically sample data as a data step, as requested previously. I'm not typing out data. As is, here's my algorithm.

 

1. Change data_y to a dataset that's 0/1, where 1 is representative of anything greater than 0. Data step using an array would work.

2. Use Proc Corr to create a distance matrix - https://gist.github.com/statgeek/a5184a4e1678d81e2643

3. Transpose output from 2 to a long format, Proc Transpose

4. Use result from 3 as a lookup to Table 2 via a sql merge. You could transpose data_x to a long and merge to get your results and then transpose back. Or you may be able to do a single sql step depending on how many columns is in data_x. Again you haven't provided the sizes of your dataset.

 

Good Luck. 

 

Super User
Posts: 10,500

Re: Looping over a dataset to check values in another dataset

This example data is why I asked about the duplication of values in different variables:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2    

Since nex2 appears in col1 And in col2 that means to get a count of nex2 there will need to be an indicator variable to hold that specific indication. Which will require either creating 3000 indicator variables or something to store which match is which because otherwise I cannot see how you wll combine the counts of col1 matches and col2 matches for the same variable.

 

And Having Nex2 in the "primary" isn'tgoing to help much. I believe that given your requirement to match all rows in the first set with the second that you will generate an overcount. For example you second and third row of the the first set will count nex1 twice from row 4 in data_y.

 

 

 

And I still cannot figur out what this means in relation to either of the data sets:

col1   col2...
  2      2
  2      1
  2      1...     

You are skipping at least one intermediate step a far as I can tell. SHOW the immediate result of the comparisons between  dataset 1 and data_y.

 

Respected Advisor
Posts: 3,777

Re: Looping over a dataset to check values in another dataset

You can use the number part of the variable name as an index into the array of NEX1-NEX3000.
Super User
Posts: 9,681

Re: Looping over a dataset to check values in another dataset

data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;create table want as');
 call execute(cat('select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1)'));
 if last then call execute(';quit;');
  else call execute('union all');
run;

Regular Contributor
Posts: 188

Re: Looping over a dataset to check values in another dataset

Thanks Xia! But i am having a problem with this, it is throwing the error of maximum limitation of  256 tables processing. I think i will have some 1000x9= 9000 such cases. Is there any other method of processing this? Like creating loops of multiple of 256?

Super User
Posts: 9,681

Re: Looping over a dataset to check values in another dataset

Could you use my IML code ?



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x ;
 call execute(cat('proc sql;create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1) quit;'));
run;

data want;
 set want_:;
run;


Super User
Posts: 9,681

Re: Looping over a dataset to check values in another dataset

Or this code could get you faster.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then call execute('quit;');
run;

data want;
 set want_: ;
run;

Super User
Posts: 9,681

Re: Looping over a dataset to check values in another dataset

I noticed a problem. the order of union data is not right.
Try this one :




data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then do;
  call execute('quit;');
  call symputx('n',_n_);
 end;
run;

data want;
 set want_1-want_&n ;
run;

Regular Contributor
Posts: 188

Re: Looping over a dataset to check values in another dataset

Does this overcome the limitation of 256 dataset processing? as it will be fetching counts from same dataset multiple times.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 646 views
  • 2 likes
  • 5 in conversation