Solved: Looping over a dataset to check values in another dataset

munitech4u · Posted 09-06-2016 10:38 AM

I have a dataset like this:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2

I want to loop over this dataset for each rowxcol combination and count all cases e.g. where (nex1>=1 and nex2>=1) in another dataset(data_y) which contains these nex1-nex3000 variables. Then move to second case(data_x) where (nex1>=1 and nex3>=1) until the end of row. Now go to the second row of and repeat, til the end of row. So data_x is just to get all the possible cases. I want to get the count for these cases from another dataset data_y

Data_y is like this:

nex1 nex2 nex3.....nex3000
1     0    1     3
0     0    0     0
3     1    0     1
1     2    1     0
0     0    1     0

So final dataset will be having counts for these cases(each case from data_x for rowxcol combination):

 col1   col2...
  2      2
  2      1
  2      1...

so pseudo code is like this:

for each row in data_x:
     for each col in data_x:
          if &row_val>=1 and &col_val>=1 in data_y:
              select count(*)

Ksharp · Posted 09-06-2016 10:18 PM

Here is IML code. I believe it would be very fast.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
proc iml;
use data_x nobs nobs_x;
read all var _all_;
close;
use data_y;
read all var _all_;
close;

want=j(nobs_x,2,.);
do i=1 to nobs_x;
 p=value(Primary[i]);
 temp_p=(p >= 1);
 
 c1=value(col1[i]);
 temp_c1=(c1 >= 1);
 want[i,1]=sum((temp_p+temp_c1)=2);

 c2=value(col2[i]);
 temp_c2=(c2 >= 1);
 want[i,2]=sum((temp_p+temp_c2)=2);
 
end;

create want from want;
append from want;
close;  
quit;

View solution in original post

ballardw · Posted 09-06-2016 11:26 AM

To clarify: The Values you are showing are Varaible Names in another data set?

Will ANY of the values in your example data set every occur in different columns?

What will the interpretation of the total in Col1 be at the end? You say it is a crosstab. So Cross tab of what?

I think you better walk us through a much smaller example, say with 5 or 6 variables, show the result desired for each step. You kind of jumped from a "row" result to a cross tab with explaining how to get there. And please provide example data of the result at each step that matches the example data provided for the small example.

munitech4u · Posted 09-06-2016 11:43 AM

1. yes
2. Could not understand your question

3. its basically getting all the counts(from data_y) for primaryxcol: cases(looping over each row and col combination in data_x

I updated the question. Let me know, if you need further clarification

Reeza · Posted 09-06-2016 12:51 PM

Please provide a small worked example as indicated by @ballardw

We do not understand your requirements.

munitech4u · Posted 09-06-2016 12:54 PM

I updated the question, now the counts can be produced for final result using data_x and data_y

Reeza · Posted 09-06-2016 01:52 PM

How big are your datasets?

Reeza · Posted 09-06-2016 08:10 PM

If you want code post a worked example, specifically sample data as a data step, as requested previously. I'm not typing out data. As is, here's my algorithm.

1. Change data_y to a dataset that's 0/1, where 1 is representative of anything greater than 0. Data step using an array would work.

2. Use Proc Corr to create a distance matrix - https://gist.github.com/statgeek/a5184a4e1678d81e2643

3. Transpose output from 2 to a long format, Proc Transpose

4. Use result from 3 as a lookup to Table 2 via a sql merge. You could transpose data_x to a long and merge to get your results and then transpose back. Or you may be able to do a single sql step depending on how many columns is in data_x. Again you haven't provided the sizes of your dataset.

Good Luck.

ballardw · Posted 09-06-2016 04:28 PM

This example data is why I asked about the duplication of values in different variables:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2

Since nex2 appears in col1 And in col2 that means to get a count of nex2 there will need to be an indicator variable to hold that specific indication. Which will require either creating 3000 indicator variables or something to store which match is which because otherwise I cannot see how you wll combine the counts of col1 matches and col2 matches for the same variable.

And Having Nex2 in the "primary" isn'tgoing to help much. I believe that given your requirement to match all rows in the first set with the second that you will generate an overcount. For example you second and third row of the the first set will count nex1 twice from row 4 in data_y.

And I still cannot figur out what this means in relation to either of the data sets:

col1   col2...
  2      2
  2      1
  2      1...

You are skipping at least one intermediate step a far as I can tell. SHOW the immediate result of the comparisons between dataset 1 and data_y.

data_null__ · Posted 09-06-2016 11:31 AM

You can use the number part of the variable name as an index into the array of NEX1-NEX3000.

Ksharp · Posted 09-06-2016 10:07 PM

data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;create table want as');
 call execute(cat('select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1)'));
 if last then call execute(';quit;');
  else call execute('union all');
run;

munitech4u · Posted 09-07-2016 01:35 AM

Thanks Xia! But i am having a problem with this, it is throwing the error of maximum limitation of 256 tables processing. I think i will have some 1000x9= 9000 such cases. Is there any other method of processing this? Like creating loops of multiple of 256?

Ksharp · Posted 09-07-2016 02:36 AM

Could you use my IML code ?



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x ;
 call execute(cat('proc sql;create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1) quit;'));
run;

data want;
 set want_:;
run;

Ksharp · Posted 09-07-2016 02:42 AM

Or this code could get you faster.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then call execute('quit;');
run;

data want;
 set want_: ;
run;

Ksharp · Posted 09-07-2016 02:50 AM

I noticed a problem. the order of union data is not right.
Try this one :




data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then do;
  call execute('quit;');
  call symputx('n',_n_);
 end;
run;

data want;
 set want_1-want_&n ;
run;

munitech4u · Posted 09-07-2016 04:08 AM

Does this overcome the limitation of 256 dataset processing? as it will be fetching counts from same dataset multiple times.

Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Re: Looping over a dataset to check values in another dataset

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away