BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
munitech4u
Quartz | Level 8

I have a dataset like this:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2    

I want to loop over this dataset for each rowxcol combination and count all cases e.g. where (nex1>=1 and nex2>=1) in another dataset(data_y) which contains these nex1-nex3000 variables. Then move to second case(data_x) where (nex1>=1 and nex3>=1) until the end of row. Now go to the second row of and repeat, til the end of row. So data_x is just to get all the possible cases. I want to get the count for these cases from another dataset data_y

Data_y is like this:

nex1 nex2 nex3.....nex3000
1     0    1     3
0     0    0     0
3     1    0     1
1     2    1     0
0     0    1     0

So final dataset will be having counts for these cases(each case from data_x for rowxcol combination):

 col1   col2...
  2      2
  2      1
  2      1...        

so pseudo code is like this:

for each row in data_x:
     for each col in data_x:
          if &row_val>=1 and &col_val>=1 in data_y:
              select count(*)
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Here is IML code. I believe it would be very fast.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
proc iml;
use data_x nobs nobs_x;
read all var _all_;
close;
use data_y;
read all var _all_;
close;

want=j(nobs_x,2,.);
do i=1 to nobs_x;
 p=value(Primary[i]);
 temp_p=(p >= 1);
 
 c1=value(col1[i]);
 temp_c1=(c1 >= 1);
 want[i,1]=sum((temp_p+temp_c1)=2);

 c2=value(col2[i]);
 temp_c2=(c2 >= 1);
 want[i,2]=sum((temp_p+temp_c2)=2);
 
end;

create want from want;
append from want;
close;  
quit;


View solution in original post

17 REPLIES 17
ballardw
Super User

To clarify: The Values you are showing are Varaible Names in another data set?

Will ANY of the values in your example data set every occur in different columns?

 

What will the interpretation of the total in Col1 be at the end? You say it is a crosstab. So Cross tab of what?

 

I think you better walk us through a much smaller example, say with 5 or 6 variables, show the result desired for each step. You kind of jumped from  a "row" result to a cross tab with explaining how to get there. And please provide example data of the result at each step that matches the example data provided for the small example.

munitech4u
Quartz | Level 8
1. yes
2. Could not understand your question

3. its basically getting all the counts(from data_y) for primaryxcol: cases(looping over each row and col combination in data_x

I updated the question. Let me know, if you need further clarification
Reeza
Super User

Please provide a small worked example as indicated by @ballardw

We do not understand your requirements. 

munitech4u
Quartz | Level 8
I updated the question, now the counts can be produced for final result using data_x and data_y
Reeza
Super User

How big are your datasets?

Reeza
Super User

If you want code post a worked example, specifically sample data as a data step, as requested previously. I'm not typing out data. As is, here's my algorithm.

 

1. Change data_y to a dataset that's 0/1, where 1 is representative of anything greater than 0. Data step using an array would work.

2. Use Proc Corr to create a distance matrix - https://gist.github.com/statgeek/a5184a4e1678d81e2643

3. Transpose output from 2 to a long format, Proc Transpose

4. Use result from 3 as a lookup to Table 2 via a sql merge. You could transpose data_x to a long and merge to get your results and then transpose back. Or you may be able to do a single sql step depending on how many columns is in data_x. Again you haven't provided the sizes of your dataset.

 

Good Luck. 

 

ballardw
Super User

This example data is why I asked about the duplication of values in different variables:

Primary   col1   col2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2    

Since nex2 appears in col1 And in col2 that means to get a count of nex2 there will need to be an indicator variable to hold that specific indication. Which will require either creating 3000 indicator variables or something to store which match is which because otherwise I cannot see how you wll combine the counts of col1 matches and col2 matches for the same variable.

 

And Having Nex2 in the "primary" isn'tgoing to help much. I believe that given your requirement to match all rows in the first set with the second that you will generate an overcount. For example you second and third row of the the first set will count nex1 twice from row 4 in data_y.

 

 

 

And I still cannot figur out what this means in relation to either of the data sets:

col1   col2...
  2      2
  2      1
  2      1...     

You are skipping at least one intermediate step a far as I can tell. SHOW the immediate result of the comparisons between  dataset 1 and data_y.

 

data_null__
Jade | Level 19
You can use the number part of the variable name as an index into the array of NEX1-NEX3000.
Ksharp
Super User
data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;create table want as');
 call execute(cat('select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1)'));
 if last then call execute(';quit;');
  else call execute('union all');
run;

munitech4u
Quartz | Level 8

Thanks Xia! But i am having a problem with this, it is throwing the error of maximum limitation of  256 tables processing. I think i will have some 1000x9= 9000 such cases. Is there any other method of processing this? Like creating loops of multiple of 256?

Ksharp
Super User
Could you use my IML code ?



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x ;
 call execute(cat('proc sql;create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1) quit;'));
run;

data want;
 set want_:;
run;


Ksharp
Super User
Or this code could get you faster.



data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then call execute('quit;');
run;

data want;
 set want_: ;
run;

Ksharp
Super User
I noticed a problem. the order of union data is not right.
Try this one :




data data_x;
input (Primary   col1   col2) ($);
cards;  
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2   
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
nex1    nex2    nex3    
nex2    nex1    nex3    
nex3    nex1    nex2 
;
run;

data data_y;
input nex1 nex2 nex3;
cards;
1     0    1     
0     0    0     
3     1    0     
1     2    1     
0     0    1   
;
run;
data _null_;
 set data_x end=last;
 if _n_=1 then call execute('proc sql;');
 call execute(cat('create table want_',put(_n_,best.-l),' as select 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, 
 (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2
 from data_y(obs=1);'));
 if last then do;
  call execute('quit;');
  call symputx('n',_n_);
 end;
run;

data want;
 set want_1-want_&n ;
run;

munitech4u
Quartz | Level 8

Does this overcome the limitation of 256 dataset processing? as it will be fetching counts from same dataset multiple times.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2999 views
  • 2 likes
  • 5 in conversation