I have a dataset like this:
Primary col1 col2
nex1 nex2 nex3
nex2 nex1 nex3
nex3 nex1 nex2
I want to loop over this dataset for each rowxcol combination and count all cases e.g. where (nex1>=1 and nex2>=1) in another dataset(data_y) which contains these nex1-nex3000 variables. Then move to second case(data_x) where (nex1>=1 and nex3>=1) until the end of row. Now go to the second row of and repeat, til the end of row. So data_x is just to get all the possible cases. I want to get the count for these cases from another dataset data_y
Data_y is like this:
nex1 nex2 nex3.....nex3000
1 0 1 3
0 0 0 0
3 1 0 1
1 2 1 0
0 0 1 0
So final dataset will be having counts for these cases(each case from data_x for rowxcol combination):
col1 col2...
2 2
2 1
2 1...
so pseudo code is like this:
for each row in data_x:
for each col in data_x:
if &row_val>=1 and &col_val>=1 in data_y:
select count(*)
Here is IML code. I believe it would be very fast. data data_x; input (Primary col1 col2) ($); cards; nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 ; run; data data_y; input nex1 nex2 nex3; cards; 1 0 1 0 0 0 3 1 0 1 2 1 0 0 1 ; run; proc iml; use data_x nobs nobs_x; read all var _all_; close; use data_y; read all var _all_; close; want=j(nobs_x,2,.); do i=1 to nobs_x; p=value(Primary[i]); temp_p=(p >= 1); c1=value(col1[i]); temp_c1=(c1 >= 1); want[i,1]=sum((temp_p+temp_c1)=2); c2=value(col2[i]); temp_c2=(c2 >= 1); want[i,2]=sum((temp_p+temp_c2)=2); end; create want from want; append from want; close; quit;
To clarify: The Values you are showing are Varaible Names in another data set?
Will ANY of the values in your example data set every occur in different columns?
What will the interpretation of the total in Col1 be at the end? You say it is a crosstab. So Cross tab of what?
I think you better walk us through a much smaller example, say with 5 or 6 variables, show the result desired for each step. You kind of jumped from a "row" result to a cross tab with explaining how to get there. And please provide example data of the result at each step that matches the example data provided for the small example.
Please provide a small worked example as indicated by @ballardw
We do not understand your requirements.
How big are your datasets?
If you want code post a worked example, specifically sample data as a data step, as requested previously. I'm not typing out data. As is, here's my algorithm.
1. Change data_y to a dataset that's 0/1, where 1 is representative of anything greater than 0. Data step using an array would work.
2. Use Proc Corr to create a distance matrix - https://gist.github.com/statgeek/a5184a4e1678d81e2643
3. Transpose output from 2 to a long format, Proc Transpose
4. Use result from 3 as a lookup to Table 2 via a sql merge. You could transpose data_x to a long and merge to get your results and then transpose back. Or you may be able to do a single sql step depending on how many columns is in data_x. Again you haven't provided the sizes of your dataset.
Good Luck.
This example data is why I asked about the duplication of values in different variables:
Primary col1 col2
nex1 nex2 nex3
nex2 nex1 nex3
nex3 nex1 nex2
Since nex2 appears in col1 And in col2 that means to get a count of nex2 there will need to be an indicator variable to hold that specific indication. Which will require either creating 3000 indicator variables or something to store which match is which because otherwise I cannot see how you wll combine the counts of col1 matches and col2 matches for the same variable.
And Having Nex2 in the "primary" isn'tgoing to help much. I believe that given your requirement to match all rows in the first set with the second that you will generate an overcount. For example you second and third row of the the first set will count nex1 twice from row 4 in data_y.
And I still cannot figur out what this means in relation to either of the data sets:
col1 col2...
2 2
2 1
2 1...
You are skipping at least one intermediate step a far as I can tell. SHOW the immediate result of the comparisons between dataset 1 and data_y.
data data_x; input (Primary col1 col2) ($); cards; nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 ; run; data data_y; input nex1 nex2 nex3; cards; 1 0 1 0 0 0 3 1 0 1 2 1 0 0 1 ; run; data _null_; set data_x end=last; if _n_=1 then call execute('proc sql;create table want as'); call execute(cat('select (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2 from data_y(obs=1)')); if last then call execute(';quit;'); else call execute('union all'); run;
Thanks Xia! But i am having a problem with this, it is throwing the error of maximum limitation of 256 tables processing. I think i will have some 1000x9= 9000 such cases. Is there any other method of processing this? Like creating loops of multiple of 256?
Could you use my IML code ? data data_x; input (Primary col1 col2) ($); cards; nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 ; run; data data_y; input nex1 nex2 nex3; cards; 1 0 1 0 0 0 3 1 0 1 2 1 0 0 1 ; run; data _null_; set data_x ; call execute(cat('proc sql;create table want_',put(_n_,best.-l),' as select (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2 from data_y(obs=1) quit;')); run; data want; set want_:; run;
Or this code could get you faster. data data_x; input (Primary col1 col2) ($); cards; nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 ; run; data data_y; input nex1 nex2 nex3; cards; 1 0 1 0 0 0 3 1 0 1 2 1 0 0 1 ; run; data _null_; set data_x end=last; if _n_=1 then call execute('proc sql;'); call execute(cat('create table want_',put(_n_,best.-l),' as select (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2 from data_y(obs=1);')); if last then call execute('quit;'); run; data want; set want_: ; run;
I noticed a problem. the order of union data is not right. Try this one : data data_x; input (Primary col1 col2) ($); cards; nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 nex1 nex2 nex3 nex2 nex1 nex3 nex3 nex1 nex2 ; run; data data_y; input nex1 nex2 nex3; cards; 1 0 1 0 0 0 3 1 0 1 2 1 0 0 1 ; run; data _null_; set data_x end=last; if _n_=1 then call execute('proc sql;'); call execute(cat('create table want_',put(_n_,best.-l),' as select (select count(*) from data_y where ',Primary,' ge 1 and ',col1,' ge 1) as col1, (select count(*) from data_y where ',Primary,' ge 1 and ',col2,' ge 1) as col2 from data_y(obs=1);')); if last then do; call execute('quit;'); call symputx('n',_n_); end; run; data want; set want_1-want_&n ; run;
Does this overcome the limitation of 256 dataset processing? as it will be fetching counts from same dataset multiple times.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.