Hi Everyone,
I have: 1 data-file with variable (a b c d...) and 1 condition-file.
Each row in the condition file contains certain condition (say a=1 and b=3 and c=5)
I want to count the number of records in the data-file that meet each condition in condition-file
So for the first condition: a =4 b =4 c =5 , there are 2 record in data-file meet that criteria.
and for the first condition: b =5 c =9 d =6 , there are 1 record in data-file meet that criteria.
I try to change a PROC IML code but it will not work.
Any help to fix the code or make new code is very much appreciated.
HHC
data have;
input date a b c d;
datalines;
1 4 4 5 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 0 0
6 4 5 1 2
7 6 5 9 6
;run;
data condition;
input id a_name $ a_value b_name $ b_value c_name $ c_value;
datalines;
1 a 4 b 4 c 5
2 b 5 c 9 d 6
;run;
proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close;
use condition;
read all var{a_name a_value b_name b_value c_name c_value};
close;
n=nrow(a_name);
do i=1 to n;
temp=have[,a_name[i]]||have[,b_name[i]||have[,c_name[i]];
idx=loc(temp[,1]=a_value[i] | temp[,2]=b_value[i] | temp[,3]=c_value[i]);
want=temp[idx,];
var1=var1//a_name[i];
var2=var2//b_name[i];
var3=var3//c_name[i];
value1=value1//a_value[i];
value2=value2//b_value[i];
value3=value3//c_value[i];
Total_N=Total_N//nrow(want);
N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=b_value[i] & temp[,3]=c_value[i]));
end;
create want var {var1 var2 var3 value1 value2 value3 Total_N N_match};
append;
close;
quit;
As far as I can see your code is correct, though you were missing a ] along the way.
This code should work and I have commented where the error was.
data have;
input date a b c d;
datalines;
1 4 4 5 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 0 0
6 4 5 1 2
7 6 5 9 6
;run;
data condition;
input id a_name $ a_value b_name $ b_value c_name $ c_value;
datalines;
1 a 4 b 4 c 5
2 b 5 c 9 d 6
;run;
proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close have;
print have;
use condition;
read all var{a_name a_value b_name b_value c_name c_value};
close condition;
n=nrow(a_name);
do i=1 to n;
temp=have[,a_name[i]] || have[,b_name[i]] || have[,c_name[i]]; /* Your mistake was here :) */
idx=loc( temp[,1] = a_value[i] | temp[,2] = b_value[i] | temp[,3] = c_value[i] );
want=temp[idx,];
var1=var1//a_name[i];
var2=var2//b_name[i];
var3=var3//c_name[i];
value1=value1//a_value[i];
value2=value2//b_value[i];
value3=value3//c_value[i];
Total_N=Total_N//nrow(want);
N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=b_value[i] & temp[,3]=c_value[i]));
end;
create want var {var1 var2 var3 value1 value2 value3 Total_N N_match};
append;
close want;
quit;
The N_MATCH variable gives the desired values as below
I just updated the post, I hope it is clearer now.
Thank you.
HC
As far as I can see your code is correct, though you were missing a ] along the way.
This code should work and I have commented where the error was.
data have;
input date a b c d;
datalines;
1 4 4 5 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 0 0
6 4 5 1 2
7 6 5 9 6
;run;
data condition;
input id a_name $ a_value b_name $ b_value c_name $ c_value;
datalines;
1 a 4 b 4 c 5
2 b 5 c 9 d 6
;run;
proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close have;
print have;
use condition;
read all var{a_name a_value b_name b_value c_name c_value};
close condition;
n=nrow(a_name);
do i=1 to n;
temp=have[,a_name[i]] || have[,b_name[i]] || have[,c_name[i]]; /* Your mistake was here :) */
idx=loc( temp[,1] = a_value[i] | temp[,2] = b_value[i] | temp[,3] = c_value[i] );
want=temp[idx,];
var1=var1//a_name[i];
var2=var2//b_name[i];
var3=var3//c_name[i];
value1=value1//a_value[i];
value2=value2//b_value[i];
value3=value3//c_value[i];
Total_N=Total_N//nrow(want);
N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=b_value[i] & temp[,3]=c_value[i]));
end;
create want var {var1 var2 var3 value1 value2 value3 Total_N N_match};
append;
close want;
quit;
The N_MATCH variable gives the desired values as below
Oh, that is great.
Thanks a lot.
HC
HaHa. You nailed it. you really should learn some IML code . it is good for you. data have; input date a b c d; datalines; 1 4 4 5 5 2 4 4 5 9 3 4 5 5 0 4 3 6 8 9 5 3 5 0 0 6 4 5 1 2 7 6 5 9 6 ;run; data condition; input id a_name $ a_value b_name $ b_value c_name $ c_value; datalines; 1 a 4 b 4 c 5 2 b 5 c 9 d 6 ;run; proc iml; use have(drop=date); read all var _all_ into have[c=vnames]; close; use condition; read all var{id a_name a_value b_name b_value c_name c_value}; close; n=nrow(a_name); count=j(n,1,.); do i=1 to n; count[i]=sum(have[,a_name[i]]=a_value[i] & have[,b_name[i]]=b_value[i]& have[,c_name[i]]=c_value[i] ); end; create want var{id a_name a_value b_name b_value c_name c_value count}; append; close; quit;
Can you tell me what the meaning of have[c=vnames];
I dont see you use "c" anywhere but when I change "c" into something else, code not work. But "vnames", I can change to anything.
read all var _all_ into have[c=vnames];
HC
Yes. you have to specify "c=vnames" ,
It doesn't matter what 'vnames' is, you can name it any variable name.
Specify it is for using the variable name reference .
like
have[,a_name[i]]
note here a_name[i] is 'a' not 1 ,
so "c=vnames" is making sure you can refer to column of matrix as character 'a'.
Default index is like 1,2,........
"c=vnames" is copying variable names into row vector 'vnames'. C here is keyword means column
Thanks a lot.
I will try 🙂
HC
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.