DATA Step, Macro, Functions and more

Help with counting the number of record meeting condition from seperate file

Accepted Solution Solved
Reply
Super Contributor
Posts: 406
Accepted Solution

Help with counting the number of record meeting condition from seperate file

Hello,

 

I want to count the number of records that meet each pair of (variable and value) specified in a seperate file. 

 

So for the 1st combination of a=4 and b=5, there are 2 record out of total 5 (with either a=4 or b=5).

 

I write a code that get it done but that approach is defintely not efficient. It take too long.

 

I guess there should be a different way. 

 

Could you please help me with that?

 

Thanks a lot,

 

HHC

 


data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

*a sample for 1 combination of condition a=4 AND b=5 ;

	data BOTH; set have;
	if a=4 AND b=5 THEN DO;
		BOTH=1;OUTPUT;END;
	ELSE 
	if a=4 OR b=5 THEN DO;
		BOTH=0;OUTPUT; END;
	run;

	proc MEANS data=BOTH noprint;
	var both; 
	output out=N_same sum=N_same;
	run;
	
	data N_same; set N_same;
	drop _TYPE_;
	a_name="&a_name";
	b_name="&b_name";
	a_value=&a_value;
	b_value=&b_value;
	run;




*----This MAcro works but take too much time ------------------------------------------------------------------------;
data WANT; set xx; run;	*the marco below will build all final result into the WANT_CORR;

%MACRO correlation(a_name=,a_value=,b_name=,b_value=);
	data BOTH; set have;
	if &a_name=&a_value AND &b_name=&b_value THEN DO;
		BOTH=1;OUTPUT;END;
	ELSE 
	if &a_name=&a_value OR &b_name=&b_value THEN DO;
		BOTH=0;OUTPUT; END;
	run;

	proc MEANS data=BOTH noprint;
	var both; 
	output out=N_same sum=N_same;
	run;
	
	data N_same; set N_same;
	drop _TYPE_;
	a_name="&a_name";
	b_name="&b_name";
	a_value=&a_value;
	b_value=&b_value;
	run;

	data WANT; set WANT N_same; run;

%MEND;

	*Create combination of any 2 condition;
		data combination;									
		set have_var nobs=totalobs;
		drop i ;
		i+1;
		do j=i+1 to totalobs ;
			set have_var(keep = a_name a_value rename=(a_name=b_name a_value=b_value)) point=j;
			output;
		end;
		run;
	*Write code to temp file named "code";
		filename code temp;
		data _null_;
		  set combination end=last_cond;
		  file code;
			put '%correlation (a_name=' a_NAME ',a_value=' a_VALUE ', b_name=' b_NAME ',b_value=' b_VALUE ');'
		;run;
	*Run the temp file;
		%include code /source2;


Accepted Solutions
Solution
‎03-20-2017 10:31 PM
Super User
Posts: 10,018

Re: Help with counting the number of record meeting condition from seperate file

Here is .Check the last column.




data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close;

use have_var;
read all var{a_name a_value};
close;

n=nrow(a_name);
do i=1 to n-1;
 do j=i+1 to n;
   temp=have[,a_name[i]]||have[,a_name[j]];
   idx=loc(temp[,1]=a_value[i] | temp[,2]=a_value[j]);
   want=temp[idx,];
   corr=corr(want);
   var1=var1//a_name[i];
   var2=var2//a_name[j];
   cor=cor//corr[2];
   Total_N=Total_N//nrow(want);
   N_non_missing=N_non_missing//ncol(loc(countmiss(want,'row')=0));
   N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=a_value[j]));
 end;
end;

create want var {var1 var2 cor Total_N N_non_missing N_match};
append;
close;
quit;

View solution in original post


All Replies
Super User
Posts: 5,495

Re: Help with counting the number of record meeting condition from seperate file

Here's an approach that adds two variables to each observation:  n_matches (number of matched variables) and matching_list (names of variables that match).  You can extract, summarize, or do what you will with the results.

 

Assuming you have both data sets (HAVE and HAVE_VAR):

 

proc sql noprint;

select trim(a_name) into : varnames separated by ' ' from have_var;

select a_value into : varvalues separated by ' ' from have_var;

quit;

 

data want;

set have;

array values {&sqlobs} _temporary_ (&varvalues);

array names {*} &varnames;

n_matches=0;

length matching_list $ 500;

do _n_=1 to &sqlobs;

   if names{_n_} = values{_n_} then do;

      n_matches + 1;

      matching_varlist = catx(',' , matching_varlist, vname(names{_n_}));

   end;

end;

run;

 

One of the keys to getting this to work is that the SELECT statements do not change the order of data retrieved from HAVE_VAR.  While one SELECT statement retrieves the variable names and the other retrieves the values, the order matches up within both macro variables.  Do not use DISTINCT or you will modify the order improperly.

 

The code is untested, but looks like it should work.  Let me know if you have any problems with it.  And examine the two added variables to make sure they give you the information you want.

Trusted Advisor
Posts: 1,015

Re: Help with counting the number of record meeting condition from seperate file

You could use SQL to read have_var and build a set of statements like.

  if a=4 and b=5 then N_a4_b5+1;
  if a=4 and c=9 then N_a4_c9+1;

 

If SQL writes those statements to a macro var  (into :if_statements below), you could easily apply them to a data set reading HAVE and writiing one obs to WANT with the totals for each pairwise combination:

 

data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;


proc sql noprint;
  select distinct  
    catx(' '
        ,'if'
        ,cats(L.a_name,'=',L.a_value)
        ,'and'
        ,cats(R.a_name,'=',R.a_value)
        ,'then'
        ,cats('N_',L.a_name,L.a_value,'_',R.a_name,R.a_value,'+1;')
        )
  into :if_statements separated by ' '
  from have_var as L inner join have_var as R
  on L.a_name<R.a_name;
quit;

data want (keep=N_:);
  set have end=end_of_have;
  &if_statements ;
  if end_of_have then output;
run;

 

Super Contributor
Posts: 406

Re: Help with counting the number of record meeting condition from seperate file

Thank you for your responses.

 

I am not sure if the code by Astounding generate the results, I find it difficult to read the “want” file.

The code by Mkeintz give the results. Can you make it vertically with “a_name” “a_value” “b_name” “b_value”?

 

HHC

Super User
Posts: 5,495

Re: Help with counting the number of record meeting condition from seperate file

[ Edited ]

If you have run my program, the intent is that you produce your own report.  For example, you could try:

 

proc freq data=want;

tables n_matches * matching_varlist / missing list;

run;

 

You can really slice and dice it however you want to see it.

Solution
‎03-20-2017 10:31 PM
Super User
Posts: 10,018

Re: Help with counting the number of record meeting condition from seperate file

Here is .Check the last column.




data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close;

use have_var;
read all var{a_name a_value};
close;

n=nrow(a_name);
do i=1 to n-1;
 do j=i+1 to n;
   temp=have[,a_name[i]]||have[,a_name[j]];
   idx=loc(temp[,1]=a_value[i] | temp[,2]=a_value[j]);
   want=temp[idx,];
   corr=corr(want);
   var1=var1//a_name[i];
   var2=var2//a_name[j];
   cor=cor//corr[2];
   Total_N=Total_N//nrow(want);
   N_non_missing=N_non_missing//ncol(loc(countmiss(want,'row')=0));
   N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=a_value[j]));
 end;
end;

create want var {var1 var2 cor Total_N N_non_missing N_match};
append;
close;
quit;

Super Contributor
Posts: 406

Re: Help with counting the number of record meeting condition from seperate file

It is really powerful that proc IML. I never do anything like that. 

Really appreciate it, Ksharp.

HHC

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 202 views
  • 1 like
  • 4 in conversation