BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hello,

 

I want to count the number of records that meet each pair of (variable and value) specified in a seperate file. 

 

So for the 1st combination of a=4 and b=5, there are 2 record out of total 5 (with either a=4 or b=5).

 

I write a code that get it done but that approach is defintely not efficient. It take too long.

 

I guess there should be a different way. 

 

Could you please help me with that?

 

Thanks a lot,

 

HHC

 


data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

*a sample for 1 combination of condition a=4 AND b=5 ;

	data BOTH; set have;
	if a=4 AND b=5 THEN DO;
		BOTH=1;OUTPUT;END;
	ELSE 
	if a=4 OR b=5 THEN DO;
		BOTH=0;OUTPUT; END;
	run;

	proc MEANS data=BOTH noprint;
	var both; 
	output out=N_same sum=N_same;
	run;
	
	data N_same; set N_same;
	drop _TYPE_;
	a_name="&a_name";
	b_name="&b_name";
	a_value=&a_value;
	b_value=&b_value;
	run;




*----This MAcro works but take too much time ------------------------------------------------------------------------;
data WANT; set xx; run;	*the marco below will build all final result into the WANT_CORR;

%MACRO correlation(a_name=,a_value=,b_name=,b_value=);
	data BOTH; set have;
	if &a_name=&a_value AND &b_name=&b_value THEN DO;
		BOTH=1;OUTPUT;END;
	ELSE 
	if &a_name=&a_value OR &b_name=&b_value THEN DO;
		BOTH=0;OUTPUT; END;
	run;

	proc MEANS data=BOTH noprint;
	var both; 
	output out=N_same sum=N_same;
	run;
	
	data N_same; set N_same;
	drop _TYPE_;
	a_name="&a_name";
	b_name="&b_name";
	a_value=&a_value;
	b_value=&b_value;
	run;

	data WANT; set WANT N_same; run;

%MEND;

	*Create combination of any 2 condition;
		data combination;									
		set have_var nobs=totalobs;
		drop i ;
		i+1;
		do j=i+1 to totalobs ;
			set have_var(keep = a_name a_value rename=(a_name=b_name a_value=b_value)) point=j;
			output;
		end;
		run;
	*Write code to temp file named "code";
		filename code temp;
		data _null_;
		  set combination end=last_cond;
		  file code;
			put '%correlation (a_name=' a_NAME ',a_value=' a_VALUE ', b_name=' b_NAME ',b_value=' b_VALUE ');'
		;run;
	*Run the temp file;
		%include code /source2;

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Here is .Check the last column.




data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close;

use have_var;
read all var{a_name a_value};
close;

n=nrow(a_name);
do i=1 to n-1;
 do j=i+1 to n;
   temp=have[,a_name[i]]||have[,a_name[j]];
   idx=loc(temp[,1]=a_value[i] | temp[,2]=a_value[j]);
   want=temp[idx,];
   corr=corr(want);
   var1=var1//a_name[i];
   var2=var2//a_name[j];
   cor=cor//corr[2];
   Total_N=Total_N//nrow(want);
   N_non_missing=N_non_missing//ncol(loc(countmiss(want,'row')=0));
   N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=a_value[j]));
 end;
end;

create want var {var1 var2 cor Total_N N_non_missing N_match};
append;
close;
quit;

View solution in original post

6 REPLIES 6
Astounding
PROC Star

Here's an approach that adds two variables to each observation:  n_matches (number of matched variables) and matching_list (names of variables that match).  You can extract, summarize, or do what you will with the results.

 

Assuming you have both data sets (HAVE and HAVE_VAR):

 

proc sql noprint;

select trim(a_name) into : varnames separated by ' ' from have_var;

select a_value into : varvalues separated by ' ' from have_var;

quit;

 

data want;

set have;

array values {&sqlobs} _temporary_ (&varvalues);

array names {*} &varnames;

n_matches=0;

length matching_list $ 500;

do _n_=1 to &sqlobs;

   if names{_n_} = values{_n_} then do;

      n_matches + 1;

      matching_varlist = catx(',' , matching_varlist, vname(names{_n_}));

   end;

end;

run;

 

One of the keys to getting this to work is that the SELECT statements do not change the order of data retrieved from HAVE_VAR.  While one SELECT statement retrieves the variable names and the other retrieves the values, the order matches up within both macro variables.  Do not use DISTINCT or you will modify the order improperly.

 

The code is untested, but looks like it should work.  Let me know if you have any problems with it.  And examine the two added variables to make sure they give you the information you want.

mkeintz
PROC Star

You could use SQL to read have_var and build a set of statements like.

  if a=4 and b=5 then N_a4_b5+1;
  if a=4 and c=9 then N_a4_c9+1;

 

If SQL writes those statements to a macro var  (into :if_statements below), you could easily apply them to a data set reading HAVE and writiing one obs to WANT with the totals for each pairwise combination:

 

data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;


proc sql noprint;
  select distinct  
    catx(' '
        ,'if'
        ,cats(L.a_name,'=',L.a_value)
        ,'and'
        ,cats(R.a_name,'=',R.a_value)
        ,'then'
        ,cats('N_',L.a_name,L.a_value,'_',R.a_name,R.a_value,'+1;')
        )
  into :if_statements separated by ' '
  from have_var as L inner join have_var as R
  on L.a_name<R.a_name;
quit;

data want (keep=N_:);
  set have end=end_of_have;
  &if_statements ;
  if end_of_have then output;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhchenfx
Barite | Level 11

Thank you for your responses.

 

I am not sure if the code by Astounding generate the results, I find it difficult to read the “want” file.

The code by Mkeintz give the results. Can you make it vertically with “a_name” “a_value” “b_name” “b_value”?

 

HHC

Astounding
PROC Star

If you have run my program, the intent is that you produce your own report.  For example, you could try:

 

proc freq data=want;

tables n_matches * matching_varlist / missing list;

run;

 

You can really slice and dice it however you want to see it.

Ksharp
Super User
Here is .Check the last column.




data have; 
input date a b c d;
datalines;
1 4 4 9 5
2 4 4 5 9
3 4 5 5 0
4 3 6 8 9
5 3 5 9 0
6 4 5 9 2
7 6 0 9 1
;run;

data have_var; 
input a_name $ a_value;
datalines;
a 4
b 5
c 9
d 9
;run;

proc iml;
use have(drop=date);
read all var _all_ into have[c=vnames];
close;

use have_var;
read all var{a_name a_value};
close;

n=nrow(a_name);
do i=1 to n-1;
 do j=i+1 to n;
   temp=have[,a_name[i]]||have[,a_name[j]];
   idx=loc(temp[,1]=a_value[i] | temp[,2]=a_value[j]);
   want=temp[idx,];
   corr=corr(want);
   var1=var1//a_name[i];
   var2=var2//a_name[j];
   cor=cor//corr[2];
   Total_N=Total_N//nrow(want);
   N_non_missing=N_non_missing//ncol(loc(countmiss(want,'row')=0));
   N_match=N_match//ncol(loc(temp[,1]=a_value[i] & temp[,2]=a_value[j]));
 end;
end;

create want var {var1 var2 cor Total_N N_non_missing N_match};
append;
close;
quit;

hhchenfx
Barite | Level 11

It is really powerful that proc IML. I never do anything like that. 

Really appreciate it, Ksharp.

HHC

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 911 views
  • 1 like
  • 4 in conversation