Hi to all,
I have falt data file which contains more than 5000 observations and 384 variables and its look like this.
S. No AB001A AB0002A AB362 ...AB384 VAR1 VAR2 VAR3...Var384 SUM %Match Rank
1 -/- C/C A/A
2 C/C C/C A/A
3 C/C C/C A/A
4 C/C C/C A/A
5 C/C C/C A/A
6 C/C C/C A/A
7 C/C C/C A/A
8 -/- -/- -/-
9 C/C C/C A/A
10 C/C C/C A/A
11 -/- C/C A/A
12 C/C C/C A/A
13 C/C C/C A/A
14 C/C C/C A/A
16 C/C -/- A/A
17 -/- C/C A/A
18 C/C C/C A/A
19 C/C C/C A/A
I want to match obs 3 with obs 2, obs 4 with obs 2, like this until obs19, if it exactly matched i.e same value as obs 2 then score will be 1 else 0, that will be stored in var1(dummy var for AB001a) for AB001a , in var2 for ab0002a and in var3 for ab362 until AB384 (Var 384) and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their %match and rank. My question is how can i do this in sas? can i use arrays for this? or in combinations of loops and arrays? I have good basic sas knowledge. can any one guide me how to do this in sas because i want to reduce my time to analyze my data.
Thanking you Regards,
Gnetist
Maybe I understand what you mean.
data have; input No AB001A $ AB0002A $ AB362 $; cards; 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17 -/- C/C A/A 18 C/C C/C A/A 19 C/C C/C A/A ; run; data have; set have; array x{3} var1-var3; array temp{3} $ _temporary_; array v{*} $ AB: ; if _n_ eq 2 then do; do i=1 to dim(v); temp{i}=v{i}; end; end; if _n_ gt 2 then do; do j=1 to dim(v); x{j}=ifn(v{j}=temp{j},1,0); end; end; run;
Ksharp
Can you explain the 'Match' ? you want all the variable value are same , or only a single variable value matched.
And more important than anything else is posting the result you want(take the first several variables as an example) .
Ksharp
Maybe I understand what you mean.
data have; input No AB001A $ AB0002A $ AB362 $; cards; 1 -/- C/C A/A 2 C/C C/C A/A 3 C/C C/C A/A 4 C/C C/C A/A 5 C/C C/C A/A 6 C/C C/C A/A 7 C/C C/C A/A 8 -/- -/- -/- 9 C/C C/C A/A 10 C/C C/C A/A 11 -/- C/C A/A 12 C/C C/C A/A 13 C/C C/C A/A 14 C/C C/C A/A 16 C/C -/- A/A 17 -/- C/C A/A 18 C/C C/C A/A 19 C/C C/C A/A ; run; data have; set have; array x{3} var1-var3; array temp{3} $ _temporary_; array v{*} $ AB: ; if _n_ eq 2 then do; do i=1 to dim(v); temp{i}=v{i}; end; end; if _n_ gt 2 then do; do j=1 to dim(v); x{j}=ifn(v{j}=temp{j},1,0); end; end; run;
Ksharp
KSHARP,
Thank you very much for your help in solving my problem. Its working everything fine and i changed according to my requirement and i dropped i and j variables from your soultion, i think you may forget to drop them. I am really so thank full to you.
Regards,
Reddy
Hi,
Based on my understanding about your requirement i have designed one macro for matching the observations and based on that it will calculate the percentages respectively...
data sample;
input sr_no AB001A AB0002A;
cards;
1 15 25
2 15 25
3 25 84
4 25 84
5 08 09
6 10 12
7 10 12
8 48 15
;
%macro match;
proc contents data = sample out = test(keep = name) noprint;
run;
proc sql noprint;
select name into :list separated by ' '
from test
where name ^? "sr_no";
select count(name) into :count
from test
where name ^? "sr_no";
quit;
%let count = &count.;
data compare(drop = _temp:);
set sample;
%do i = 1 %to &count.;
_temp&i. = dif(%scan(&list.,&i.));
if _temp&i. = 0 then var&i. = 1;
else var&i. = 0;
%end;
run;
proc sql noprint;
select %do i = 1 %to %eval(&count.-1);
sum(%scan(&list.,&i.)),
%end;
sum(%scan(&list.,&count.)) into
%do i = 1 %to %eval(&count.-1);
:%scan(&list.,&i.)_sum,
%end;
:%scan(&list.,&count.)_sum
from compare
where %do i = 1 %to %eval(&count.-1);
var&i. = 1 or
%end;
var&count. = 1;
quit;
proc sql;
create table compare as
select *, %do i = 1 %to %eval(&count.-1);
sum(%scan(&list.,&i.)) as total_%scan(&list.,&i.),
%end;
sum(%scan(&list.,&count.)) as total_%scan(&list.,&count.)
from compare;
quit;
data final(drop = total: sum:);
set compare;
%do i = 1 %to &count.;
sum&i. = symget("%scan(&list.,&i.)_sum");
%scan(&list.,&i.)_percent = sum&i./total_%scan(&list.,&i.)*100;
%end;
run;
%mend;
options mlogic mprint;
%match;
Hope this macro help you to achive your output using SAS...
-Urvish
Urvish,
Thank you very much for your help and spending your valueable time to help me. I have character data and my first two numbers are parents. first i want to see how many of my vars are showing different values for parents (first 2 obs) and then i will count them. I will count total 1's and then divide them with number of vars that are different among two parents and then i will calculate % match. (ex sum of 1's/40 (different vars among parents) and then i will give rank from descending order. I think you code will work for it, i will try it first my code and then your if i get any problem i will contact you.
Thanking you very much,
Regards,
Reddy
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.