BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
genetist
Calcite | Level 5

Hi to all,

I have falt data file which contains more than 5000 observations and 384 variables and its look like this.

S. No   AB001A  AB0002A AB362 ...AB384  VAR1    VAR2    VAR3...Var384    SUM %Match  Rank
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A                       
 

I want to match obs 3 with obs 2, obs 4 with obs 2, like this until obs19, if it exactly matched i.e same value as obs 2 then score will be 1 else 0, that will be stored in var1(dummy var for AB001a) for AB001a , in var2 for ab0002a and in var3 for ab362 until AB384 (Var 384) and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their %match and rank. My question is how can i do this in sas? can i use arrays for this? or in combinations of loops and arrays? I have  good basic sas knowledge. can any one guide me how to do this in sas because i want to reduce my time to analyze my data.

Thanking you Regards,

Gnetist

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Maybe I understand what you mean.

data have;
input No   AB001A $ AB0002A $ AB362 $;
cards;
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A 
;
run;
data have;
 set have;
 array x{3} var1-var3;
 array temp{3} $ _temporary_;
 array v{*} $ AB: ;
 if _n_ eq 2 then do;
                 do i=1 to dim(v);
                      temp{i}=v{i};
                     end;
                    end;
 if _n_ gt 2 then do;
                    do j=1 to dim(v);
                          x{j}=ifn(v{j}=temp{j},1,0);
                         end;
                      end;
run;




Ksharp

View solution in original post

5 REPLIES 5
Ksharp
Super User

Can you explain the 'Match' ? you want all the variable value are same , or only a single variable value matched.

And more important than anything else is posting the result you want(take the first several variables as an example) .

Ksharp

Ksharp
Super User

Maybe I understand what you mean.

data have;
input No   AB001A $ AB0002A $ AB362 $;
cards;
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A 
;
run;
data have;
 set have;
 array x{3} var1-var3;
 array temp{3} $ _temporary_;
 array v{*} $ AB: ;
 if _n_ eq 2 then do;
                 do i=1 to dim(v);
                      temp{i}=v{i};
                     end;
                    end;
 if _n_ gt 2 then do;
                    do j=1 to dim(v);
                          x{j}=ifn(v{j}=temp{j},1,0);
                         end;
                      end;
run;




Ksharp

genetist
Calcite | Level 5

KSHARP,

Thank you very much for your help in solving my problem. Its working everything fine and i changed according to my requirement and i dropped i and j variables from your soultion, i think you may forget to drop them. I am really so thank full to you.

Regards,

Reddy

UrvishShah
Fluorite | Level 6

Hi,

Based on my understanding about your requirement i have designed one macro for matching the observations and based on that it will calculate the percentages respectively...

 

 

data sample;
  input sr_no AB001A AB0002A;
  cards;
1 15 25
2 15 25
3 25 84
4 25 84
5 08 09
6 10 12
7 10 12
8 48 15
;

%macro match;

   proc contents data = sample out = test(keep = name) noprint;      
   run;

   proc sql noprint;
    select name into :list separated by ' '
    from test
    where name ^? "sr_no";

    select count(name) into :count
    from test
    where name ^? "sr_no";
  quit;

 

  %let count = &count.;

  data compare(drop = _temp:);     
      set sample;

      %do i = 1 %to &count.;
            _temp&i. = dif(%scan(&list.,&i.));
            if _temp&i. = 0 then var&i. = 1;
            else var&i. = 0;
     %end;
run;

proc sql noprint;
   select %do i = 1 %to %eval(&count.-1);
                 sum(%scan(&list.,&i.)),
            %end;
                sum(%scan(&list.,&count.)) into
           

            %do i = 1 %to %eval(&count.-1);
                    :%scan(&list.,&i.)_sum,
            %end;
                   :%scan(&list.,&count.)_sum
from compare
where %do i = 1 %to %eval(&count.-1);
                  var&i. = 1 or
           %end;
                 var&count. = 1;
quit;

proc sql;
   create table compare as
   select *, %do i = 1 %to %eval(&count.-1);
                      sum(%scan(&list.,&i.)) as total_%scan(&list.,&i.),
                %end;
                      sum(%scan(&list.,&count.)) as total_%scan(&list.,&count.)
   from compare;
quit;

data final(drop = total: sum:);
  set compare;
  %do i = 1 %to &count.;
       sum&i. = symget("%scan(&list.,&i.)_sum");
      %scan(&list.,&i.)_percent = sum&i./total_%scan(&list.,&i.)*100;
  %end;
run;

%mend;

options mlogic mprint;
%match;

Hope this macro help you to achive your output using SAS...

-Urvish

genetist
Calcite | Level 5

Urvish,

Thank you very much for your help and spending your valueable time to help me. I have character data and my first two numbers are parents. first i want to see how many of my vars are showing different values for parents (first 2 obs) and then i will count them. I will count total 1's and then divide them with number of vars that are different among two parents and then i will calculate % match. (ex sum of 1's/40 (different vars among parents) and then i will give rank from descending order. I think you code will work for it, i will try it first my code and then your if i get any problem i will contact you.

Thanking you very much,

Regards,

Reddy

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1386 views
  • 3 likes
  • 3 in conversation