BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
genetist
Calcite | Level 5

Hi to all,

I have falt data file which contains more than 5000 observations and 384 variables and its look like this.

S. No   AB001A  AB0002A AB362 ...AB384  VAR1    VAR2    VAR3...Var384    SUM %Match  Rank
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A                       
 

I want to match obs 3 with obs 2, obs 4 with obs 2, like this until obs19, if it exactly matched i.e same value as obs 2 then score will be 1 else 0, that will be stored in var1(dummy var for AB001a) for AB001a , in var2 for ab0002a and in var3 for ab362 until AB384 (Var 384) and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their %match and rank. My question is how can i do this in sas? can i use arrays for this? or in combinations of loops and arrays? I have  good basic sas knowledge. can any one guide me how to do this in sas because i want to reduce my time to analyze my data.

Thanking you Regards,

Gnetist

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Maybe I understand what you mean.

data have;
input No   AB001A $ AB0002A $ AB362 $;
cards;
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A 
;
run;
data have;
 set have;
 array x{3} var1-var3;
 array temp{3} $ _temporary_;
 array v{*} $ AB: ;
 if _n_ eq 2 then do;
                 do i=1 to dim(v);
                      temp{i}=v{i};
                     end;
                    end;
 if _n_ gt 2 then do;
                    do j=1 to dim(v);
                          x{j}=ifn(v{j}=temp{j},1,0);
                         end;
                      end;
run;




Ksharp

View solution in original post

5 REPLIES 5
Ksharp
Super User

Can you explain the 'Match' ? you want all the variable value are same , or only a single variable value matched.

And more important than anything else is posting the result you want(take the first several variables as an example) .

Ksharp

Ksharp
Super User

Maybe I understand what you mean.

data have;
input No   AB001A $ AB0002A $ AB362 $;
cards;
    1   -/-        C/C   A/A                       
    2   C/C        C/C   A/A                       
    3   C/C        C/C   A/A                       
    4   C/C        C/C   A/A                       
    5   C/C        C/C   A/A                       
    6   C/C        C/C   A/A                       
    7   C/C        C/C   A/A                       
    8   -/-        -/-   -/-                       
    9   C/C        C/C   A/A                       
    10  C/C        C/C   A/A                       
    11  -/-        C/C   A/A                       
    12  C/C        C/C   A/A                       
    13  C/C        C/C   A/A                       
    14  C/C        C/C   A/A                       
    16  C/C        -/-   A/A                       
    17  -/-        C/C   A/A                       
    18  C/C        C/C   A/A                       
    19  C/C        C/C   A/A 
;
run;
data have;
 set have;
 array x{3} var1-var3;
 array temp{3} $ _temporary_;
 array v{*} $ AB: ;
 if _n_ eq 2 then do;
                 do i=1 to dim(v);
                      temp{i}=v{i};
                     end;
                    end;
 if _n_ gt 2 then do;
                    do j=1 to dim(v);
                          x{j}=ifn(v{j}=temp{j},1,0);
                         end;
                      end;
run;




Ksharp

genetist
Calcite | Level 5

KSHARP,

Thank you very much for your help in solving my problem. Its working everything fine and i changed according to my requirement and i dropped i and j variables from your soultion, i think you may forget to drop them. I am really so thank full to you.

Regards,

Reddy

UrvishShah
Fluorite | Level 6

Hi,

Based on my understanding about your requirement i have designed one macro for matching the observations and based on that it will calculate the percentages respectively...

 

 

data sample;
  input sr_no AB001A AB0002A;
  cards;
1 15 25
2 15 25
3 25 84
4 25 84
5 08 09
6 10 12
7 10 12
8 48 15
;

%macro match;

   proc contents data = sample out = test(keep = name) noprint;      
   run;

   proc sql noprint;
    select name into :list separated by ' '
    from test
    where name ^? "sr_no";

    select count(name) into :count
    from test
    where name ^? "sr_no";
  quit;

 

  %let count = &count.;

  data compare(drop = _temp:);     
      set sample;

      %do i = 1 %to &count.;
            _temp&i. = dif(%scan(&list.,&i.));
            if _temp&i. = 0 then var&i. = 1;
            else var&i. = 0;
     %end;
run;

proc sql noprint;
   select %do i = 1 %to %eval(&count.-1);
                 sum(%scan(&list.,&i.)),
            %end;
                sum(%scan(&list.,&count.)) into
           

            %do i = 1 %to %eval(&count.-1);
                    :%scan(&list.,&i.)_sum,
            %end;
                   :%scan(&list.,&count.)_sum
from compare
where %do i = 1 %to %eval(&count.-1);
                  var&i. = 1 or
           %end;
                 var&count. = 1;
quit;

proc sql;
   create table compare as
   select *, %do i = 1 %to %eval(&count.-1);
                      sum(%scan(&list.,&i.)) as total_%scan(&list.,&i.),
                %end;
                      sum(%scan(&list.,&count.)) as total_%scan(&list.,&count.)
   from compare;
quit;

data final(drop = total: sum:);
  set compare;
  %do i = 1 %to &count.;
       sum&i. = symget("%scan(&list.,&i.)_sum");
      %scan(&list.,&i.)_percent = sum&i./total_%scan(&list.,&i.)*100;
  %end;
run;

%mend;

options mlogic mprint;
%match;

Hope this macro help you to achive your output using SAS...

-Urvish

genetist
Calcite | Level 5

Urvish,

Thank you very much for your help and spending your valueable time to help me. I have character data and my first two numbers are parents. first i want to see how many of my vars are showing different values for parents (first 2 obs) and then i will count them. I will count total 1's and then divide them with number of vars that are different among two parents and then i will calculate % match. (ex sum of 1's/40 (different vars among parents) and then i will give rank from descending order. I think you code will work for it, i will try it first my code and then your if i get any problem i will contact you.

Thanking you very much,

Regards,

Reddy

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1066 views
  • 3 likes
  • 3 in conversation