BookmarkSubscribeRSS Feed
sri1
Obsidian | Level 7

Hi all,

 

I just came across the following code in one of the SAS papers for pairwise comparisons .I have tried applying this to one of my programs on SAS Enterprise but the resulting dataset contains duplicates.May I know why?Could you also please  explain the highlighted parts...what SAS is actually doing?

Note : I have made one change from the original program i.e instead of output out have used ods output

 

%macro Pairwise_Chisq(data=, group=, outcome=, count=);
 proc freq data=&data noprint;
 table &group/out=out_group;
 run;
 data groupID;
 set out_group;
 GID=_n_;
 run;
 data _null_;
set groupID nobs=nobs;
 call symput('num_group', compress(put(nobs, 11.)));
 stop;
 run;
 data ChiTest; *store pairwise Chisquare statistics to this;
 stop;
 run;
 %Do group1=1 %to &num_group-1; *for each pair obtain chisquare stat;
 %Do group2=&group1+1 %to &num_group;
 data group_pair;
 set groupID;
 if GID EQ &group1 then call symput('name_group1', type);
 if GID EQ &group2 then call symput('name_group2', type);
 run;
 proc freq data=&data noprint; output out=CTout PCHI;
 tables &group*&outcome /chisq cellchi2 nopercent;
 where &group in ("&name_group1", "&name_group2");
 weight &count;
 run;
 data CTout1;
 set CTout;
 PAIR_COMPARED="&name_group1"||" vs "||"&name_group2";
 run;
 data ChiTest;
 set ChiTest CTout1;
 run;
 %End;
 %End;
 data ChiTest; *rename variables for final results;
 set ChiTest;
 CHI_STAT=_PCHI_; DF=DF_PCHI; P_VALUE=P_PCHI;
 keep pair_compared Chi_stat DF p_value;
 run;
%mend Pairwise_Chisq;

 

 

Your suggestions are highly appreciated

 

Many Thanks

14 REPLIES 14
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well your highlighted parts are very simple.

The first creates an empty dataset, the second sets the output values into one concatenated character variable, then sets this final dataset.

 

As for what the code is doing other than that, or creating duplicates, there is no way we can tell you.  We do not know what the input data is to this, what the output should be, or the process.  If you want some help with code off the internet, its advisable to go back to the author of that code.  It could be something as obscure as what they wanted is not what you wanted, and I can't tell that.

sri1
Obsidian | Level 7

Thanks RW9.May I know why we need to create an empty dataset here.Please find enclosed attachments for input data and output

 

Input datasetInput datasetOutputOutput

 

 Thanks

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You need an empty dataset to store the results.  Basically each loop creates some dataand that data gets added to the empty dataset.  Otherwise each iteration would overwrite the data you had before and you would end up with only the last data stored.

As for debugging that, thats really your job Smiley Happy  

FreelanceReinh
Jade | Level 19

Hi @sri1 and welcome to the SAS Support Communities!

 

First of all, the macro needs a correction:

    if GID EQ &group1 then call symput('name_group1', type);
    if GID EQ &group2 then call symput('name_group2', type);

Apparently, the author forgot to replace the variable name type, which is specific to their example dataset, with the corresponding macro parameter reference:

    if GID EQ &group1 then call symput('name_group1', &group);
    if GID EQ &group2 then call symput('name_group2', &group);

If you still get duplicates after this correction, please provide more details, ideally an example dataset (in the form of a datastep: see melanoma data in http://support.sas.com/kb/22/565.html; fake data is fine).

sri1
Obsidian | Level 7

FreelanceReinh
Jade | Level 19

@sri1 wrote:


Are you saying that your output dataset does not look like the "Output" (containing 6 observations) shown in your previous post?

sri1
Obsidian | Level 7
Yes that's right..Sorry I just forgot to mention I have added a by variable gender to the above data
Here is the code I have used
%macro Pairwise_Chisq(data=, group=, outcome=, count=);
proc freq data=&data noprint;
by gender;
table &group/out=out_group;
run;
data groupID;
set out_group;
GID=_n_;
run;
data _null_;
set groupID nobs=nobs;
call symput('num_group', compress(put(nobs, 11.)));
stop;
run;
data ChiTest; *store pairwise Chisquare statistics to this;
stop;
run;
%Do group1=1 %to &num_group-1; *for each pair obtain chisquare stat;
%Do group2=&group1+1 %to &num_group;
data group_pair;
set groupID;
if GID EQ &group1 then call symput('name_group1', &group);
if GID EQ &group2 then call symput('name_group2', &group);
run;
proc freq data=&data noprint; output out=CTout PCHI;
by gender;
tables &group*&outcome /chisq cellchi2 nopercent;
where &group in ("&name_group1", "&name_group2");
weight &count;
run;
data CTout1;
set CTout;
PAIR_COMPARED="&name_group1"||" vs "||"&name_group2";
run;
data ChiTest;
set ChiTest CTout1;
run;
%End;
%End;
data ChiTest; *rename variables for final results;
set ChiTest;
CHI_STAT=_PCHI_; DF=DF_PCHI; P_VALUE=P_PCHI;
keep pair_compared Chi_stat DF p_value;
run;
%mend Pairwise_Chisq;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have answered your own question then.  You are not getting duplicates records out, one is for male, one is for female.

sri1
Obsidian | Level 7

I am not really sure they are not duplicates since every value is similar for corresponding variables
and also am getting combinations similar to Hutchinson's Vs
Hutchinson's

I have a similar dataset as below

 

Type            Site   Gender  COUNT
Hutchinson's Head    M      1
Hutchinson's Neck    M      35
Nodular      Head    M      4
Nodular      Neck    M      31
Superficial  Head    M      1
Superficial  Neck    M     38
Hutchinson's Neck    F     36
Nodular      Neck    F     36
Superficial  Neck    F     36
Hutchinson's Head    M     1
Hutchinson's Neck    M     11
Nodular      Head    M     1
Nodular      Neck    M     15
Superficial  Head    M     1
Superficial  Neck    M     15
Hutchinson's Neck    F     12
Nodular      Neck    F     12
Superficial  Head    F     1
Superficial  Neck    F     11


and when I am performing fisher exact as similar to chisquare am not getting the desired output

%macro Pairwise_Chisq(data=, group=, outcome=,count=);
proc freq data=&data noprint;
by gender;
table &group/out=out_group;
run;
data groupID;
set out_group;
GID=_n_;
run;
data _null_;
set groupID nobs=nobs;
call symput('num_group', compress(put(nobs, 11.)));
stop;
run;
data fish;
stop;
run;
%Do group1=1 %to &num_group-1; *for each pair obtain fish stat;
%Do group2=&group1+1 %to &num_group;
data group_pair;
set groupID;
if GID EQ &group1 then call symput('name_group1', &group);
if GID EQ &group2 then call symput('name_group2', &group);
run;
ods output RiskDiffCol2=pcr_diff(where=(Row="Difference"));
proc freq data=&data;
by gender;
tables &group*&outcome / riskdiff(cl=exact) ;
exact riskdiff;
where &group in ("&name_group1", "&name_group2");
weight &count;
run;
data diff1;
set pcr_diff (where=(row="Difference"));
PAIR_COMPARED="&name_group1"||" vs "||"&name_group2";
run;
data fish;
set fish diff1;
run;
%End;
%End;
data fish;
set fish;
keep gender type LowerCL UpperCL pair_compared Risk;
run;
%mend Pairwise_Chisq;

 

Please kindly suggest

 

Thanks

FreelanceReinh
Jade | Level 19

The original macro was not prepared for BY-group processing and the two BY statements you inserted are not sufficient. A simple workaround would be to run the original macro once for each BY group like this:

%Pairwise_Chisq(data=your_data(where=(gender='F')), ...);
... /* evaluate output dataset ChiTest here */
%Pairwise_Chisq(data=your_data(where=(gender='M')), ...);
... /* evaluate new output dataset ChiTest */
s_lassen
Meteorite | Level 14

Ah, so you want a by variable. I would suggest something like this:

%macro Pairwise_Chisq(data=, group=, outcome=, count=,by=);
  %local byclause bySQL prxid i w;
  %if %length(&by) %then %do;
    %let by=%sysfunc(compbl(&by));
    %let bySQL=%sysfunc(tranwrd(&by,%str( ),%str(,a.))),;
    %let prxid=%sysfunc(prxparse(s/(\S+)/ and a.$1=b.$1/));
    %let ByClause=%sysfunc(prxchange(&prxid,-1,&by));
    %syscall Prxfree(prxid);
    %put _local_;
    %end;

  proc sql;
    create table groups as select distinct
      a.&bySQL a.&group as group1,b.&group as group2
    from &data a, &data b
    where a.&group<b.&group
    order by a.&bySQL a.&group,b.&group;
    ;
  quit;

  data ChiTest;
    %if %length(&by) %then
      set &data(keep=&by);
      ;
    length Pair_compared $100 Chi_stat DF P_value 8;
    stop;
  run;

  filename tempsas temp;
  data _null_;
    set groups;
    file tempsas;
    put
      "proc freq data=&data noprint;"                    /
      "  output out=CTout PCHI;"                         /
      "tables &group*&outcome /chisq cellchi2 nopercent;"
      ;
    length wherecls $1000 byvalues $200;
    if vtype(group1)='N' then
      wherecls=cats("&group in(",group1,',',group2,')');
    else
      wherecls=cats("&group in ('",group1,"','",group2,"')");
    %do i=1 %to %sysfunc(countw(%str( )&by));
      %let w=%scan(&by,&i);
      if vtype(&w)='N' then do;
        call catt(wherecls," and &w=", &w);
        call cats(byvalues,&w,',');
        end;
      else do;
        call catt(wherecls," and &w='", &w,"'");
        call cats(byvalues,"'",&w,"',");
        end;
      %end;
     put
       'where ' WhereCls ';';
    if "&count" ne " " then
      put "  weight &count;";
    put
      'run;'                                                     /
      'proc sql;'                                                /
      "  insert into ChiTest(&bySQL Chi_stat,df,P_value,Pair_compared)" /
      '  select ' byvalues '_PCHI_,DF_PCHI,P_PCHI,"' group1 'vs ' group2 '"' /
      '  from CTout a'                                           /
      '  ;'                                                      /
      'quit;'                                                    /
      ;
  run;
  %include tempsas;
%mend Pairwise_Chisq;

It is not really by variable processing, as it is all done in WHERE clauses, so your data does not need to be sorted.

sri1
Obsidian | Level 7

Thanks Lassen!Rather than proc sql would you mind to provide one similar to the original code I have posted since I am not much aware of proc sql

 

 

s_lassen
Meteorite | Level 14

Obviously an old macro, and it has an error: What is the TYPE variable doing in the CALL SYMPUT statement (in the Group_pair data step)? Should it not be &Group?

 

When I ran your code on a dataset, I got no duplicates, the output was nicely ordered by PAIR_COMPARED.

 

You will have to be more specific about what code you submitted, and what results you got if you need more help.

 

I did take a closer look at the macro, though. Here's how I would write such a thing:

%macro Pairwise_Chisq(data=, group=, outcome=, count=);
  proc freq data=&data noprint;
     table &group/out=out_group;
  run;

  proc sql;
    create table groups as select
      a.&group as group1,b.&group as group2
    from out_group a, out_group b
    where a.&group<b.&group
    order by 1,2;
    ;
  quit;

  data ChiTest;
    /* You may want to change the length of the Pair_compared variable */
    length Pair_compared $100 Chi_stat DF P_value 8;
    retain _character_ ' ' _numeric_ .;
    stop;
  run;

  filename tempsas temp;
  data _null_;
    set groups;
    file tempsas;
    put
      "proc freq data=&data noprint;"                    /
      "  output out=CTout PCHI;"                         /
      "tables &group*&outcome /chisq cellchi2 nopercent;"
      ;
    if vtype(group1)='N' then
      incls=cats(group1,',',group2);
    else
      incls=cats("'",group1,"','",group2,"'");
    put "  where &group in(" incls ');';
    if "&count" ne " " then
      put "  weight &count;";
    put
      'run;'                                                     /
      'proc sql;'                                                /
      '  insert into ChiTest(Chi_stat,df,P_value,Pair_compared)' /
      '  select _PCHI_,DF_PCHI,P_PCHI,"' group1 'vs ' group2 '"' /
      '  from CTout'                                             /
      '  ;'                                                      /
      'quit;'                                                    /
      ;
  run;
  %include tempsas;
%mend Pairwise_Chisq;

There are several advantages to this version:

  • The TYPE variable error has been fixed (I think).
  • It works with both numeric and character group variables.
  • The COUNT parameter is now optional, it is not clear that you always have a weight variable.
  • It is much easier to look at the code used to create the output, just take a look at the TEMPSAS file after submitting.
sri1
Obsidian | Level 7

Thanks s_lassen.Could you please kindly give a solution in data step rather than proc sql.I am trying to do pairwise differences with fisher exact using the following code.

I have a similar dataset as below

 

Type            Site   Gender  COUNT
Hutchinson's Head    M      1
Hutchinson's Neck    M      35
Nodular      Head    M      4
Nodular      Neck    M      31
Superficial  Head    M      1
Superficial  Neck    M     38
Hutchinson's Neck    F     36
Nodular      Neck    F     36
Superficial  Neck    F     36
Hutchinson's Head    M     1
Hutchinson's Neck    M     11
Nodular      Head    M     1
Nodular      Neck    M     15
Superficial  Head    M     1
Superficial  Neck    M     15
Hutchinson's Neck    F     12
Nodular      Neck    F     12
Superficial  Head    F     1
Superficial  Neck    F     11


and when I am performing fisher exact as similar to chisquare am not getting the desired output

%macro Pairwise_Chisq(data=, group=, outcome=,count=);
proc freq data=&data noprint;
by gender;
table &group/out=out_group;
run;
data groupID;
set out_group;
GID=_n_;
run;
data _null_;
set groupID nobs=nobs;
call symput('num_group', compress(put(nobs, 11.)));
stop;
run;
data fish;
stop;
run;
%Do group1=1 %to &num_group-1; *for each pair obtain fish stat;
%Do group2=&group1+1 %to &num_group;
data group_pair;
set groupID;
if GID EQ &group1 then call symput('name_group1', &group);
if GID EQ &group2 then call symput('name_group2', &group);
run;
ods output RiskDiffCol2=pcr_diff(where=(Row="Difference"));
proc freq data=&data;
by gender;
tables &group*&outcome / riskdiff(cl=exact) ;
exact riskdiff;
where &group in ("&name_group1", "&name_group2");
weight &count;
run;
data diff1;
set pcr_diff (where=(row="Difference"));
PAIR_COMPARED="&name_group1"||" vs "||"&name_group2";
run;
data fish;
set fish diff1;
run;
%End;
%End;
data fish;
set fish;
keep gender type LowerCL UpperCL pair_compared Risk;
run;
%mend Pairwise_Chisq;

 

Please kindly suggest

 

Thanks

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2409 views
  • 4 likes
  • 4 in conversation