Help using Base SAS procedures

Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Hello SAS community,

First, I want to say how grateful I am to find this discussion board. I think it is so valuable there is a community of people who are willing to help one another with their SAS programming questions and I want to thank anyone in advance who comments and replies. I hope one day I am a good enough SAS programmer to return the favor and help others.

My goal is to take a dataset of multiple variables and find how many times unique combinations of pairs of variables exist in the dataset. First, I created a fake dataset with three variables called y1, y2, y3. Then, I have a macro that uses proc freq on each pair of combinations. The last step is to combine the frequencies (and convert it to a fraction) into one data set with the original variables.

Here is my SAS program:

data test_data;

do i = 1 to 10;

cod = i;

call streaminit(123);

u1 = rand("Uniform");

y1 = ceil(5*u1);


call streaminit(456);

u2 = rand("Uniform");

y2 = ceil(5*u2);


call streaminit(789);

u3 = rand("Uniform");

y3 = ceil(5*u3);

output;

end;

drop u1 u2 u3 i;

run;

/*proc print data = test_data; run;*/

%macro pairedfreqs(dataset, order, k, vars);

proc freq data = &dataset order=data noprint;

tables y1*y2/ out = pfreq1 list nopct nocum;

run;

proc freq data = &dataset order=data noprint;

tables y1*y3/ out = pfreq2 list nopct nocum;

run;

proc freq data= &dataset order=data noprint;

tables y2*y3/ out = pfreq3 list nopct nocum;

run;

%do i=1 %to &k;

proc sort data=&dataset;

by %scan(&vars,&i);

run;


proc sort data=pfreq&i;

by %scan(&vars,&i);

run;


data f&i(drop=count percent);

merge &dataset pfreq&i;

by %scan(&vars,&i);

pfreq&i=percent/100;

run;


proc sort;

by ℴ

run;

%end;


data all_freqs;

merge %do i=1 %to &k; f&i %end;;

by ℴ

run;

%mend;

%pairedfreqs(dataset=test_data,order=cod,k = 3, vars= y1 y2 y3);

The problem arises with my do-loop, specifically where I create dataset "f2" because there is no variable y2 in this dataset. So the final data set of interest, all_freqs, has missing values for pfreq2 even though the actual data set pfreq2 does exist and is correct.

I have searched the internet and these discussion boards, and I have tried many different things but I am stuck. Any advice on fixing what I have so the correct data set results is appreciated.

The second step, or my second question, is then how to edit my macro so that the idea behind this procedure can extend to datasets with any number of variables. Meaning, change the macro contents so that if someone has a dataset with 5 variables (and is using the macro with the argument vars = var1 var2 var3 var4 var5), the macro will find all the combinations of pairs of two variables, find the frequencies (convert to a decimal), and then merge together in a final data set.

If further clarification is needed, let me know. Thanks!!


Accepted Solutions
Solution
‎12-23-2012 03:51 PM
PROC Star
Posts: 7,467

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

This is an expanded version of Tom's suggested method.  It is probably more convoluted than it needs to be, but appears to be a way of accomplishing what you want and should be easy to generalize to any number of variables:

data test_data;

  do i = 1 to 10;

    cod = i;

    call streaminit(123);

    u1 = rand("Uniform");

    y1 = ceil(5*u1);

    call streaminit(456);

    u2 = rand("Uniform");

    y2 = ceil(5*u2);

    call streaminit(789);

    u3 = rand("Uniform");

    y3 = ceil(5*u3);

    output;

  end;

  drop u1 u2 u3 i;

run;

PROC MEANS DATA=WORK.TEST_DATA NOPRINT;

  WAYS 0,2;

  VAR cod;

  CLASS y1 y2 y3;

  OUTPUT  OUT=WORK.TEST_DATA_STATS (drop=_freq_) N()= / AUTONAME;

RUN;

proc sql noprint;

  select count(_type_)-1

    into :total_types

      from (select distinct _type_

              from test_data_stats)

  ;

  select cod_n

    into :total

      from test_data_stats

        where _type_ eq 0

  ;

quit;

proc sort data=test_data_stats;

  by descending _type_;

run;

data test_data_stats (drop=_: cod_N);

  set test_data_stats (where=(_type_ ne 0));

  array pfreq(&total_types);

  if _type_ ne lag(_type_) then types+1;

  pfreq(types)=cod_N/&total;

run;

data want;

  set test_data;

run;

proc sql noprint;

  create table test_data_stats_plus

    as select *

      from test_data_stats,

           test_data (keep=cod)

        order by cod

  ;

  select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name like 'y%'

  ;

quit;

%macro finish;

  %do i=1 %to &total_types;

    data type_&i (drop=types);

      set test_data_stats_plus

        (where=(types eq &i));

    run;

  %end;

  %let type=0;

  %do i=1 %to %eval(&total_types.-1);

    %do j=%eval(&i+1) %to &total_types.;

      %let type=%eval(&type+1);

      proc sort data=type_&type;

        by cod %scan(&vars,&i) %scan(&vars,&j);

      run;

      proc sort data=want;

        by cod %scan(&vars,&i) %scan(&vars,&j);

      run;

      data want;

        update want (in=ina) type_&type;

        by cod %scan(&vars,&i) %scan(&vars,&j);

        if ina;

      run;

    %end;

  %end;

%mend;

%finish

View solution in original post


All Replies
PROC Star
Posts: 7,467

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Given your sample dataset, can you post a file that looks like the final dataset you want to achieve?

Occasional Contributor
Posts: 13

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Hey Arthur,

This is the current output:

                     Obs         cod         y1         y2         y3         pfreq1    pfreq2         pfreq3       

                       1           1          3               1          1           0.1        .             0.3

                       2           2          2               2          2           0.1        .             0.1

                       3           3          2               1         1           0.1        .             0.3

                       4           4          1               5          2           0.1        .             0.1

                       5           5          3               5          4           0.1        .             0.1

                       6           6          5               2          5           0.1        .             0.1

                       7           7          1               3          5           0.1        .             0.2

                       8           8          1               1          1           0.1        .             0.3

                       9           9          4               4          5           0.1        .             0.1

                      10          10        4               3          5          0.1        .             0.1

This is the desired output:

                    Obs    cod         y1         y2         y3       pfreq1         pfreq2         pfreq3    

                       1      1          3          1          1           0.1             0.1            0.3

                       2      2          2          2          2           0.1             0.1            0.1

                       3      3          2          1          1           0.1             0.1            0.3

                       4      4          1          5          2           0.1             0.1            0.1

                       5      5          3          5          4           0.1             0.1            0.1

                       6      6          5          2          5           0.1             0.1            0.1

                       7      7          1          3          5           0.1             0.1            0.2

                       8      8          1          1          1           0.1             0.1            0.3

                       9      9          4          4          5           0.1             0.2            0.1

                      10     10         4          3          5          0.1             0.2            0.1

PROC Star
Posts: 7,467

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

What are pfreq1 thru pfreq3 supposed to represent?

Taking your problem out of macro land for the moment, the following produces the counts you want:

proc freq data= test_data order=data noprint;

tables cod*y1*y2*y3/ out = pfreq list nopct nocum;

run;

proc sort data=pfreq;

  by cod;

run;

Occasional Contributor
Posts: 13

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Hey Arthur,

Your suggest code is only giving me one column of counts (which all display the value 1) not the three separate pairs of counts which should have the values I posted above (pfreq1, pfreq2, pfreq3) * 100.

pfreq1 - 3 are the pairs of frequencies (well, converted into decimals).

For example:

y1     y2

3         1

1         2

3        1

The pfreq values would be 2/3, 1/3, and 2/3 because the combination of y1 = 3 and y2 = 1 occurs twice and the combination y1 = 1 and y2 = 2 occurs once.

I hope that makes sense but if further clarification is needed let me know.

PROC Star
Posts: 1,167

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

I'm not sure about your post-processing, but you might find this a little easier way to get all of your two-way counts than your macro loop:

PROC MEANS DATA=WORK.TEST_DATA NOPRINT;

WAYS 2;

VAR cod;

CLASS y1 y2 y3;

OUTPUT  OUT=WORK.TEST_DATA_STATS N()= / AUTONAME;

RUN;

Tom

Occasional Contributor
Posts: 13

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Hey Tom,

I like this idea because I think I know how to extend it for my second question (how to adjust the program for any sized data set).

Any idea on how to adjust the proc means output so the desired output I posted above is displayed? Instead, your proc means currently outputs 26 rows but I want to collapse the information back to the original 10 observations.

Thanks!

Solution
‎12-23-2012 03:51 PM
PROC Star
Posts: 7,467

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

This is an expanded version of Tom's suggested method.  It is probably more convoluted than it needs to be, but appears to be a way of accomplishing what you want and should be easy to generalize to any number of variables:

data test_data;

  do i = 1 to 10;

    cod = i;

    call streaminit(123);

    u1 = rand("Uniform");

    y1 = ceil(5*u1);

    call streaminit(456);

    u2 = rand("Uniform");

    y2 = ceil(5*u2);

    call streaminit(789);

    u3 = rand("Uniform");

    y3 = ceil(5*u3);

    output;

  end;

  drop u1 u2 u3 i;

run;

PROC MEANS DATA=WORK.TEST_DATA NOPRINT;

  WAYS 0,2;

  VAR cod;

  CLASS y1 y2 y3;

  OUTPUT  OUT=WORK.TEST_DATA_STATS (drop=_freq_) N()= / AUTONAME;

RUN;

proc sql noprint;

  select count(_type_)-1

    into :total_types

      from (select distinct _type_

              from test_data_stats)

  ;

  select cod_n

    into :total

      from test_data_stats

        where _type_ eq 0

  ;

quit;

proc sort data=test_data_stats;

  by descending _type_;

run;

data test_data_stats (drop=_: cod_N);

  set test_data_stats (where=(_type_ ne 0));

  array pfreq(&total_types);

  if _type_ ne lag(_type_) then types+1;

  pfreq(types)=cod_N/&total;

run;

data want;

  set test_data;

run;

proc sql noprint;

  create table test_data_stats_plus

    as select *

      from test_data_stats,

           test_data (keep=cod)

        order by cod

  ;

  select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name like 'y%'

  ;

quit;

%macro finish;

  %do i=1 %to &total_types;

    data type_&i (drop=types);

      set test_data_stats_plus

        (where=(types eq &i));

    run;

  %end;

  %let type=0;

  %do i=1 %to %eval(&total_types.-1);

    %do j=%eval(&i+1) %to &total_types.;

      %let type=%eval(&type+1);

      proc sort data=type_&type;

        by cod %scan(&vars,&i) %scan(&vars,&j);

      run;

      proc sort data=want;

        by cod %scan(&vars,&i) %scan(&vars,&j);

      run;

      data want;

        update want (in=ina) type_&type;

        by cod %scan(&vars,&i) %scan(&vars,&j);

        if ina;

      run;

    %end;

  %end;

%mend;

%finish

Occasional Contributor
Posts: 13

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Happy, happy holidays to you Arthur!

Everything is fine and dandy with your code and I was able to successfully include it in my macro. I have one more question, if it isn't too much. I don't understand this part of your code:

select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" and

              name like 'y%'

  ;

Specifically, the last like "name like 'y%'. I am trying to allow users to have a different name for their variables other than y#. Is there another line that I could use instead in case the names of their variables are var1, var2, var3, ... etc? Or if the names were height, weight, age, etc.?

Thank you!!

Super Contributor
Posts: 1,636

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

using:

select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="WANT" ;

PROC Star
Posts: 7,467

Re: Using proc freq on pairs of variables (error in the merge and do-loop of my macro)

Happy holidays to you and everyone else as well.  That part of the code was simply automating the creation of macro variable &vars so, in your case, it would create a variable that was equal to "y1 y2 y3".

You could always just allow the user to enter the variables in either a %let statement, a macro call, or by indicating which variables should be excluded from the list.  However, you would have to ensure that a relationship was created between that variable list and the resulting combinations (i.e., pfreq1, pfreq2 and pfreq3).  Conversely, what might be more manageable, would be to recode such a variable list into y1, y2, etc.

However, realize that I'm a Psychologist NOT a programmer, thus there could easily be a better way to address what you are trying to do.  I, typically, just try to find a good solution, then leave it to others to optimize the concept.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 721 views
  • 3 likes
  • 4 in conversation