- Statistic to measure percentage of agreement betwe...

05-18-2016 04:45 AM

Hi friends,

I have a dataset, which we will call SUPER_DATA, that contains 5 dichotomous (yes/no) variables:

t1 t2 t3 t4 t5

1 0 1 0 1

1 0 0 1 1

...

etc.

I am trying to write some macro code that will produce a comparison between each of the variables.

The comparison will give a percentage of how often the two variables "agree."

So if we are comparing t1 and t2, I wish to compute the following:

s = (count(t1 = 1 and t2 = 1) + count(t1 = 0 and t2 = 0)) / total number of observations

(i.e. the percentage of the time both variables are 1 or both are 0)

for each pair of variables.

The macro will compute this statistic for all 25 comparisons (5*5).
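Written out as a formula, and assuming the two counts are meant to be added together, the statistic for variables $t_j$ and $t_k$ is the simple proportion of agreement. The symbols $N$, $n_{11}$ and $n_{00}$ are notation introduced here for illustration, not taken from the post:

```latex
s_{jk} \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left[\, t_{ij} = t_{ik} \,\right]
       \;=\; \frac{n_{11} + n_{00}}{N}
```

Here $N$ is the number of observations, $n_{11}$ is the number of rows where both variables equal 1, and $n_{00}$ is the number where both equal 0. For the two rows shown above, t1 and t2 disagree in both rows, so $s_{12} = 0/2 = 0$, while t1 and t5 agree in both, so $s_{15} = 2/2 = 1$. Because $s_{jk} = s_{kj}$ and, with no missing values, $s_{jj} = 1$, only $5 \cdot 4 / 2 = 10$ of the 25 comparisons are distinct.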

The approach I have tried so far is creating a dataset for each comparison that contains only the observations where the two comparison variables agree. Then I was planning on somehow counting the observations in each dataset and dividing by the size of the original dataset. However, I couldn't get this to work, and I feel like there would be a better way to do this.

Any help is much appreciated.

Cheers.

Accepted Solutions

Solution

05-18-2016
10:36 PM

05-18-2016 05:24 AM - edited 05-18-2016 05:25 AM

Hello,

No macro needed. This may be a good start:

```
data have;
  input t1 t2 t3 t4 t5;
  datalines;
1 0 1 0 1
1 0 0 1 1
;

data want;
  set have nobs=totalobs;
  array t{*} t1--t5;
  /*some automation needed here */
  array countt{*} t1t2 t1t3 t1t4 t1t5 t2t3 t2t4 t2t5 t3t4 t3t5 t4t5;
  counterperc=1;
  do i=1 to dim(t);
    countervars=i+1;
    do j=counterperc to dim(countt) while (countervars le dim(t));
      countt{j} = (t{i}=t{countervars}) / totalobs;
      countervars=countervars+1;
      counterperc=counterperc+1;
    end;
  end;
  drop countervars counterperc j i;
run;
```

All Replies

05-18-2016 10:38 PM

Thanks for this, mate. Using this together with a PROC SUMMARY, as suggested by @FreelanceReinhard, I've got it working perfectly.

Cheers.

05-18-2016 05:34 AM

You have quite a few comparisons. The data step code is more efficient, but I thought this may be a useful read, since it sounds like an agreement statistic, which is a job for PROC FREQ.
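For reference, a minimal sketch of the PROC FREQ route, assuming the HAVE dataset and the 0/1 variables t1 to t5 used elsewhere in this thread. The AGREE option requests agreement statistics such as kappa and McNemar's test rather than the raw percentage, but the percentages in the two diagonal cells of the crosstab add up to the simple percent agreement:

```sas
/* Sketch only: assumes dataset HAVE with 0/1 variables t1-t5, as in this thread */
proc freq data=have;
   /* One pair shown; (t1-t5)*(t1-t5) would request all 25 tables */
   tables t1*t2 / agree norow nocol;
run;
```

Note that the AGREE statistics need a square table, so both variables must take both values in the data for them to be produced.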

05-18-2016 05:51 AM

This is more convenient to do in IML code.

What kind of output do you want?
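No IML code was posted in the thread. A minimal sketch of what it could look like follows, assuming SAS/IML is licensed and that HAVE contains only the 0/1 variables with no missing values. The matrix names `X`, `varNames` and `agree` are illustrative choices, not taken from the thread:

```sas
/* Sketch only: requires SAS/IML; assumes HAVE holds only 0/1 numeric variables, no missings */
proc iml;
   use have;
   read all var _NUM_ into X[colname=varNames];
   close have;
   n = nrow(X);
   /* X`*X counts rows where both variables are 1;      */
   /* (1-X)`*(1-X) counts rows where both variables are 0 */
   agree = (X`*X + (1-X)`*(1-X)) / n;
   print agree[rowname=varNames colname=varNames];
quit;
```

This prints the full symmetric matrix of pairwise agreement proportions, with 1 on the diagonal, rather than a single row of pairs.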

05-18-2016 05:55 AM

Something like this:

t1t2 t1t3 t1t4 .....

0.6 0.7 0.8 ......

So just one row, with each column representing a comparison between two of the variables.

05-18-2016 06:26 AM - edited 05-18-2016 06:26 AM

That's what @Loko's data step was made for. Just summarize the WANT dataset:

```
proc summary data=want;
var t1t2--t4t5;
output out=agreement(drop=_:) sum=;
run;
```

05-18-2016 07:50 AM

OK. Here it is.

```
data have;
  input t1 t2 t3 t4 t5;
  datalines;
1 0 1 0 1
1 0 0 1 1
;
run;

proc transpose data=have(obs=0) out=temp1;
run;

proc transpose data=temp1 out=temp2(drop=_name_ _label_);
  var _name_;
run;

data _null_;
  set temp2 end=last;
  array x{*} $ _character_;
  if _n_=1 then call execute('proc sql;create table want as select ');
  do i=1 to dim(x)-1;
    do j=i+1 to dim(x);
      call execute(catx(' ','sum(',x{i},'=',x{j},')/count(*) as ',cats(x{i},'_',x{j})));
      if not (i=dim(x)-1 and j=dim(x)) then call execute(',');
    end;
  end;
  if last then call execute('from have;quit;');
run;
```
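To make the CALL EXECUTE logic easier to follow, here is roughly the PROC SQL step that the DATA _NULL_ step above generates and submits for the five variables t1 to t5. This is a hand-expanded sketch of the generated code, with line breaks added for readability:

```sas
/* Approximately what the CALL EXECUTE calls submit for variables t1-t5 */
proc sql;
create table want as select
   sum( t1 = t2 )/count(*) as t1_t2,
   sum( t1 = t3 )/count(*) as t1_t3,
   sum( t1 = t4 )/count(*) as t1_t4,
   sum( t1 = t5 )/count(*) as t1_t5,
   sum( t2 = t3 )/count(*) as t2_t3,
   sum( t2 = t4 )/count(*) as t2_t4,
   sum( t2 = t5 )/count(*) as t2_t5,
   sum( t3 = t4 )/count(*) as t3_t4,
   sum( t3 = t5 )/count(*) as t3_t5,
   sum( t4 = t5 )/count(*) as t4_t5
from have;
quit;
```

Each comparison evaluates to 1 or 0 per row, so its sum divided by the row count is the proportion of agreement. The output columns are named with an underscore (t1_t2 and so on), unlike the t1t2 names used in the data step solution.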