Solved: Re: Subset for multiple observations per subject

TJ87 · Posted 06-23-2016 07:47 PM

Hi,

My data looks something like this:

ID A B C D E

1 a a b a a

1 a b b b b

2 c c b a a

2 b c c c a

I would like to output where the observations differ for any given column (A-E):

ID B D E

1 a a a

1 b b b

ID A C D

2 c b a

2 b c c

Could someone please suggest some code for this? Thanks.

PGStats · Posted 06-23-2016 11:49 PM

Like this?

data have;
input ID  (A B  C D  E) ($);
cards;
1   a  a  b  a  a
1   a  b  b  b  b
2   c  c  b  a  a  
2   b  c  c  c  a
;

data temp;
length var $12;
set have;
array x $ _character_;
do i = 2 to dim(x);
    var = vname(x{i});
    value = x{i};
    output;
    end;
keep id var value;
run;

proc sql;
create table want as
select id, var
from temp
group by id, var
having count(distinct value) > 1;
quit;

PG

View solution in original post

PGStats · Posted 06-23-2016 10:56 PM

Do you simply want to know which variables differ for each ID, or create one new dataset for each ID ?

PG

TJ87 · Posted 06-23-2016 11:36 PM

I'd just like to know which variables differ for each ID. Thank you.

PGStats · Posted 06-23-2016 11:49 PM

Like this?

data have;
input ID  (A B  C D  E) ($);
cards;
1   a  a  b  a  a
1   a  b  b  b  b
2   c  c  b  a  a  
2   b  c  c  c  a
;

data temp;
length var $12;
set have;
array x $ _character_;
do i = 2 to dim(x);
    var = vname(x{i});
    value = x{i};
    output;
    end;
keep id var value;
run;

proc sql;
create table want as
select id, var
from temp
group by id, var
having count(distinct value) > 1;
quit;

PG

TJ87 · Posted 06-24-2016 12:00 AM

Great, thanks, I'll give this a try.

Ksharp · Posted 06-23-2016 11:34 PM

Assuming I understand what you mean.

data have;
input ID  (A B  C D  E) ($);
cards;
1   a  a  b  a  a
1   a  b  b  b  b
2   c  c  b  a  a  
2   b  c  c  c  a
;
run;

proc sql;
create table temp as
 select id,
        (count(distinct A)=count(*)) as A,
        (count(distinct B)=count(*)) as B,
        (count(distinct C)=count(*)) as C,
        (count(distinct D)=count(*)) as D,
        (count(distinct E)=count(*)) as E
  from have
   group by id;
quit;
proc transpose data=temp out=key(where=(col1=1));
by id;
run;
data _null_;
 set key;
 by id;
 if first.id then call execute(cats('data want',id,';set have;if id=',id,';keep id '));
 call execute(_name_);
 if last.id then call execute(';run;');
run;

TJ87 · Posted 06-23-2016 11:40 PM

Wow, thank you. I'll give this a try. Can you suggest a shorthand notation for the explanatory (A-E) variables? I have about 100. Thanks.

Ksharp · Posted 06-23-2016 11:50 PM

Sure.

data have;
input ID  (A B  C D  E) ($);
cards;
1   a  a  b  a  a
1   a  b  b  b  b
2   c  c  b  a  a  
2   b  c  c  c  a
;
run;

proc transpose data=have(obs=0) out=vnames;
var _all_;
run;

proc sql noprint;
select catx(' ','(count(distinct',_name_,')=count(*)) as',_name_)
       into : list separated by ','
 from vnames
  where upcase(_name_) ne 'ID';

create table temp as
 select id,&list
  from have
   group by id;
quit;
proc transpose data=temp out=key(where=(col1=1));
by id;
run;
data _null_;
 set key;
 by id;
 if first.id then call execute(cats('data want',id,';set have;if id=',id,';keep id '));
 call execute(_name_);
 if last.id then call execute(';run;');
run;

TJ87 · Posted 06-23-2016 11:52 PM

Wow, thank you very much Keshan.

TJ87 · Posted 06-24-2016 02:17 PM

This works, too! Thanks!

Classroom Training Available!