frequency table

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

frequency table

probably a simple use of Proc Freq, but I can't seem to output what's needed:

have:

B     C

A1     A1

A2     A2

A4     A3

A5     A6

A7

A8

A9

I need to know:

      (1) How many observations are shared between 'B' and 'C'

      (2) How many are in 'B' only, and

      (3) How many are in 'A' only.

Many thanks!


Accepted Solutions
Solution
‎06-13-2012 01:42 PM
Respected Advisor
Posts: 3,777

Re: frequency table

It's probably best done with PROC FREQ I don't know.  I saw the problem as a merge BY ...; using IN= variables.

data a;

   infile cards missover;

   input (b c)($);

   cards;

A1     A1

A2     A2

A4     A3

A5     A6

A7

A8

A9

;;;;

   run;

data a(drop=a);

   if eof then output;

   merge

      a(in=in1 keep=b rename=b=a where=(not missing(a)))

      a(in=in2 keep=c rename=c=a where=(not missing(a)))

      end=eof;

   by a;

   shared + in1 & in2;

   bonly  + in1 and not in2;

   conly  + in2 and not in1;

   run;

proc print;

   run;

View solution in original post


All Replies
Trusted Advisor
Posts: 2,113

Re: frequency table

you need to reorganize your data as

group response

B          A1

B          A2

B          A4

B          A5

...

C          A1

C          A2

C          A3

...

Then it is

PROC FREQ;

  TABLES group*response;

  RUN;

Solution
‎06-13-2012 01:42 PM
Respected Advisor
Posts: 3,777

Re: frequency table

It's probably best done with PROC FREQ I don't know.  I saw the problem as a merge BY ...; using IN= variables.

data a;

   infile cards missover;

   input (b c)($);

   cards;

A1     A1

A2     A2

A4     A3

A5     A6

A7

A8

A9

;;;;

   run;

data a(drop=a);

   if eof then output;

   merge

      a(in=in1 keep=b rename=b=a where=(not missing(a)))

      a(in=in2 keep=c rename=c=a where=(not missing(a)))

      end=eof;

   by a;

   shared + in1 & in2;

   bonly  + in1 and not in2;

   conly  + in2 and not in1;

   run;

proc print;

   run;

Occasional Contributor
Posts: 10

Re: frequency table

this is great, many thanks!

Respected Advisor
Posts: 3,124

Re: frequency table

Hi _Null_,

Should 'missing' value be considered as a 'value' at all?  the question was raised from a practice using Hash() to answer the question:

data have;

infile cards truncover;

input B$ C$;

cards;

A1 A1

A2 A2

A4 A3

A5 A6

A7 .

A8 .

A9 .

;

data want;

  if _n_=1 then do;

  if 0 then set have;

dcl hash hb(dataset: 'have(keep=b)', ordered: 'a');

hb.definekey('b');

hb.definedata('b');

hb.definedone();

dcl hiter bt('hb');

dcl hash hc(dataset: 'have(keep=c)', ordered:'a');

hc.definekey('c');

hc.definedata('c');

hc.definedone();

dcl hiter ct('hc');

end;

_rc=ct.first();

do _rc=0 by 0 while (_rc=0);

_rc=hb.check(key:c);

if _rc ne 0 then conly+1;

else share+1;

_rc=ct.next();

end;

_rc=bt.first();

do _rc=0 by 0 while (_rc=0);

_rc=hc.check(key:b);

if _rc ne 0 then bonly+1;

_rc=bt.next();

end;

  output;

  stop;

run;

proc print;run;

The results shows conly=3. it turned out that the missing values of 'c' has been counted in.

Regards,

Haikuo

Respected Advisor
Posts: 3,777

Re: frequency table

I don't think so.  Add a WHERE data set option in your declare hash dataset: parameter.

Respected Advisor
Posts: 3,124

Re: frequency table

Thanks, Null.

Haikuo

Super Contributor
Posts: 1,636

Re: frequency table

Hi DN and Haikuo,

I am praticing Hash too. what do you mean by "Add a WHERE data set option in your declare hash dataset: parameter"? How should I modify Haikuo's code to get rid of the missing value?

Thanks - Linlin

Respected Advisor
Posts: 3,777

Re: frequency table

dcl hash hc(dataset: 'have(keep=c where=(not missing(c)))', ordered:'a');

I think that should work

Super Contributor
Posts: 1,636

Re: frequency table

Similar to data _null_:

data a;
   infile cards missover;
   input (b c)($);
   cards;
A1     A1
A2     A2
A4     A3
A5     A6
A7
A8
A9
;;;;
   run;

data b(keep=b rename=(b=v)) c(keep=c rename=c=v);
set a;
run;
proc sort data=c(where=(not missing(v)));by v;run;
proc sort data=b;by v;

data n(drop=v);
merge b(in=z) c(in=y) end = e;;
by v;
if z and y then both+1;
else if z and not y then b_only+1;
else c_only+1;
if e then output;
run;
proc print; run;
                                 Obs    both    b_only    c_only

                                  1       2        5         2

Linlin

Valued Guide
Posts: 765

Re: frequency table

hi ... using the data_null_ data set (given my SQL skills, there may be a less verbose way to do "BOTH B AND C") ...

proc sql;

title 'only B';

select count(*) from (select b from have except select c as b from have);

title 'only C';

select count(*) from  (select c from have (where=(^missing(c))) except select b as c from have);

title 'both B and C';

select count(*) from (select have.b, have2.c from have, have as have2 where have.b eq have2.c);

quit;

Super Contributor
Posts: 1,636

Re: frequency table

Hi Mike,

I am looking forward to your presentation on June 26th.

How about:

proc sql;

title "in b not c";

select count(*) from (select b from a where b ne c and b ne ' ');

title "in c not b";

select count(*) from (select c from a where b ne c and c ne ' ');

title "in both b and c";

select count(*) from (select b from a where b = c);

quit;

Valued Guide
Posts: 765

Re: frequency table

hi ... I think that your SQL code gives the correct counts by accident since if you change the data set, you get the the wrong answer (add A8 as a value for variable C)

data a;

infile cards missover;

input (b c)(:$2.);

cards;

A1     A1

A2     A2

A4     A3

A5     A6

A7

A8

A9     A8

;

your code gives ...

in b not c

       5 (should be 4 ... A4, A5, A7, A9)

in c not b

       3 (should be 2 ... A3, A6)

in both b and c

       2 (should be 3 ... A1, A2, A8)

there are only 9 unique values (A1 through A9) so there cannot be 10 outcomes (5+3+2), yes/no?

the stuff I posted still works ...

only B

       4

only C

       2

both B and C

       3

ps  I'm looking forward to speaking in the AM and teaching in the PM (plus I get  to learn stuff from the other presenters ... Darrell Massengill and Louise Hadden)

Super Contributor
Posts: 1,636

Re: frequency table

Thank you Mike!

How about the one below:

data a;

   infile cards missover;

   input (b c)($);

   cards;

A1     A1

A2     A2

A4     A3

A5     A6

A7     A8

A8

A9

;;;;

proc sql;

title "in b not c";

select count(*) from (select b from a where b not in select c from a);

title "in c not b";

select count(*) from (select c from a(where=(^missing(c))) where c not in select b from a );

title "in both b and c";

select count(*) from (select b from a where b in select c from a);

quit;

Linlin

Respected Advisor
Posts: 4,646

Re: frequency table

Those are set theory questions. SQL has set operators ready made, as Mike hinted (starting with Data_Null_; dataset) :

proc sql;

select condition, n from

(select "Shared" as condition, count(b) as n from (select b from have intersect select c from have))

union all

(select "B_only", count(b) from (select b from have except select c from have))

union all

(select "C_only", count(c) from (select c from have except select b from have));

Note: as stated in the documentation " Set operators combine columns from two queries based on their position in the referenced tables without regard to the individual column names" when you do not require CORRESPONDING operations.

PG

Simplified by PG.

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 668 views
  • 3 likes
  • 8 in conversation