Solved: Get counts by groups across columns

rsva · Posted 06-16-2021 03:58 PM

Greetings,

(Sorry for duplicating questions, lost my previous post).

I have data on 5 partners for each client, partner1-5. I have their test results, POS or NEG for each partner POS1-5 and NEG1-5 (coded 1 if yes, 0 if no). I also have age of the partners, Age1-5. Data snapshot below

Partner1	Partner2	Partner3	Partner4	Partner5	POS1	POS2	POS3	POS4	POS5	NEG1	NEG2	NEG3	NEG4	NEG5	Age1	Age2	Age3	Age4	Age5	NPOS	NNEG
1					1	0	0	0	0	0	0	0	0	0	10-14	15-19	20-24	25-29	30+	1	0
1	1				0	1	0	0	0	1	0	0	0	0	10-14	15-19	10-14	30+	20-24	1	1
1	1	1			1	1	0	0	0	0	0	1	0	0	30+	20-24	25-29	15-19	20-24	2	1
1	1	1	1	1	0	0	1	1	0	1	1	0	0	1	20-24	25-29	10-14	15-19	20-24	2	3

I created the counts for nPOS and nNEG by Sum(of Pos1-Pos5) and Sum(of Neg1-Neg5). I am trying to get count of POS ans NEGs by age group. Meaning the distribution of age group among POS and among NEG. Output below :

	POS	NEG
10-14	2
15-19	2	1
20-24	1	2
25-29		2
30+	1

Any suggestions on how to get the counts would help.

THanks in advance,

tarheel13 · Posted 06-16-2021 04:50 PM

Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.

data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;

*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
	by ID;
	var pos1-pos5;
run;

proc transpose data=test out=longn prefix=neg;
	by ID;
	var neg1-neg5;
run;

proc transpose data=test out=longa prefix=age;
	by ID;
	var age1-age5;
run;

*create test2 ds by merging the 3 transposed datasets together;
data test2;
	merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
	(rename=(age1=age));
	by ID;
	drop _name_;
run;

proc print data=test2;run;

proc sql;
	create table test3 as select age, sum(pos) as pos, sum(neg) as neg
		from test2
		group by age;
quit;

View solution in original post

tarheel13 · Posted 06-16-2021 04:50 PM

Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.

data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;

*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
	by ID;
	var pos1-pos5;
run;

proc transpose data=test out=longn prefix=neg;
	by ID;
	var neg1-neg5;
run;

proc transpose data=test out=longa prefix=age;
	by ID;
	var age1-age5;
run;

*create test2 ds by merging the 3 transposed datasets together;
data test2;
	merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
	(rename=(age1=age));
	by ID;
	drop _name_;
run;

proc print data=test2;run;

proc sql;
	create table test3 as select age, sum(pos) as pos, sum(neg) as neg
		from test2
		group by age;
quit;

rsva · Posted 06-17-2021 11:16 AM

THanks for the solution.

ballardw · Posted 06-16-2021 05:17 PM

Is it every possible for a partner to have both a positive and negative result? If not you only need one variable with the information.

Here's a different approach to reshaping data with arrays and explicit output statement.

data test;
input id partner1-partner5 pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 . . . . 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 1 1 . . . 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 1 . . 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;

data want;
   set test;
   array pa(*) partner1-partner5;
   array p (*) pos1-pos5;
   array n (*) neg1-neg5;
   array a (*) age1-age5;
   do partner=1 to dim(pa);
      if pa[partner]=1 then do; 
         pos=p[partner];
         neg=n[partner];
         age=a[partner];
         output;
      end;
   end; 
   keep id partner pos neg age;
run;

proc tabulate data=want;
  class age;
  var pos neg;
  table age=' ',
        (pos neg) * sum=' '*f=best5.
        /misstext=' '
   ;
run;

You will find that for most cases that a long data set with one observation per set of measurements will work better.

And here is what I meant by using just one variable :

proc format;
value pn 
1=' Pos'
0='Neg'
;
proc tabulate data=want;
  class age;
  class pos/order=formatted;
  format pos pn.;
  table age=' ',
        pos=' ' * n=' '
        /misstext=' '
   ;
run;

Playing some minor games to duplicate the order you showed.

In Proc Tabulate you can use variable= to override default variable labels and if you use age=' ' the space supresses the label entirely.

rsva · Posted 06-17-2021 11:17 AM

Thanks for the solution. Both solutions worked perfectly.

Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

Re: Get counts by groups across columns

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away