Greetings,
(Sorry for duplicating questions, lost my previous post).
I have data on 5 partners for each client, partner1-5. I have their test results, POS or NEG for each partner POS1-5 and NEG1-5 (coded 1 if yes, 0 if no). I also have age of the partners, Age1-5. Data snapshot below
Partner1 | Partner2 | Partner3 | Partner4 | Partner5 | POS1 | POS2 | POS3 | POS4 | POS5 | NEG1 | NEG2 | NEG3 | NEG4 | NEG5 | Age1 | Age2 | Age3 | Age4 | Age5 | NPOS | NNEG |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10-14 | 15-19 | 20-24 | 25-29 | 30+ | 1 | 0 | ||||
1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 10-14 | 15-19 | 10-14 | 30+ | 20-24 | 1 | 1 | |||
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 30+ | 20-24 | 25-29 | 15-19 | 20-24 | 2 | 1 | ||
1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 20-24 | 25-29 | 10-14 | 15-19 | 20-24 | 2 | 3 |
I created the counts for nPOS and nNEG by Sum(of Pos1-Pos5) and Sum(of Neg1-Neg5). I am trying to get count of POS ans NEGs by age group. Meaning the distribution of age group among POS and among NEG. Output below :
POS | NEG | |
10-14 | 2 | |
15-19 | 2 | 1 |
20-24 | 1 | 2 |
25-29 | 2 | |
30+ | 1 |
Any suggestions on how to get the counts would help.
THanks in advance,
Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.
data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;
*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
by ID;
var pos1-pos5;
run;
proc transpose data=test out=longn prefix=neg;
by ID;
var neg1-neg5;
run;
proc transpose data=test out=longa prefix=age;
by ID;
var age1-age5;
run;
*create test2 ds by merging the 3 transposed datasets together;
data test2;
merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
(rename=(age1=age));
by ID;
drop _name_;
run;
proc print data=test2;run;
proc sql;
create table test3 as select age, sum(pos) as pos, sum(neg) as neg
from test2
group by age;
quit;
Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.
data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;
*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
by ID;
var pos1-pos5;
run;
proc transpose data=test out=longn prefix=neg;
by ID;
var neg1-neg5;
run;
proc transpose data=test out=longa prefix=age;
by ID;
var age1-age5;
run;
*create test2 ds by merging the 3 transposed datasets together;
data test2;
merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
(rename=(age1=age));
by ID;
drop _name_;
run;
proc print data=test2;run;
proc sql;
create table test3 as select age, sum(pos) as pos, sum(neg) as neg
from test2
group by age;
quit;
THanks for the solution.
Is it every possible for a partner to have both a positive and negative result? If not you only need one variable with the information.
Here's a different approach to reshaping data with arrays and explicit output statement.
data test; input id partner1-partner5 pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $; cards; 1 1 . . . . 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+ 2 1 1 . . . 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24 3 1 1 1 . . 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24 4 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24 ; run; data want; set test; array pa(*) partner1-partner5; array p (*) pos1-pos5; array n (*) neg1-neg5; array a (*) age1-age5; do partner=1 to dim(pa); if pa[partner]=1 then do; pos=p[partner]; neg=n[partner]; age=a[partner]; output; end; end; keep id partner pos neg age; run; proc tabulate data=want; class age; var pos neg; table age=' ', (pos neg) * sum=' '*f=best5. /misstext=' ' ; run;
You will find that for most cases that a long data set with one observation per set of measurements will work better.
And here is what I meant by using just one variable :
proc format; value pn 1=' Pos' 0='Neg' ; proc tabulate data=want; class age; class pos/order=formatted; format pos pn.; table age=' ', pos=' ' * n=' ' /misstext=' ' ; run;
Playing some minor games to duplicate the order you showed.
In Proc Tabulate you can use variable= to override default variable labels and if you use age=' ' the space supresses the label entirely.
Thanks for the solution. Both solutions worked perfectly.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.