Greetings,
(Sorry for duplicating questions, lost my previous post).
I have data on 5 partners for each client, partner1-5. I have their test results, POS or NEG for each partner POS1-5 and NEG1-5 (coded 1 if yes, 0 if no). I also have age of the partners, Age1-5. Data snapshot below
Partner1 | Partner2 | Partner3 | Partner4 | Partner5 | POS1 | POS2 | POS3 | POS4 | POS5 | NEG1 | NEG2 | NEG3 | NEG4 | NEG5 | Age1 | Age2 | Age3 | Age4 | Age5 | NPOS | NNEG |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10-14 | 15-19 | 20-24 | 25-29 | 30+ | 1 | 0 | ||||
1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 10-14 | 15-19 | 10-14 | 30+ | 20-24 | 1 | 1 | |||
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 30+ | 20-24 | 25-29 | 15-19 | 20-24 | 2 | 1 | ||
1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 20-24 | 25-29 | 10-14 | 15-19 | 20-24 | 2 | 3 |
I created the counts for nPOS and nNEG by Sum(of Pos1-Pos5) and Sum(of Neg1-Neg5). I am trying to get count of POS ans NEGs by age group. Meaning the distribution of age group among POS and among NEG. Output below :
POS | NEG | |
10-14 | 2 | |
15-19 | 2 | 1 |
20-24 | 1 | 2 |
25-29 | 2 | |
30+ | 1 |
Any suggestions on how to get the counts would help.
THanks in advance,
Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.
data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;
*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
by ID;
var pos1-pos5;
run;
proc transpose data=test out=longn prefix=neg;
by ID;
var neg1-neg5;
run;
proc transpose data=test out=longa prefix=age;
by ID;
var age1-age5;
run;
*create test2 ds by merging the 3 transposed datasets together;
data test2;
merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
(rename=(age1=age));
by ID;
drop _name_;
run;
proc print data=test2;run;
proc sql;
create table test3 as select age, sum(pos) as pos, sum(neg) as neg
from test2
group by age;
quit;
Okay, I think I got something that will work for you but I had to add a variable called ID and it has values 1-4 for each of the 4 rows. Basically, you make 3 datasets and transpose them by pos, neg, and age and then merge all the datasets together. From there, you can sum down the columns grouping by age. Take a look. I tried to attach an image but it's not working. Just print out the dataset test3 and see if it meets your needs.
data test;
input id pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $;
cards;
1 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+
2 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24
3 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24
4 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24
;
run;
*output 3 datasets from proc transpose, one for age, one for pos, one for neg;
proc transpose data=test out=longp prefix=pos;
by ID;
var pos1-pos5;
run;
proc transpose data=test out=longn prefix=neg;
by ID;
var neg1-neg5;
run;
proc transpose data=test out=longa prefix=age;
by ID;
var age1-age5;
run;
*create test2 ds by merging the 3 transposed datasets together;
data test2;
merge longp(rename=(pos1=pos) drop=_name_) longn(rename=(neg1=neg) drop=_name_) longa
(rename=(age1=age));
by ID;
drop _name_;
run;
proc print data=test2;run;
proc sql;
create table test3 as select age, sum(pos) as pos, sum(neg) as neg
from test2
group by age;
quit;
THanks for the solution.
Is it every possible for a partner to have both a positive and negative result? If not you only need one variable with the information.
Here's a different approach to reshaping data with arrays and explicit output statement.
data test; input id partner1-partner5 pos1-pos5 neg1-neg5 age1 $ age2 $ age3 $ age4 $ age5 $; cards; 1 1 . . . . 1 0 0 0 0 0 0 0 0 0 10-14 15-19 20-24 25-29 30+ 2 1 1 . . . 0 1 0 0 0 1 0 0 0 0 10-14 15-19 10-14 30+ 20-24 3 1 1 1 . . 1 1 0 0 0 0 0 1 0 0 30+ 20-24 25-29 15-19 20-24 4 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 20-24 25-29 10-14 15-19 20-24 ; run; data want; set test; array pa(*) partner1-partner5; array p (*) pos1-pos5; array n (*) neg1-neg5; array a (*) age1-age5; do partner=1 to dim(pa); if pa[partner]=1 then do; pos=p[partner]; neg=n[partner]; age=a[partner]; output; end; end; keep id partner pos neg age; run; proc tabulate data=want; class age; var pos neg; table age=' ', (pos neg) * sum=' '*f=best5. /misstext=' ' ; run;
You will find that for most cases that a long data set with one observation per set of measurements will work better.
And here is what I meant by using just one variable :
proc format; value pn 1=' Pos' 0='Neg' ; proc tabulate data=want; class age; class pos/order=formatted; format pos pn.; table age=' ', pos=' ' * n=' ' /misstext=' ' ; run;
Playing some minor games to duplicate the order you showed.
In Proc Tabulate you can use variable= to override default variable labels and if you use age=' ' the space supresses the label entirely.
Thanks for the solution. Both solutions worked perfectly.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.