Hi all,
I'm currently trying to output a table with the basic form of:
Code Case Control
1 300 250
2 100 100
3 200 150
etc.
Case is a sum of all the people who have a 1 for Code_1exp and Case. Control is for people with Case = 0.
The data is currently in the format of
ID Code_1 Code_2 Code_3 Case
1001 .022 0 0 1
1002 0 0 .010 0
1003 0 0 0 1
etc.
Code_x are all variables with values GE 0 and Case is a binary variable in this table.
I'm looking for a highly efficient way to code this since I have approximately 180 Codes.
Thanks for your help!
Do you mean that you have code variables Code_1 to Code_180 or does Code_1 take on potentially 180 different values?
If the first case which values of Code_1 are used to sum over for the Case/control total? You might need to provide some slightly longer example, say 20 records, with the desired output for those 20.
My guess for your 3 lines of example data is:
Code Case Control
1 1 0
2 0 0
3 0 1
Am I close?
For the first question, Code_1 to Code_180, each of the codes have potentially any value above 0. However, I'm mainly looking at ever/never type results so a 1 can be used to replace any value above 0.
Here's a more accurate example table. To get registered into the Code row, they need a nonzero value for Code_x. So for Code_1, there are a total of 7 nonzero ID's, and out of those 7, 3 are cases and 4 are noncases.
Your guess is correct. I hope this clarifies what I'm aiming for.
Thanks again!
Code Case Control
1 3 4
2 2 1
3 7 10
ID | Code_1 | Code_2 | Code_3 | Case |
1001 | 0.022 | 0 | 1.2 | 1 |
1002 | 0 | 0 | 2.2 | 0 |
1003 | 0 | 0 | 3.2 | 0 |
1004 | 0.153 | 0 | 4.2 | 0 |
1005 | 0 | 0.3 | 1.2 | 1 |
1006 | 0 | 0 | 2.2 | 0 |
1007 | 0 | 0 | 3 | 0 |
1008 | 5.2 | 0 | 5 | 1 |
1009 | 2.3 | 0 | 8 | 1 |
1010 | 0 | 0 | 1 | 0 |
1011 | 0 | 0 | 3 | 0 |
1012 | 0 | 0 | 2 | 0 |
1013 | 3.13 | 0.1 | 1 | 0 |
1014 | 0 | 0 | 3 | 1 |
1015 | 0 | 0 | 0 | 0 |
1016 | 0.006 | 0 | 0 | 0 |
1017 | 0.13 | 0 | 3 | 0 |
1018 | 0 | 2 | 2 | 1 |
1019 | 0 | 0 | 0 | 1 |
1020 | 0 | 0 | 1 | 1 |
Does this work for you? (The window put lots of blank lines in here when it pasted. so the datalines aren't right and I'm too lazy to fix them right now)
data base;
input ID Code_1 Code_2 Code_3 Case ;
datalines;
1001 0.022 0 1.2 1
1002 0 0 2.2 0
1003 0 0 3.2 0
1004 0.153 0 4.2 0
1005 0 0.3 1.2 1
1006 0 0 2.2 0
1007 0 0 3 0
1008 5.2 0 5 1
1009 2.3 0 8 1
1010 0 0 1 0
1011 0 0 3 0
1012 0 0 2 0
1013 3.13 0.1 1 0
1014 0 0 3 1
1015 0 0 0 0
1016 0.006 0 0 0
1017 0.13 0 3 0
1018 0 2 2 1
1019 0 0 0 1
1020 0 0 1 1
;
run;
data transposed (keep= id code case);
set base ;
array codes code: ; /* if your varaiables aren't actually named starting with code or have other variables that start with "code" then you'll need to work with a more explicit list*/
do i = 1 to dim(codes);
if codes > 0 then code=i;
else code=.;
output;
end;
run;
proc format library=work;
value case 1='Case' 0='Control';
run;
proc tabulate data= transposed ;
class case /order=formatted;
class code;
table code='',case=''*n=''*f=f6.0
/misstext=' ' box='Code';
format case case.;
run;
Hi,
Thanks for the help ballardw. I'm currently running the code and it's looking like each code gets a new table created, I was wondering is there a way to combine the results of these tables into one master table?
Thanks.
I'm not using your code exactly as I've made it into a macro rather than an array due to the lack of ordering in the Codes (eg. 7, 32, 49, etc. instead of 1,2,3) and also noticed that the results come out more like
Code 7 Case Control
1 42 9483
I'm trying to output a table into Excel that would show (ideally)
Code Case Control
7 42 9483
32 20 8203
49 5 206
etc.
Proc tabulate creates the single output table. There is only one intermediate data set, so only two tables involved anyway. Depending on HOW you try to export to Excel you might get odd results..
What piece of the code makes you think it is generating one table per code???
If your code variables aren't sequentially numbered, not mentioned in your original post, other approaches can be made.
For instance the ARRAY statement could be:
Array codes code_1 - code_180; /* or whatever your max variable name is */
This will create lots of lines in the transposed dataset with missing values for code but they are ignored in proc tabulate anyway.
The SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment.
SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.
Find more tutorials on the SAS Users YouTube channel.