I'm trying to create a summary output table of frequencies. I have 4 different subgroups between TM and ADRD (0,0; 1,0; 0,1; 1,1) and I'm trying to line them up horizontally rather than vertically. This is what I have:
TM_group | ADRD_group | Table | Value_Labels | WgtFreq | Percent |
0 | 0 | Table DEM_AGE | 2:Age Group [65,75) | 48301097 | 56.025 |
0 | 0 | Table DEM_AGE | 3:Age Group >=75 | 37912485 | 43.975 |
0 | 0 | Table DEM_SEX | 1:Male | 36957033 | 42.867 |
0 | 0 | Table DEM_SEX | 2:Female | 49256549 | 57.133 |
0 | 1 | Table DEM_AGE | 2:Age Group [65,75) | 983069 | 26.877 |
0 | 1 | Table DEM_AGE | 3:Age Group >=75 | 2674626 | 73.123 |
0 | 1 | Table DEM_SEX | 1:Male | 1363336 | 37.273 |
0 | 1 | Table DEM_SEX | 2:Female | 2294360 | 62.727 |
1 | 0 | Table DEM_AGE | 2:Age Group [65,75) | 82517247 | 60.176 |
1 | 0 | Table DEM_AGE | 3:Age Group >=75 | 54609890 | 39.824 |
1 | 0 | Table DEM_SEX | 1:Male | 63773136 | 46.507 |
1 | 0 | Table DEM_SEX | 2:Female | 73354000 | 53.493 |
1 | 1 | Table DEM_AGE | 2:Age Group [65,75) | 1290895 | 25.83 |
1 | 1 | Table DEM_AGE | 3:Age Group >=75 | 3706852 | 74.171 |
1 | 1 | Table DEM_SEX | 1:Male | 2215172 | 44.323 |
1 | 1 | Table DEM_SEX | 2:Female | 2782576 | 55.677 |
And this is what I want:
no TM, no ADRD | no TM, ADRD | TM, no ADRD | TM, ADRD | ||||||
Table | Value_Labels | WgtFreq | Percent | WgtFreq | Percent | WgtFreq | Percent | WgtFreq | Percent |
Table DEM_AGE | 2:Age Group [65,75) | 48301097 | 56.025 | 983069 | 26.8767 | 82517247 | 60.1757 | 1290895 | 25.8295 |
Table DEM_AGE | 3:Age Group >=75 | 37912485 | 43.975 | 2674626 | 73.1233 | 54609890 | 39.8243 | 3706852 | 74.1705 |
Table DEM_SEX | 1:Male | 36957033 | 42.867 | 1363336 | 37.2731 | 63773136 | 46.5066 | 2215172 | 44.3234 |
Table DEM_SEX | 2:Female | 49256549 | 57.133 | 2294360 | 62.7269 | 73354000 | 53.4934 | 2782576 | 55.6766 |
I think Proc Transpose could be used here, if I could get it to work correctly. Another idea I thought of is to separate each group into its own datafile, and then to add the columns from the second file to the right of the columns in the first file, and then repeat for the 3rd and 4th files. But I don't know how to add the columns like that.
Here is code to create the dataset above
data have;
infile datalines dsd dlm=',' truncover;
input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,2:Age Group [65,75),48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,2:Age Group [65,75),983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,2:Age Group [65,75),82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,2:Age Group [65,75),1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
; RUN;
That output is not a DATASET. It could be a TABLE, as in Tables and Figures in a report.
Are you looking for something like this:
proc report data=have;
columns table value_labels tm_group,adrd_group,(WgtFreq Percent);
define table / group;
define value_labels / group;
define tm_group / across;
define adrd_group /across;
run;
Note your data step will not work. You did not define any of the variables as character. You did not quote the values that contained delimiters.
data have;
infile datalines dsd dlm=',' truncover;
length TM_group ADRD_group 8 Table $50 Value_Labels $80 WgtFreq Percent 8;
input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
;
@Wolverine wrote:
And this is what I want:
no TM, no ADRD no TM, ADRD TM, no ADRD TM, ADRD Table Value_Labels WgtFreq Percent WgtFreq Percent WgtFreq Percent WgtFreq Percent Table DEM_AGE 2:Age Group [65,75) 48301097 56.025 983069 26.8767 82517247 60.1757 1290895 25.8295 Table DEM_AGE 3:Age Group >=75 37912485 43.975 2674626 73.1233 54609890 39.8243 3706852 74.1705 Table DEM_SEX 1:Male 36957033 42.867 1363336 37.2731 63773136 46.5066 2215172 44.3234 Table DEM_SEX 2:Female 49256549 57.133 2294360 62.7269 73354000 53.4934 2782576 55.6766
There is no such layout possible in a SAS data set. You can't have columns nested under other columns. So PROC TRANSPOSE, which produces a data set, cannot do this.
However, if you want a report laid out like this, one could do this via PROC REPORT. Is that what you want, a report?
Side comment: any time you want to try to convert a long to wide data set, stop and reconsider. Wide data sets require more programming to get useful results. Why? Because almost all SAS PROCs expect long data set and cannot work easily with wide data sets. Repeating: any time you want to try to convert a long to wide data set, stop and reconsider.
By the way, it is most helpful when the code you use to provide data actually works, and your code doesn't work. Could you please fix it?
That output is not a DATASET. It could be a TABLE, as in Tables and Figures in a report.
Are you looking for something like this:
proc report data=have;
columns table value_labels tm_group,adrd_group,(WgtFreq Percent);
define table / group;
define value_labels / group;
define tm_group / across;
define adrd_group /across;
run;
Note your data step will not work. You did not define any of the variables as character. You did not quote the values that contained delimiters.
data have;
infile datalines dsd dlm=',' truncover;
length TM_group ADRD_group 8 Table $50 Value_Labels $80 WgtFreq Percent 8;
input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
;
PLEASE Test your data step before posting. The variable type is wrong for Table and Value_labels, your value labels includes the delimiter so you need to quote the values.
data have; infile datalines dsd dlm=',' truncover; input TM_group ADRD_group Table:$20. Value_Labels :$20. WgtFreq Percent; TM_Adrd_group = cats(tm_group,adrd_group); datalines; 0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249 0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751 0,0,Table DEM_SEX,1:Male,36957033,42.8668 0,0,Table DEM_SEX,2:Female,49256549,57.1332 0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767 0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233 0,1,Table DEM_SEX,1:Male,1363336,37.2731 0,1,Table DEM_SEX,2:Female,2294360,62.7269 1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757 1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243 1,0,Table DEM_SEX,1:Male,63773136,46.5066 1,0,Table DEM_SEX,2:Female,73354000,53.4934 1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295 1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705 1,1,Table DEM_SEX,1:Male,2215172,44.3234 1,1,Table DEM_SEX,2:Female,2782576,55.6766 ; RUN; proc format library=work; value $tm_adrd '00'='no TM, no ADRD' '01'='no TM, ADRD' '10'='TM, no ADRD' '11'='TM, ADRD' ; run; proc report data=have; columns table value_labels tm_adrd_group,(WgtFreq Percent); define table/group; define value_labels/group; define tm_adrd_group /across format=$tm_adrd. order=internal; run;
Though I would create the group variable, or others prior to summary then generate the summary with that new variable so there aren't any questions about combining stuff. If you are doing confidence limits this would be required.
You show
no TM, no ADRD | no TM, ADRD | TM, no ADRD | TM, ADRD |
without a clear definition of where those come from.
I have to guess that means the combinations of the variables Tm_group and Adrd_group.
One approach would be to create a composite variable in a data step such as
TM_Adrd_group = cats(tm_group,adrd_group);
Which should create a character variable with values of '00', '01','10' and '11'.
create format to show such things as "no TM, no ADRD" for the '00' version.
Then in one of the reporting procedures assign that format to the composite variable.
Either Proc Report or Tabulate would do such nesting the WgtFreq and Percent under the Tm_adrd_group
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.