Solved: Rearranging SAS file from "long" to "wide" by subgroup

Wolverine · Posted 05-23-2024 12:54 PM

I'm trying to create a summary output table of frequencies. I have 4 different subgroups between TM and ADRD (0,0; 1,0; 0,1; 1,1) and I'm trying to line them up horizontally rather than vertically. This is what I have:

TM_group	ADRD_group	Table	Value_Labels	WgtFreq	Percent
0	0	Table DEM_AGE	2:Age Group [65,75)	48301097	56.025
0	0	Table DEM_AGE	3:Age Group >=75	37912485	43.975
0	0	Table DEM_SEX	1:Male	36957033	42.867
0	0	Table DEM_SEX	2:Female	49256549	57.133
0	1	Table DEM_AGE	2:Age Group [65,75)	983069	26.877
0	1	Table DEM_AGE	3:Age Group >=75	2674626	73.123
0	1	Table DEM_SEX	1:Male	1363336	37.273
0	1	Table DEM_SEX	2:Female	2294360	62.727
1	0	Table DEM_AGE	2:Age Group [65,75)	82517247	60.176
1	0	Table DEM_AGE	3:Age Group >=75	54609890	39.824
1	0	Table DEM_SEX	1:Male	63773136	46.507
1	0	Table DEM_SEX	2:Female	73354000	53.493
1	1	Table DEM_AGE	2:Age Group [65,75)	1290895	25.83
1	1	Table DEM_AGE	3:Age Group >=75	3706852	74.171
1	1	Table DEM_SEX	1:Male	2215172	44.323
1	1	Table DEM_SEX	2:Female	2782576	55.677

And this is what I want:

		no TM, no ADRD		no TM, ADRD		TM, no ADRD		TM, ADRD
Table	Value_Labels	WgtFreq	Percent	WgtFreq	Percent	WgtFreq	Percent	WgtFreq	Percent
Table DEM_AGE	2:Age Group [65,75)	48301097	56.025	983069	26.8767	82517247	60.1757	1290895	25.8295
Table DEM_AGE	3:Age Group >=75	37912485	43.975	2674626	73.1233	54609890	39.8243	3706852	74.1705
Table DEM_SEX	1:Male	36957033	42.867	1363336	37.2731	63773136	46.5066	2215172	44.3234
Table DEM_SEX	2:Female	49256549	57.133	2294360	62.7269	73354000	53.4934	2782576	55.6766

I think Proc Transpose could be used here, if I could get it to work correctly. Another idea I thought of is to separate each group into its own datafile, and then to add the columns from the second file to the right of the columns in the first file, and then repeat for the 3rd and 4th files. But I don't know how to add the columns like that.

Here is code to create the dataset above

data have;
  infile datalines dsd dlm=',' truncover;
  input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,2:Age Group [65,75),48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,2:Age Group [65,75),983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,2:Age Group [65,75),82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,2:Age Group [65,75),1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
; RUN;

Tom · Posted 05-23-2024 01:31 PM

That output is not a DATASET. It could be a TABLE, as in Tables and Figures in a report.

Are you looking for something like this:

proc report data=have;
  columns table value_labels tm_group,adrd_group,(WgtFreq Percent);
  define table / group;
  define value_labels / group;
  define tm_group / across;
  define adrd_group /across;
run;

Note your data step will not work. You did not define any of the variables as character. You did not quote the values that contained delimiters.

data have;
  infile datalines dsd dlm=',' truncover;
  length TM_group ADRD_group 8 Table $50 Value_Labels $80 WgtFreq Percent 8;
  input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
;

View solution in original post

PaigeMiller · Posted 05-23-2024 01:09 PM

@Wolverine wrote:

And this is what I want:

no TM, no ADRD no TM, ADRD TM, no ADRD TM, ADRD

Table Value_Labels WgtFreq Percent WgtFreq Percent WgtFreq Percent WgtFreq Percent

Table DEM_AGE 2:Age Group [65,75) 48301097 56.025 983069 26.8767 82517247 60.1757 1290895 25.8295

Table DEM_AGE 3:Age Group >=75 37912485 43.975 2674626 73.1233 54609890 39.8243 3706852 74.1705

Table DEM_SEX 1:Male 36957033 42.867 1363336 37.2731 63773136 46.5066 2215172 44.3234

Table DEM_SEX 2:Female 49256549 57.133 2294360 62.7269 73354000 53.4934 2782576 55.6766
 

There is no such layout possible in a SAS data set. You can't have columns nested under other columns. So PROC TRANSPOSE, which produces a data set, cannot do this.

However, if you want a report laid out like this, one could do this via PROC REPORT. Is that what you want, a report?

Side comment: any time you want to try to convert a long to wide data set, stop and reconsider. Wide data sets require more programming to get useful results. Why? Because almost all SAS PROCs expect long data set and cannot work easily with wide data sets. Repeating: any time you want to try to convert a long to wide data set, stop and reconsider.

By the way, it is most helpful when the code you use to provide data actually works, and your code doesn't work. Could you please fix it?

--
Paige Miller

Tom · Posted 05-23-2024 01:31 PM

That output is not a DATASET. It could be a TABLE, as in Tables and Figures in a report.

Are you looking for something like this:

proc report data=have;
  columns table value_labels tm_group,adrd_group,(WgtFreq Percent);
  define table / group;
  define value_labels / group;
  define tm_group / across;
  define adrd_group /across;
run;

Note your data step will not work. You did not define any of the variables as character. You did not quote the values that contained delimiters.

data have;
  infile datalines dsd dlm=',' truncover;
  length TM_group ADRD_group 8 Table $50 Value_Labels $80 WgtFreq Percent 8;
  input TM_group ADRD_group Table Value_Labels WgtFreq Percent;
datalines;
0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
;

ballardw · Posted 05-23-2024 01:34 PM

PLEASE Test your data step before posting. The variable type is wrong for Table and Value_labels, your value labels includes the delimiter so you need to quote the values.

data have;
  infile datalines dsd dlm=',' truncover;
  input TM_group ADRD_group Table:$20. Value_Labels :$20. WgtFreq Percent;
  TM_Adrd_group = cats(tm_group,adrd_group);
datalines;
0,0,Table DEM_AGE,"2:Age Group [65,75)",48301097,56.0249
0,0,Table DEM_AGE,3:Age Group >=75,37912485,43.9751
0,0,Table DEM_SEX,1:Male,36957033,42.8668
0,0,Table DEM_SEX,2:Female,49256549,57.1332
0,1,Table DEM_AGE,"2:Age Group [65,75)",983069,26.8767
0,1,Table DEM_AGE,3:Age Group >=75,2674626,73.1233
0,1,Table DEM_SEX,1:Male,1363336,37.2731
0,1,Table DEM_SEX,2:Female,2294360,62.7269
1,0,Table DEM_AGE,"2:Age Group [65,75)",82517247,60.1757
1,0,Table DEM_AGE,3:Age Group >=75,54609890,39.8243
1,0,Table DEM_SEX,1:Male,63773136,46.5066
1,0,Table DEM_SEX,2:Female,73354000,53.4934
1,1,Table DEM_AGE,"2:Age Group [65,75)",1290895,25.8295
1,1,Table DEM_AGE,3:Age Group >=75,3706852,74.1705
1,1,Table DEM_SEX,1:Male,2215172,44.3234
1,1,Table DEM_SEX,2:Female,2782576,55.6766
; RUN;

proc format library=work;
value $tm_adrd
'00'='no TM, no ADRD'
'01'='no TM, ADRD'
'10'='TM, no ADRD'
'11'='TM,  ADRD'
;
run;

proc report  data=have;
   columns table value_labels tm_adrd_group,(WgtFreq Percent);
   define table/group;
   define value_labels/group;
   define tm_adrd_group /across format=$tm_adrd. order=internal;
run;

Though I would create the group variable, or others prior to summary then generate the summary with that new variable so there aren't any questions about combining stuff. If you are doing confidence limits this would be required.

You show

no TM, no ADRD

no TM, ADRD

TM, no ADRD

TM, ADRD

without a clear definition of where those come from.

I have to guess that means the combinations of the variables Tm_group and Adrd_group.

One approach would be to create a composite variable in a data step such as

TM_Adrd_group = cats(tm_group,adrd_group);

Which should create a character variable with values of '00', '01','10' and '11'.

create format to show such things as "no TM, no ADRD" for the '00' version.

Then in one of the reporting procedures assign that format to the composite variable.

Either Proc Report or Tabulate would do such nesting the WgtFreq and Percent under the Tm_adrd_group

Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Re: Rearranging SAS file from "long" to "wide" by subgroup

Click image to register for webinar

Classroom Training Available!