Solved: transposing data

lalohg · Posted 07-18-2016 12:03 PM

Hi there, I have a data set that looks like the following:

sex	year	Zone	Color	age_group
M	1999	Rural	red	80 +
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80 +
F	2010	Rural	black	80 +
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80 +
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49

and I would like to get columns for each gender's color (total and by zone) for each year and age group like the example below for Red

year	age_group	Female_red_total	Female_red_rural	Female_red_subUrban	Female_red_Urban	Male_red_total	Male_red_rural	Male_red_subUrban	Male_red_Urban
1999	80 +	0	0	0	0	1	1	0	0
2001	50-54	0	0	0	0	1	1	0	0
2006	15-19	0	0	0	0	1	1	0	0
2002	75-79	0	0	0	0	1	1	0	0
2003	30-34	0	0	0	0	1	0	0	0
2002	10-14	0	0	0	0	1	0	0	1
2003	50-54	1	1	0	0	0	0	0	0
2000	05-09	1	1	0	0	0	0	0	0
1999	75-79	1	1	0	0	0	0	0	0
2000	75-79	0	0	0	0	1	0	0	1
2001	00-04	0	0	0	0	1	0	1	0

could you please help?

Thanks

Eduardo.

Ksharp · Posted 07-19-2016 09:19 PM

OK. It is easy for adding one more proc means.

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;

proc summary data=final_want;
by year	;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;

View solution in original post

ballardw · Posted 07-18-2016 02:19 PM

What are you going to do with a data set in that form?

Or do you what a report of that type?

If I were making a report of this I would start with something like:

Proc tabulate data = have;
   class year age_group sex zone color;
   table Year*age_group,
         sex*color*zone*n;
run;

lalohg · Posted 07-18-2016 02:45 PM

Hi Ballardw,

Thanks for replying, I need the data set like the example for Red that I posted. I am going to merge the data set containing all columns for females and males with another data set by year and age group so no need of Proc Tabulate. How can I get all columns for both sexes?

Thanks

ballardw · Posted 07-18-2016 04:20 PM

Generally normalized data would stay in the year, sex, age_group, zone format.

If you need to MATCH things (add a new value) based on those variables then there are many approachs from data step Merge, or Update or Hash or Proc Sql joins.

A another issue with your planned approach that your example data is for counts within a specific subset of data. What happens after you combine the data and someone resorts the data? Do you still know what those values represent for any given row?

And if you actually do a datastep merge and those variables are in both sets you very likely lose much of the data in the first set. Is that the intended behavior?

Ksharp · Posted 07-19-2016 02:52 AM

The following code could give you a start .

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;
proc print noobs;run;

lalohg · Posted 07-19-2016 09:09 AM

Hi Xia,

your code worked just fine!! thank you very much!!, just one more question, how can I modify your code so I can get a new data set with the total count of each column by year?

Thanks Again

Ksharp · Posted 07-19-2016 09:19 PM

OK. It is easy for adding one more proc means.

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;

proc summary data=final_want;
by year	;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;

lalohg · Posted 07-20-2016 02:35 PM

Hi Xia

I worked!! Thanks again!

Problem solved!

transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Re: transposing data

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away