Hi there, I have a data set that looks like the following:
sex | year | Zone | Color | age_group |
M | 1999 | Rural | red | 80 + |
M | 1999 | Urban | blue | 05-09 |
M | 1999 | Urban | white | 50-54 |
M | 1999 | Rural | yellow | 75-79 |
F | 2000 | Rural | black | 15-19 |
M | 2001 | Rural | red | 50-54 |
M | 2000 | Rural | blue | 25-29 |
M | 2002 | Rural | white | 25-29 |
F | 2004 | Rural | yellow | 15-19 |
M | 2005 | Urban | black | 20-24 |
M | 2006 | Rural | red | 15-19 |
M | 2006 | Rural | blue | 70-74 |
M | 2008 | Rural | white | 00-04 |
F | 2009 | Rural | yellow | 80 + |
F | 2010 | Rural | black | 80 + |
M | 2002 | Rural | red | 75-79 |
F | 2005 | Rural | blue | 40-44 |
M | 2006 | Rural | white | 35-39 |
F | 2001 | Rural | yellow | 30-34 |
F | 2000 | Rural | black | 70-74 |
M | 2003 | Rural | red | 30-34 |
F | 2003 | Rural | blue | 80 + |
F | 2003 | Rural | white | 10-14 |
F | 2001 | Rural | yellow | 30-34 |
M | 2000 | Urban | black | 60-64 |
M | 2002 | Urban | red | 10-14 |
M | 2004 | SubUrban | blue | 35-39 |
M | 2005 | Rural | white | 15-19 |
F | 2006 | Rural | yellow | 55-59 |
F | 2000 | Rural | black | 55-59 |
F | 2003 | Rural | red | 50-54 |
M | 2003 | Rural | blue | 30-34 |
F | 2003 | SubUrban | white | 20-24 |
F | 2001 | SubUrban | yellow | 50-54 |
F | 2000 | Rural | red | 05-09 |
M | 2002 | SubUrban | blue | 25-29 |
M | 2004 | Urban | white | 15-19 |
M | 1999 | Urban | yellow | 25-29 |
F | 1999 | Rural | red | 75-79 |
M | 1999 | Urban | blue | 65-69 |
M | 2000 | Urban | white | 55-59 |
F | 2001 | Rural | yellow | 15-19 |
M | 2000 | Urban | red | 75-79 |
M | 2003 | Rural | blue | 15-19 |
M | 2003 | Rural | white | 60-64 |
M | 2003 | Rural | yellow | 55-59 |
M | 2001 | SubUrban | red | 00-04 |
M | 1999 | Rural | blue | 65-69 |
F | 1999 | SubUrban | white | 45-49 |
and I would like to get columns for each gender's color (total and by zone) for each year and age group like the example below for Red
year | age_group | Female_red_total | Female_red_rural | Female_red_subUrban | Female_red_Urban | Male_red_total | Male_red_rural | Male_red_subUrban | Male_red_Urban |
1999 | 80 + | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
2001 | 50-54 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
2006 | 15-19 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
2002 | 75-79 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
2003 | 30-34 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
2002 | 10-14 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
2003 | 50-54 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2000 | 05-09 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1999 | 75-79 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2000 | 75-79 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
2001 | 00-04 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
could you please help?
Thanks
Eduardo.
OK. It is easy for adding one more proc means.
data have;
infile cards expandtabs truncover;
input (sex year Zone Color age_group) ($);
cards;
M 1999 Rural red 80+
M 1999 Urban blue 05-09
M 1999 Urban white 50-54
M 1999 Rural yellow 75-79
F 2000 Rural black 15-19
M 2001 Rural red 50-54
M 2000 Rural blue 25-29
M 2002 Rural white 25-29
F 2004 Rural yellow 15-19
M 2005 Urban black 20-24
M 2006 Rural red 15-19
M 2006 Rural blue 70-74
M 2008 Rural white 00-04
F 2009 Rural yellow 80+
F 2010 Rural black 80+
M 2002 Rural red 75-79
F 2005 Rural blue 40-44
M 2006 Rural white 35-39
F 2001 Rural yellow 30-34
F 2000 Rural black 70-74
M 2003 Rural red 30-34
F 2003 Rural blue 80+
F 2003 Rural white 10-14
F 2001 Rural yellow 30-34
M 2000 Urban black 60-64
M 2002 Urban red 10-14
M 2004 SubUrban blue 35-39
M 2005 Rural white 15-19
F 2006 Rural yellow 55-59
F 2000 Rural black 55-59
F 2003 Rural red 50-54
M 2003 Rural blue 30-34
F 2003 SubUrban white 20-24
F 2001 SubUrban yellow 50-54
F 2000 Rural red 05-09
M 2002 SubUrban blue 25-29
M 2004 Urban white 15-19
M 1999 Urban yellow 25-29
F 1999 Rural red 75-79
M 1999 Urban blue 65-69
M 2000 Urban white 55-59
F 2001 Rural yellow 15-19
M 2000 Urban red 75-79
M 2003 Rural blue 15-19
M 2003 Rural white 60-64
M 2003 Rural yellow 55-59
M 2001 SubUrban red 00-04
M 1999 Rural blue 65-69
F 1999 SubUrban white 45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
'F'='Female'
'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;
proc summary data=final_want;
by year ;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;
What are you going to do with a data set in that form?
Or do you what a report of that type?
If I were making a report of this I would start with something like:
Proc tabulate data = have;
class year age_group sex zone color;
table Year*age_group,
sex*color*zone*n;
run;
Hi Ballardw,
Generally normalized data would stay in the year, sex, age_group, zone format.
If you need to MATCH things (add a new value) based on those variables then there are many approachs from data step Merge, or Update or Hash or Proc Sql joins.
A another issue with your planned approach that your example data is for counts within a specific subset of data. What happens after you combine the data and someone resorts the data? Do you still know what those values represent for any given row?
And if you actually do a datastep merge and those variables are in both sets you very likely lose much of the data in the first set. Is that the intended behavior?
The following code could give you a start .
data have;
infile cards expandtabs truncover;
input (sex year Zone Color age_group) ($);
cards;
M 1999 Rural red 80+
M 1999 Urban blue 05-09
M 1999 Urban white 50-54
M 1999 Rural yellow 75-79
F 2000 Rural black 15-19
M 2001 Rural red 50-54
M 2000 Rural blue 25-29
M 2002 Rural white 25-29
F 2004 Rural yellow 15-19
M 2005 Urban black 20-24
M 2006 Rural red 15-19
M 2006 Rural blue 70-74
M 2008 Rural white 00-04
F 2009 Rural yellow 80+
F 2010 Rural black 80+
M 2002 Rural red 75-79
F 2005 Rural blue 40-44
M 2006 Rural white 35-39
F 2001 Rural yellow 30-34
F 2000 Rural black 70-74
M 2003 Rural red 30-34
F 2003 Rural blue 80+
F 2003 Rural white 10-14
F 2001 Rural yellow 30-34
M 2000 Urban black 60-64
M 2002 Urban red 10-14
M 2004 SubUrban blue 35-39
M 2005 Rural white 15-19
F 2006 Rural yellow 55-59
F 2000 Rural black 55-59
F 2003 Rural red 50-54
M 2003 Rural blue 30-34
F 2003 SubUrban white 20-24
F 2001 SubUrban yellow 50-54
F 2000 Rural red 05-09
M 2002 SubUrban blue 25-29
M 2004 Urban white 15-19
M 1999 Urban yellow 25-29
F 1999 Rural red 75-79
M 1999 Urban blue 65-69
M 2000 Urban white 55-59
F 2001 Rural yellow 15-19
M 2000 Urban red 75-79
M 2003 Rural blue 15-19
M 2003 Rural white 60-64
M 2003 Rural yellow 55-59
M 2001 SubUrban red 00-04
M 1999 Rural blue 65-69
F 1999 SubUrban white 45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
'F'='Female'
'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;
proc print noobs;run;
Hi Xia,
your code worked just fine!! thank you very much!!, just one more question, how can I modify your code so I can get a new data set with the total count of each column by year?
Thanks Again
OK. It is easy for adding one more proc means.
data have;
infile cards expandtabs truncover;
input (sex year Zone Color age_group) ($);
cards;
M 1999 Rural red 80+
M 1999 Urban blue 05-09
M 1999 Urban white 50-54
M 1999 Rural yellow 75-79
F 2000 Rural black 15-19
M 2001 Rural red 50-54
M 2000 Rural blue 25-29
M 2002 Rural white 25-29
F 2004 Rural yellow 15-19
M 2005 Urban black 20-24
M 2006 Rural red 15-19
M 2006 Rural blue 70-74
M 2008 Rural white 00-04
F 2009 Rural yellow 80+
F 2010 Rural black 80+
M 2002 Rural red 75-79
F 2005 Rural blue 40-44
M 2006 Rural white 35-39
F 2001 Rural yellow 30-34
F 2000 Rural black 70-74
M 2003 Rural red 30-34
F 2003 Rural blue 80+
F 2003 Rural white 10-14
F 2001 Rural yellow 30-34
M 2000 Urban black 60-64
M 2002 Urban red 10-14
M 2004 SubUrban blue 35-39
M 2005 Rural white 15-19
F 2006 Rural yellow 55-59
F 2000 Rural black 55-59
F 2003 Rural red 50-54
M 2003 Rural blue 30-34
F 2003 SubUrban white 20-24
F 2001 SubUrban yellow 50-54
F 2000 Rural red 05-09
M 2002 SubUrban blue 25-29
M 2004 Urban white 15-19
M 1999 Urban yellow 25-29
F 1999 Rural red 75-79
M 1999 Urban blue 65-69
M 2000 Urban white 55-59
F 2001 Rural yellow 15-19
M 2000 Urban red 75-79
M 2003 Rural blue 15-19
M 2003 Rural white 60-64
M 2003 Rural yellow 55-59
M 2001 SubUrban red 00-04
M 1999 Rural blue 65-69
F 1999 SubUrban white 45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
'F'='Female'
'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;
proc summary data=final_want;
by year ;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;
Hi Xia
I worked!! Thanks again!
Problem solved!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.