BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lalohg
Quartz | Level 8

Hi there, I have a data set that looks like the following:

 

sex year Zone Color age_group
M 1999 Rural red 80 +
M 1999 Urban blue 05-09
M 1999 Urban white 50-54
M 1999 Rural yellow 75-79
F 2000 Rural black 15-19
M 2001 Rural red 50-54
M 2000 Rural blue 25-29
M 2002 Rural white 25-29
F 2004 Rural yellow 15-19
M 2005 Urban black 20-24
M 2006 Rural red 15-19
M 2006 Rural blue 70-74
M 2008 Rural white 00-04
F 2009 Rural yellow 80 +
F 2010 Rural black 80 +
M 2002 Rural red 75-79
F 2005 Rural blue 40-44
M 2006 Rural white 35-39
F 2001 Rural yellow 30-34
F 2000 Rural black 70-74
M 2003 Rural red 30-34
F 2003 Rural blue 80 +
F 2003 Rural white 10-14
F 2001 Rural yellow 30-34
M 2000 Urban black 60-64
M 2002 Urban red 10-14
M 2004 SubUrban blue 35-39
M 2005 Rural white 15-19
F 2006 Rural yellow 55-59
F 2000 Rural black 55-59
F 2003 Rural red 50-54
M 2003 Rural blue 30-34
F 2003 SubUrban white 20-24
F 2001 SubUrban yellow 50-54
F 2000 Rural red 05-09
M 2002 SubUrban blue 25-29
M 2004 Urban white 15-19
M 1999 Urban yellow 25-29
F 1999 Rural red 75-79
M 1999 Urban blue 65-69
M 2000 Urban white 55-59
F 2001 Rural yellow 15-19
M 2000 Urban red 75-79
M 2003 Rural blue 15-19
M 2003 Rural white 60-64
M 2003 Rural yellow 55-59
M 2001 SubUrban red 00-04
M 1999 Rural blue 65-69
F 1999 SubUrban white 45-49

 

 

and I would like to get columns for each gender's color (total and by zone) for each year and age group like the example below for Red

year age_group Female_red_total Female_red_rural Female_red_subUrban Female_red_Urban Male_red_total Male_red_rural Male_red_subUrban Male_red_Urban
1999 80 + 0 0 0 0 1 1 0 0
2001 50-54 0 0 0 0 1 1 0 0
2006 15-19 0 0 0 0 1 1 0 0
2002 75-79 0 0 0 0 1 1 0 0
2003 30-34 0 0 0 0 1 0 0 0
2002 10-14 0 0 0 0 1 0 0 1
2003 50-54 1 1 0 0 0 0 0 0
2000 05-09 1 1 0 0 0 0 0 0
1999 75-79 1 1 0 0 0 0 0 0
2000 75-79 0 0 0 0 1 0 0 1
2001 00-04 0 0 0 0 1 0 1 0

 

 

 

could you please help?

Thanks

 

Eduardo.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK. It is  easy for adding one more proc means.

 

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;

proc summary data=final_want;
by year	;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;

View solution in original post

7 REPLIES 7
ballardw
Super User

What are you going to do with a data set in that form?

Or do you what a report of that type?

If I were making a report of this I would start with something like:

 

Proc tabulate data = have;
   class year age_group sex zone color;
   table Year*age_group,
         sex*color*zone*n;
run;
lalohg
Quartz | Level 8

Hi Ballardw,

Thanks for replying, I need the data set like the example for Red that I posted. I am going to merge the data set containing all columns for females and males with another data set by year and age group so no need of Proc Tabulate. How can I get all columns for both sexes?
 
Thanks
ballardw
Super User

Generally normalized data would stay in the year, sex, age_group, zone format.

If you need to MATCH things (add a new value) based on those variables then there are many approachs from data step Merge, or Update or Hash or Proc Sql joins.

 

A another issue with your planned approach that your example data is for counts within a specific subset of data. What happens after you combine the data and someone resorts the data? Do you still know what those values represent for any given row?

And if you actually do a datastep merge and those variables are in both sets you very likely lose much of the data in the first set. Is that the intended behavior?

Ksharp
Super User

The following code could give you a start .

 

 

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;
proc print noobs;run;
lalohg
Quartz | Level 8

Hi Xia,

your code worked just fine!! thank you very much!!, just one more question, how can I modify your code so I can get a new data set with the total count of each column by year?

 

Thanks Again

Ksharp
Super User

OK. It is  easy for adding one more proc means.

 

data have;
infile cards expandtabs truncover;
input (sex	year	Zone	Color	age_group) ($);
cards;
M	1999	Rural	red	80+
M	1999	Urban	blue	05-09
M	1999	Urban	white	50-54
M	1999	Rural	yellow	75-79
F	2000	Rural	black	15-19
M	2001	Rural	red	50-54
M	2000	Rural	blue	25-29
M	2002	Rural	white	25-29
F	2004	Rural	yellow	15-19
M	2005	Urban	black	20-24
M	2006	Rural	red	15-19
M	2006	Rural	blue	70-74
M	2008	Rural	white	00-04
F	2009	Rural	yellow	80+
F	2010	Rural	black	80+
M	2002	Rural	red	75-79
F	2005	Rural	blue	40-44
M	2006	Rural	white	35-39
F	2001	Rural	yellow	30-34
F	2000	Rural	black	70-74
M	2003	Rural	red	30-34
F	2003	Rural	blue	80+
F	2003	Rural	white	10-14
F	2001	Rural	yellow	30-34
M	2000	Urban	black	60-64
M	2002	Urban	red	10-14
M	2004	SubUrban	blue	35-39
M	2005	Rural	white	15-19
F	2006	Rural	yellow	55-59
F	2000	Rural	black	55-59
F	2003	Rural	red	50-54
M	2003	Rural	blue	30-34
F	2003	SubUrban	white	20-24
F	2001	SubUrban	yellow	50-54
F	2000	Rural	red	05-09
M	2002	SubUrban	blue	25-29
M	2004	Urban	white	15-19
M	1999	Urban	yellow	25-29
F	1999	Rural	red	75-79
M	1999	Urban	blue	65-69
M	2000	Urban	white	55-59
F	2001	Rural	yellow	15-19
M	2000	Urban	red	75-79
M	2003	Rural	blue	15-19
M	2003	Rural	white	60-64
M	2003	Rural	yellow	55-59
M	2001	SubUrban	red	00-04
M	1999	Rural	blue	65-69
F	1999	SubUrban	white	45-49
;
run;
proc freq data=have noprint;
table year*age_group*sex*Color*Zone/list out=temp;
run;
proc format;
value $ fmt
 'F'='Female'
 'M'='Male';
run;
proc transpose data=temp out=want(drop=_:) delimiter=_ ;
by year	age_group;
var count;
id sex Color Zone;
format sex $fmt.;
run;
proc stdize data=want out=final_want reponly missing=0;
run;

proc summary data=final_want;
by year	;
var _numeric_;
output out=sum_want(drop=_:) sum=;
run;
lalohg
Quartz | Level 8

Hi Xia

I worked!! Thanks again! 

Problem solved!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1204 views
  • 2 likes
  • 3 in conversation