Programming the statistical procedures from SAS

How to divide the frequency of a variable's category

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

How to divide the frequency of a variable's category

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 13,507

Re: How to divide the frequency of a variable's category

See if this gets you started.

Proc freq data=raw noprint;
   tables agegrp/out=work.summary;
run;

data work.temp;
   set work.summary;
   if agegrp=2 then do;
      count5= floor(count/2);
      count6= count-count5;
      count = count5; 
      agegrp=1;
      output;
      count= count6;
      agegrp=2;
      output;
   end;
   else output;
end;

proc summary data=work.temp nway;
   class agegrp;
   var count;
   output out=want (drop= _:) sum=;
run;

/* WANT will have two variable: agegrp and the count(or frequency)
  per agegrp with the 5 years combined to the agegrp=1 and 
  agegrp=2 only the 6 year olds
  Create a new format to show agegrp=1 as 0-5, agegrp=2 as 6
*/

Proc freq data=want;
   tables agegrp;
   format agegrp thenewformat.;
   weight count;
run;

Since you haven't provided SAS variable names, data sets I'm guessing. If you don't have a sas data set then you need to read that csv.

 

Your desired output shows that at some point you have set the age group 88 to "missing" I suspect.

You will need to create a new format to show the new age group meanings for agegrp 1 and 2.

View solution in original post


All Replies
Respected Advisor
Posts: 2,986

Re: How to divide the frequency of a variable's category


@kneeduh wrote:

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 


I can't make heads or tails out of this. Show us the desired output, and explain the steps to get there.

 

By the way, it's nearly impossible to decipher "How do I create a variable or even a category ... " because variables and categories are different things.

--
Paige Miller
Occasional Contributor
Posts: 7

Re: How to divide the frequency of a variable's category

[ Edited ]
Posted in reply to PaigeMiller

Sorry - am fairly new when it comes to working with SAS.

 

Essentially I want the frequency of 0 to 5 year olds to equal 55249.5, this would be derived from adding the number of 0 to 4 year olds (45815) with half of the number of 5 to 6 year olds (18869/2).

 

So far, this is a code that is getting me part way there.


data children;
set folder.census2006;
if AGEGRP = 1 then oldcount04 = 1;
if AGEGRP ^= 1 then oldcount04 = 0;
if AGEGRP = 2 then oldcount56 = 1;
if AGEGRP ^= 2 then oldcount56 = 0;
run;


data children;
set children;
newcount05= oldcount04+round(oldcount56/2);
run;

PROC FREQ DATA=CHILDREN;
TABLES AGEGRP oldcount04 oldcount56 newcount05;
RUN;

 

 

However- seems that the 'oldcount56' variable is not dividing by 2 successfully. As a result, I'm seeing 0 to 4 year olds just adding together with the entire 5 to 6 year old category.

Respected Advisor
Posts: 2,986

Re: How to divide the frequency of a variable's category


@kneeduh wrote:

Sorry - am fairly new when it comes to working with SAS.

 

Essentially I want the frequency of 0 to 5 year olds to equal 55249.5, this would be derived from adding the number of 0 to 4 year olds (45815) with half of the number of 5 to 6 year olds (18869/2).

Reading this literally, you want a single calculation of 45815 + 18869/2.

 

However, I get the feeling you want a table with many calculations. Do you? 

 

Anyway, I asked you to SHOW us the desired output. Type it in. The only thing you have shown us is this single calculation. We're not going to be able to provide code for this table (if that's what you want) unless you SHOW us what you want. And I don't really think we have to show you how to code that single calculation.

 

So I await your reply.

--
Paige Miller
Occasional Contributor
Posts: 7

Re: How to divide the frequency of a variable's category

[ Edited ]
Posted in reply to PaigeMiller

I've attached the output here. Hope this is what you meant.

Thanks!

 

I actually only want this calculation above. I simply need the number of 0 to 5 year olds to show up, so I use this it subset my sample and get rid of anyone above the age of 5. I just cannot get the frequency to reflect the 5-6 year age group to be divided by 2, for the addition. 0 to 4 year old and 0 to 5 year old groups add together instead.

 
Attachment
Respected Advisor
Posts: 2,986

Re: How to divide the frequency of a variable's category

So the output you want, as shown in sas_output.docx, has additional columns that are not in the input data, these additional columns are percent, cumulative frequency and cumulative percent. Nowhere do I see an output that moves the 5 yr olds into a 0 to 5 category, nor do I see what would happen to the other categories.

 

You haven't shown me the desired output, as I understand what you are asking. You have shown me what is probably an output from PROC FREQ given your existing data, not what I asked for.

--
Paige Miller
Super User
Posts: 13,507

Re: How to divide the frequency of a variable's category


@kneeduh wrote:

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 


The first thing is to start with a data set. That is a picture. What does your data actually look like?

 

Personally if the base data that generated that "report" does not have individual ages then you get to have some fun.

One possibly defensible approach is to assume that proportion of ages within each age group is uniform then 5 year olds would be half of the 5 and 6 group. But Having worked with single age US Census population estimates I know that is extremely unlikely.

A possibly better approach if this data is for the US is to find the single age population estimates for the appropriate year and state, calculate the proportion between 5 and 6 from that file to apply it to your data.

How to do that would depend on what your data set looks like.

 

 

Occasional Contributor
Posts: 7

Re: How to divide the frequency of a variable's category

[ Edited ]

Thanks for all of your input! Dividing by 2 is fairly similar to the number of children born in Canada in 2001 (who would be 5 years in 2006), so that was why we just decided to stick with dividing by 2. But I would certainly be open to any further suggestions/advice you have!

 

I am struggling to upload the dataset as the file is far too large. Could you possibly advise on how I can go around this?

 

Thanks again!

Occasional Contributor
Posts: 7

Re: How to divide the frequency of a variable's category

Giving this a shot

Attachment
Solution
3 weeks ago
Super User
Posts: 13,507

Re: How to divide the frequency of a variable's category

See if this gets you started.

Proc freq data=raw noprint;
   tables agegrp/out=work.summary;
run;

data work.temp;
   set work.summary;
   if agegrp=2 then do;
      count5= floor(count/2);
      count6= count-count5;
      count = count5; 
      agegrp=1;
      output;
      count= count6;
      agegrp=2;
      output;
   end;
   else output;
end;

proc summary data=work.temp nway;
   class agegrp;
   var count;
   output out=want (drop= _:) sum=;
run;

/* WANT will have two variable: agegrp and the count(or frequency)
  per agegrp with the 5 years combined to the agegrp=1 and 
  agegrp=2 only the 6 year olds
  Create a new format to show agegrp=1 as 0-5, agegrp=2 as 6
*/

Proc freq data=want;
   tables agegrp;
   format agegrp thenewformat.;
   weight count;
run;

Since you haven't provided SAS variable names, data sets I'm guessing. If you don't have a sas data set then you need to read that csv.

 

Your desired output shows that at some point you have set the age group 88 to "missing" I suspect.

You will need to create a new format to show the new age group meanings for agegrp 1 and 2.

Occasional Contributor
Posts: 7

Re: How to divide the frequency of a variable's category

Thank you so very much! This helped immensely. I really appreciate it!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 161 views
  • 0 likes
  • 3 in conversation