BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kneeduh
Calcite | Level 5

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

See if this gets you started.

Proc freq data=raw noprint;
   tables agegrp/out=work.summary;
run;

data work.temp;
   set work.summary;
   if agegrp=2 then do;
      count5= floor(count/2);
      count6= count-count5;
      count = count5; 
      agegrp=1;
      output;
      count= count6;
      agegrp=2;
      output;
   end;
   else output;
end;

proc summary data=work.temp nway;
   class agegrp;
   var count;
   output out=want (drop= _:) sum=;
run;

/* WANT will have two variable: agegrp and the count(or frequency)
  per agegrp with the 5 years combined to the agegrp=1 and 
  agegrp=2 only the 6 year olds
  Create a new format to show agegrp=1 as 0-5, agegrp=2 as 6
*/

Proc freq data=want;
   tables agegrp;
   format agegrp thenewformat.;
   weight count;
run;

Since you haven't provided SAS variable names, data sets I'm guessing. If you don't have a sas data set then you need to read that csv.

 

Your desired output shows that at some point you have set the age group 88 to "missing" I suspect.

You will need to create a new format to show the new age group meanings for agegrp 1 and 2.

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

@kneeduh wrote:

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 


I can't make heads or tails out of this. Show us the desired output, and explain the steps to get there.

 

By the way, it's nearly impossible to decipher "How do I create a variable or even a category ... " because variables and categories are different things.

--
Paige Miller
kneeduh
Calcite | Level 5

Sorry - am fairly new when it comes to working with SAS.

 

Essentially I want the frequency of 0 to 5 year olds to equal 55249.5, this would be derived from adding the number of 0 to 4 year olds (45815) with half of the number of 5 to 6 year olds (18869/2).

 

So far, this is a code that is getting me part way there.


data children;
set folder.census2006;
if AGEGRP = 1 then oldcount04 = 1;
if AGEGRP ^= 1 then oldcount04 = 0;
if AGEGRP = 2 then oldcount56 = 1;
if AGEGRP ^= 2 then oldcount56 = 0;
run;


data children;
set children;
newcount05= oldcount04+round(oldcount56/2);
run;

PROC FREQ DATA=CHILDREN;
TABLES AGEGRP oldcount04 oldcount56 newcount05;
RUN;

 

 

However- seems that the 'oldcount56' variable is not dividing by 2 successfully. As a result, I'm seeing 0 to 4 year olds just adding together with the entire 5 to 6 year old category.

PaigeMiller
Diamond | Level 26

@kneeduh wrote:

Sorry - am fairly new when it comes to working with SAS.

 

Essentially I want the frequency of 0 to 5 year olds to equal 55249.5, this would be derived from adding the number of 0 to 4 year olds (45815) with half of the number of 5 to 6 year olds (18869/2).

Reading this literally, you want a single calculation of 45815 + 18869/2.

 

However, I get the feeling you want a table with many calculations. Do you? 

 

Anyway, I asked you to SHOW us the desired output. Type it in. The only thing you have shown us is this single calculation. We're not going to be able to provide code for this table (if that's what you want) unless you SHOW us what you want. And I don't really think we have to show you how to code that single calculation.

 

So I await your reply.

--
Paige Miller
kneeduh
Calcite | Level 5

I've attached the output here. Hope this is what you meant.

Thanks!

 

I actually only want this calculation above. I simply need the number of 0 to 5 year olds to show up, so I use this it subset my sample and get rid of anyone above the age of 5. I just cannot get the frequency to reflect the 5-6 year age group to be divided by 2, for the addition. 0 to 4 year old and 0 to 5 year old groups add together instead.

 
PaigeMiller
Diamond | Level 26

So the output you want, as shown in sas_output.docx, has additional columns that are not in the input data, these additional columns are percent, cumulative frequency and cumulative percent. Nowhere do I see an output that moves the 5 yr olds into a 0 to 5 category, nor do I see what would happen to the other categories.

 

You haven't shown me the desired output, as I understand what you are asking. You have shown me what is probably an output from PROC FREQ given your existing data, not what I asked for.

--
Paige Miller
ballardw
Super User

@kneeduh wrote:

Hi everyone,

 

I have an age group variable that I am looking at... they have categories as follows:

Capture.PNG

 

We need to create a category of 0 to 5 years, but unfortunately 5 year olds are lumped with the 6 year olds. How do I create a variable or even a category within the same variable (AGEGRP), where the FREQUENCY is divided by 2? (we have a rationale as to why we are doing this). Such that the number of 5 year olds show up with a frequency of ~9435?

 

Thank you very much!!

 


The first thing is to start with a data set. That is a picture. What does your data actually look like?

 

Personally if the base data that generated that "report" does not have individual ages then you get to have some fun.

One possibly defensible approach is to assume that proportion of ages within each age group is uniform then 5 year olds would be half of the 5 and 6 group. But Having worked with single age US Census population estimates I know that is extremely unlikely.

A possibly better approach if this data is for the US is to find the single age population estimates for the appropriate year and state, calculate the proportion between 5 and 6 from that file to apply it to your data.

How to do that would depend on what your data set looks like.

 

 

kneeduh
Calcite | Level 5

Thanks for all of your input! Dividing by 2 is fairly similar to the number of children born in Canada in 2001 (who would be 5 years in 2006), so that was why we just decided to stick with dividing by 2. But I would certainly be open to any further suggestions/advice you have!

 

I am struggling to upload the dataset as the file is far too large. Could you possibly advise on how I can go around this?

 

Thanks again!

kneeduh
Calcite | Level 5

Giving this a shot

ballardw
Super User

See if this gets you started.

Proc freq data=raw noprint;
   tables agegrp/out=work.summary;
run;

data work.temp;
   set work.summary;
   if agegrp=2 then do;
      count5= floor(count/2);
      count6= count-count5;
      count = count5; 
      agegrp=1;
      output;
      count= count6;
      agegrp=2;
      output;
   end;
   else output;
end;

proc summary data=work.temp nway;
   class agegrp;
   var count;
   output out=want (drop= _:) sum=;
run;

/* WANT will have two variable: agegrp and the count(or frequency)
  per agegrp with the 5 years combined to the agegrp=1 and 
  agegrp=2 only the 6 year olds
  Create a new format to show agegrp=1 as 0-5, agegrp=2 as 6
*/

Proc freq data=want;
   tables agegrp;
   format agegrp thenewformat.;
   weight count;
run;

Since you haven't provided SAS variable names, data sets I'm guessing. If you don't have a sas data set then you need to read that csv.

 

Your desired output shows that at some point you have set the age group 88 to "missing" I suspect.

You will need to create a new format to show the new age group meanings for agegrp 1 and 2.

kneeduh
Calcite | Level 5
Thank you so very much! This helped immensely. I really appreciate it!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2670 views
  • 0 likes
  • 3 in conversation