BookmarkSubscribeRSS Feed
m_u_r_a_l_i
Calcite | Level 5
Hi,

I am working on a requirement, which goes like this :

Input File 1
----------------
name sex age
******* *** ***
ADAMS M 20
BAKER F 19
DOUGLAS M 19
HALE M 21
JONE F 19
LARUE F 18
NICK M 19
OLA M 22
PEBBLE F 22
RAINES F 21

Input File 2
----------------
AGE DESCRIPTION
***** ******************
19 nineteen years old
20 twenty years old
21 twenty one years old
22 twenty two years old
23 twenty three years old
24 twenty four years old
25 twenty five years old

I sort both the files on AGE, and I want the o.p to be

AGE TOTAL NUMBER OF PEOPLE IN THAT AGE DESCRIPTION
19 2 nineteen years old

I want to know how I do this.

My logic is to

1. read the first record in SAS. Have the count variable set as 0.
2. store the AGE in a temp variable. And have count variable set as 1.
3. Read the next record, and get the AGE.
4. Compare the temp value and the AGE.
If both are same : count + 1
If the age > temp, then
write a record into the o.p file -


Do the same until end of file.

I am quite new to SAS, so I am not able to find out how I can accomplish this in SAS. Any help or any guidance in this regard would be very helpful....

Thanks.
Murali.
5 REPLIES 5
Cynthia_sas
SAS Super FREQ
Hi:
If this were my task, I would not start by counting the records myself until after I'd figured out whether any of the reporting procedures could suit my needs. For example, PROC FREQ, PROC REPORT, PROC TABULATE, PROC SQL, all have ways to count and categorize records in the manner you want. If you review the results from the following program, you will see that each technique produces a slightly different result, and, depending on what you want, one of these may be a quicker approach than writing your own program to perform this task.

If you need help with any of the syntax used in these programs, you might consider looking at the documentation for each procedure for more help. Or, Tech Support can help you with the details of whatever method you finally select.

cynthia
[pre]
** make some data to test;
data file1;
infile datalines;
input Name $ Gender $ Age;
return;
datalines;
ADAMS M 20
BAKER F 19
DOUGLAS M 19
HALE M 21
JONE F 19
LARUE F 18
NICK M 19
OLA M 22
PEBBLE F 22
RAINES F 21
;
run;

** make a user-defined format for the age description;
proc format;
value agef
18 = 'Eighteen years old'
19 = 'Nineteen years old'
20 = 'Twenty years old '
21 = 'Twenty one years old'
22 = 'Twenty two years old'
23 = 'Twenty three years old'
24 = 'Twenty four years old'
25 = 'Twenty five years old ';
run;

** start ODS;
ods html file='c:\temp\useReport.html' style=egdefault;

** PROC FREQ;
proc freq data=file1;
title '1) Using PROC FREQ';
tables Age;
format Age agef.;
label Age = 'Age Description';
run;

** PROC SQL;
title "2) Using PROC SQL";
proc sql;
select age label='Age',
count(age) as cntage label='Total People',
put(age,agef.) as desc label='Description'
from work.file1
group by age
order by age;
quit;

** PROC REPORT;
proc report data=file1 nowd;
title '3) Using PROC REPORT';
column age agegrp n age=desc;
define age /group noprint;
define agegrp/ computed 'Age' ;
define n / 'Total Number of People';
define desc / group f=agef. 'In this Age Description' ;
rbreak after / summarize;
compute agegrp / character length=5;
agegrp = right(put(age,5.));
if _break_ = '_RBREAK_' then do;
agegrp = 'Total';
end;
endcomp;
run;

** PROC TABULATE;
options missing=0;
proc tabulate data=file1 f=comma8.;
title "4) Using PROC TABULATE";
class age /preloadfmt;
table age all, n / printmiss;
label age = 'Age Description';
keylabel n = 'Total All People In This Category'
all = 'Grand Total';
format age agef.;
run;

options missing = .;
title;
ods html close;
[/pre]
m_u_r_a_l_i
Calcite | Level 5
Hi,

Thanks for the response. After reading through your code, especially
** make a user-defined format for the age description; section, I wish to tell you that there is another challenge in this requirement.

I have given the age as 19 to 25. But in the real time scneario, I might have N number of ages, and i dont know how many would be there. So how would I be able to define.
deleted_user
Not applicable
I read Cynthia's response, amazed at how well she had covered the requirements, without straying too far from the goals. I did wonder though at the format, thinking it was a little constrained by an understanding of the data. Experience tells me that understanding of the data more often starts when SAS is applied to it, and everyone learns something new.

In this case, I thought the format could be created from a table of values, and it would not have the risks of miscoding, or missing values. The following code will create a format from 1 to 100 and can be extended easily as far as is needed.

[Pre]
Data CNTLIN;
Retain FMTNAME "AGEF"
TYPE "N";
Length LABEL $60;
Do START = 1 To 100 By 1;
LABEL = CompBl( Put( START, WORDS30.) || "years old");
Output;
End;
Run;

Proc Format CntlIn = CNTLIN;
Run;
[/Pre]

Kind regards

David
Cynthia_sas
SAS Super FREQ
David...I probably would have found the MIN and MAX ages in the data set and then used them for the start and stop values of the loop. That's because if you're going to use PRELOADFMT, then you want to only have it fill in the blanks from the MIN to the MAX age -- not from 1 to 100.

Indeed, you are correct. I built the format based on the data that was presented in the question, because that was the quickest way to illustrate the different reporting techniques. I figured that if my top and bottom limit for format ages was problematic, that somebody would point it out.

And now, we've had this lovely conversation about how to use reporting procedures to accomplish the task and learned more about PROC FORMAT besides! 😉

cynthia
deleted_user
Not applicable
I confess, my answer was only an A- because I didn't take account of the data, and sought a generic solution. Even though I had also commented that data understanding was important. So, we have the A+ answer to find the boundaries and deliver a format that would be conservative.

I didn't even consider the preloadfmt implications Cynthia, thank you for sharing them.

Kind regards

David

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1013 views
  • 0 likes
  • 3 in conversation