SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Pyrite | Level 9

I have a demographic data set with no missing data that covers a 12 month period. It is built like the below: 

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;

The categorical variables are numerical with values representing different levels (e.g., sex=0  is male, sex=1 is female; race=0 is white, =1 is black, =2 is asian, etc.). Count means how many individuals with that combination of preceding variables are in that month sample. There are no individual identifiers in the sample.

 

The total number of individuals and distribution of levels of variables changes slightly each month, and I'm wondering how best to find the one month average for each variable and value due to some other monthly analyses.

 

In other words, I want to average the 12 month period demographics to one month for a "table 1" since I don't have individual information of how they may come and go over the study period (i.e., if there are 12000 people in the entire study, the one-month average n=1000; if there are 9000 white people over the 12 month period then white people would represent 750 of the one month sample, etc.). I would want this to round to the nearest whole person. 

 

Any ideas on how best to code? I appreciate the help. 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It looks like you just want "sum of subject/number of month" for each level of each variabl

 

 

 

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average
 from temp
  group by age
union all
select 'sex' as var,sex,int(count(*)/count(distinct date))
 from temp
  group by sex
union all
select 'race' as var,race,int(count(*)/count(distinct date))
 from temp
  group by race
union all
select 'Total' as var,.,int(count(*)/count(distinct date))
 from temp
;
quit;

 

 

 

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Honestly, a larger data set (covering a whole year) even if it is fake data, and showing us the desired output would help us understand the problem better.

 

It sounds like you want some sort of sum or average or weighted sum or weighted average over the entire year. PROC SUMMARY will do this. It can also easily slice the data into different results for combinations of age sex and race.

 

--
Paige Miller
Ksharp
Super User

It looks like you just want "sum of subject/number of month" for each level of each variabl

 

 

 

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average
 from temp
  group by age
union all
select 'sex' as var,sex,int(count(*)/count(distinct date))
 from temp
  group by sex
union all
select 'race' as var,race,int(count(*)/count(distinct date))
 from temp
  group by race
union all
select 'Total' as var,.,int(count(*)/count(distinct date))
 from temp
;
quit;

 

 

 

sasgorilla
Pyrite | Level 9

This looks like exactly what I am looking for, thanks! I will add the remaining variables and I believe this will get me everything I need. I appreciate your help. 

PaigeMiller
Diamond | Level 26

I don't think @Ksharp has provided the best answer. First, PROC SUMMARY will be a lot less typing and probably execute faster. In the case where your real data set has more than the three variables (age sex race), you will have to do a lot more typing in SQL than you would have to do in PROC SUMMARY.

 

If there is a need to slice the data by two (or more) variables at one time, such as all combinations of age and sex, PROC SUMMARY handles this easily, with little extra coding. The extra coding needed for an SQL solution is not obvious.

 

Furthermore, if the real data has missing values in variable COUNT, SQL gives the wrong answers, while SUMMARY gives the right answers.

 

PROC SUMMARY is a much better tool here.

--
Paige Miller
mkeintz
PROC Star

I'm not sure I understand exactly what you want.  I guess you want the average of count for each level of RACE, of AGE, and of SEX.  If so, then this would generate a report of those averages:

 

data a;
  input date :monyy. age sex race count;
  format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
RUN;

proc means data=a;
  class age sex race ;
  var count;
  types age sex race;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

So you are starting with the counts per combination.  Basically your starting point is the output from this TABLES statement of PROC FREQ against your individual level data.

tables date*age*sex*race / noprint out=A ;

But what is it that want to get from this? I notice that some of your possible values for RACE etc do not actually appear in the data.  To get summaries for those levels you might want to use the completetypes option of PROC MEANS/SUMMARY.

data a;
  input date :monyy. age sex race count;
  format date monyy7.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;

proc format ;
value sex 0='male' 1='female';
value race 0='white' 1='black' 2='asian' 3='other race1' 4='other race2';
value age 19='19' 20='20' 21='21' 22='22' 23='23' 24='24';
value months '01JAN2018'd='2018-01' '01FEB2018'd='2018-02' '01MAR2018'd='2018-03' ;
run;

proc means mean min max completetypes;
  class date age sex race / preloadfmt ;
  ways 1;
  var count;
  format date months. age age. sex sex. race race. ;
run;

Tom_1-1733684198109.png

 

 

Ksharp
Super User
 

I had a question that is a little different from the topic I asked, so I wanted to message you directly. 

 

Is there a way in proc sql to add a column that provides the relative frequency of each "average" by level of the variable? 

For example, if "sex" had two levels and my "0" average was 4000 and my "1" average was 1000, I would want to see .8 and .2, respectively. 

 

Thank you. 
data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by age)) as rate
 from temp
  group by age
;
quit;
Ksharp
Super User
data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by age)) as rate
 from temp
  group by age

union all

select 'sex' as var,sex,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by sex)) as rate
 from temp
  group by sex

union all

select 'race' as var,race,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by race)) as rate
 from temp
  group by race

;
quit;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1200 views
  • 4 likes
  • 5 in conversation