SAS Programming

sasgorilla · Posted 12-07-2024 11:28 AM

I have a demographic data set with no missing data that covers a 12 month period. It is built like the below:

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;

The categorical variables are numerical with values representing different levels (e.g., sex=0 is male, sex=1 is female; race=0 is white, =1 is black, =2 is asian, etc.). Count means how many individuals with that combination of preceding variables are in that month sample. There are no individual identifiers in the sample.

The total number of individuals and distribution of levels of variables changes slightly each month, and I'm wondering how best to find the one month average for each variable and value due to some other monthly analyses.

In other words, I want to average the 12 month period demographics to one month for a "table 1" since I don't have individual information of how they may come and go over the study period (i.e., if there are 12000 people in the entire study, the one-month average n=1000; if there are 9000 white people over the 12 month period then white people would represent 750 of the one month sample, etc.). I would want this to round to the nearest whole person.

Any ideas on how best to code? I appreciate the help.

Ksharp · Posted 12-07-2024 08:51 PM

It looks like you just want "sum of subject/number of month" for each level of each variabl

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average
 from temp
  group by age
union all
select 'sex' as var,sex,int(count(*)/count(distinct date))
 from temp
  group by sex
union all
select 'race' as var,race,int(count(*)/count(distinct date))
 from temp
  group by race
union all
select 'Total' as var,.,int(count(*)/count(distinct date))
 from temp
;
quit;

View solution in original post

PaigeMiller · Posted 12-07-2024 12:07 PM

Honestly, a larger data set (covering a whole year) even if it is fake data, and showing us the desired output would help us understand the problem better.

It sounds like you want some sort of sum or average or weighted sum or weighted average over the entire year. PROC SUMMARY will do this. It can also easily slice the data into different results for combinations of age sex and race.

--
Paige Miller

Ksharp · Posted 12-07-2024 08:51 PM

It looks like you just want "sum of subject/number of month" for each level of each variabl

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average
 from temp
  group by age
union all
select 'sex' as var,sex,int(count(*)/count(distinct date))
 from temp
  group by sex
union all
select 'race' as var,race,int(count(*)/count(distinct date))
 from temp
  group by race
union all
select 'Total' as var,.,int(count(*)/count(distinct date))
 from temp
;
quit;

sasgorilla · Posted 12-08-2024 12:12 AM

This looks like exactly what I am looking for, thanks! I will add the remaining variables and I believe this will get me everything I need. I appreciate your help.

PaigeMiller · Posted 12-08-2024 04:40 AM

I don't think @Ksharp has provided the best answer. First, PROC SUMMARY will be a lot less typing and probably execute faster. In the case where your real data set has more than the three variables (age sex race), you will have to do a lot more typing in SQL than you would have to do in PROC SUMMARY.

If there is a need to slice the data by two (or more) variables at one time, such as all combinations of age and sex, PROC SUMMARY handles this easily, with little extra coding. The extra coding needed for an SQL solution is not obvious.

Furthermore, if the real data has missing values in variable COUNT, SQL gives the wrong answers, while SUMMARY gives the right answers.

PROC SUMMARY is a much better tool here.

--
Paige Miller

mkeintz · Posted 12-07-2024 10:09 PM

I'm not sure I understand exactly what you want. I guess you want the average of count for each level of RACE, of AGE, and of SEX. If so, then this would generate a report of those averages:

data a;
  input date :monyy. age sex race count;
  format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
RUN;

proc means data=a;
  class age sex race ;
  var count;
  types age sex race;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 12-08-2024 01:56 PM

So you are starting with the counts per combination. Basically your starting point is the output from this TABLES statement of PROC FREQ against your individual level data.

tables date*age*sex*race / noprint out=A ;

But what is it that want to get from this? I notice that some of your possible values for RACE etc do not actually appear in the data. To get summaries for those levels you might want to use the completetypes option of PROC MEANS/SUMMARY.

data a;
  input date :monyy. age sex race count;
  format date monyy7.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;

proc format ;
value sex 0='male' 1='female';
value race 0='white' 1='black' 2='asian' 3='other race1' 4='other race2';
value age 19='19' 20='20' 21='21' 22='22' 23='23' 24='24';
value months '01JAN2018'd='2018-01' '01FEB2018'd='2018-02' '01MAR2018'd='2018-03' ;
run;

proc means mean min max completetypes;
  class date age sex race / preloadfmt ;
  ways 1;
  var count;
  format date months. age age. sex sex. race race. ;
run;

Ksharp · Posted 12-10-2024 08:39 PM

 

I had a question that is a little different from the topic I asked, so I wanted to message you directly. 

 

Is there a way in proc sql to add a column that provides the relative frequency of each "average" by level of the variable? 

For example, if "sex" had two levels and my "0" average was 4000 and my "1" average was 1000, I would want to see .8 and .2, respectively. 

 

Thank you.

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by age)) as rate
 from temp
  group by age
;
quit;

Ksharp · Posted 12-10-2024 08:46 PM

data a;
input date :monyy. age sex race count;
format date monyy.;
datalines;
JAN18 24 0 0 1 
JAN18 22 1 1 3
FEB18 19 1 2 1
MAR18 20 0 0 4
MAR18 20 1 4 2
;
RUN;
data temp;
 set a;
 do i=1 to count;
  output;
 end;
drop i count;
run;

proc sql;
create table want as
select 'age' as var,age as levels,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by age)) as rate
 from temp
  group by age

union all

select 'sex' as var,sex,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by sex)) as rate
 from temp
  group by sex

union all

select 'race' as var,race,int(count(*)/count(distinct date)) as average,
 calculated average/
(select sum(average) from (select int(count(*)/count(distinct date)) as average from temp group by race)) as rate
 from temp
  group by race

;
quit;

SAS Programming

How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Re: How to find average values of variables across a time period

Overlapping time periods

Conditional replacement of a variable values

ModelOps, MLOps, and Finding Value in Analytics

count average time interval

find the count of variable by group

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...