I have a given data that indicates user active proportion for each month. It depends on the active days during each month. Given this, I tried to write loop to get active proportion for each quarter. Just cannot find a better way to index the loop. For example, one user active only in January, then the active proportion during first quarter (Jan. Feb. and March) would be 31/(31+28+31), or the average of active proportion for Jan, Feb and March.
Thanks for any help!
given data like:
userID | ac12022 | ac22022 | ac32022 | ac42022 | ac52022 | ac62022 | ac72022 | ac82022 | ac92022 | ac102022 | ac112022 | ac122022 |
1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
2 | 0.00 | 0.38 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
wanted data like: (e.g. first quarter active proportion = (0 + 0.38 +1) / 3
userID | acq12022 | acq22022 | acq32022 | acq42022 |
1 | 1.00 | 1.00 | 1.00 | 1.00 |
2 | 0.46 | 1.00 | 1.00 | 1.00 |
You are working with variables which have calendar information in the variable name. This is never a good idea, and it is the cause of all of your programming problems. Better to create a long data set with calendar information in a variable called MONTH.
So lets create example data in the improved (long) layout.
data example;
do user_id=1 to 2;
do month=1 to 12;
mdy=mdy(month,1,2022);
/* create random value for proportion */
prop=rand('uniform');
output;
end;
end;
run;
Now the programming to get quarterly averages is simple, no looping needed.
proc summary data=example nway;
class user_id mdy;
format mdy yyq6.;
var prop;
output out=_stats_ mean=quarter_mean;
run;
Thanks for your reply. It works on the sample data I provided. The real data I am working on has more other types of variables. And I cannot change the whole data structure to long data. Ideally, I would like just add quarterly active proportion variable after each quarter. Part of the data like:
userID | ac12022 | ac22022 | ac32022 | acq12022 | ac42022 | ac52022 | ac62022 | acq22022 | ac72022 |
1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
2 | 0.00 | 0.38 | 1.00 | 0.79 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
@sasecn wrote:
And I cannot change the whole data structure to long data.
Yes you can. And yes you should. This makes the calculations much much much much easier. From there, you can create the report in any format you want.
@sasecn wrote:
Thanks for your reply. It works on the sample data I provided. The real data I am working on has more other types of variables. And I cannot change the whole data structure to long data. Ideally, I would like just add quarterly active proportion variable after each quarter. Part of the data like:
userID ac12022 ac22022 ac32022 acq12022 ac42022 ac52022 ac62022 acq22022 ac72022 1 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 2 0.00 0.38 1.00 0.79 1.00 1.00 1.00 1.00 1.00
Please note that inserting or adding the quarter value is changing the structure of the data set. You can't add a "column" to a data set without replacing the entire set in SAS. Any of the steps you take to modify the data before creating that final set is entirely up to you and the provider would never know.
Additional to the bit about the date information in the variable name you chose a very complicated method. If you had used a Year/Mo approach, such as AC20221 (better AC202201) then you could reference variables as lists and get them to process in order: Ac202201-Ac202203 for instance in many place would process those three sequential named variables in order.
The process you are currently contemplating means that the code that works this time would need significant revision, i.e. new variable names in more places and conditional executions when you get the next data with 1, 2 or 3 more months added. For each month.
Maxim 19: Long Beats Wide.
Do not keep data (e.g. dates or date-related information) in structure (variable names). If you have other, non-sequential variables, keep them in one dataset with one observation per userid, but transform the other values into a long dataset.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.