Hello Great SAS Gurus,
I have a longitudinal data set infile, 1000 rows of records, 63 columns (variables). Within the 63 variables, 3 varibles are fixed (not time dependent), called VarF and VarN, and cstatus_0. Other 60 variables are time-indexed, they are cstatus_k, status_k, m_k, for k = 1 to 20 and k is the year index. cstastus_k and status_k variables are simply indicator variables. My goal is quite simple, to summarize this data set to get one record of summarization consisting of time-indexed variables. This could be achieved through proc sql, data step, or proc means. Say, if I would like to do this by proc sql
proc sql;
create table outfile as
select
sum((cstatus_(k-1)=0)*VarF) format=comma25. as A_k,
sum((cstatus_(k-1)=0)*VarN) format=comma25. as B_k,
sum((cstatus_(k-1)=0)*m_k) format=comma25. as C_k,
sum(calculated C_k *VarN) as D_k,
sum(status_k=1) as E_k,
sum((status_k=1)*VarN) as F_k
from infile;
quit;
I actually want the above summary be done for k=1 to 20 so that I can have a summary file outfile consisting one record including following variables like
A_1, B_1, C_1, D_1, E_1, F_1, A_2, B_2, C_2, D_2, E_2, F_2, ...., A_20, B_20, C_20, D_20, E_20, F_20
I prefer not to transpose my infile to a long format file (ie, create a column to record the year index). Are there any efficient ways to achive my goal through abbreviated variable list, arrays, macro variables, looping with proc sql, or data step, or other summary procedures like proc means? Thanks.
Can you post a small example of your data and expected output.
Too much text for a programmer/math person like me
VarF | VarN | M1 | Status1 | CStatus1 | M2 | Status2 | CStatus2 | M3 | Status3 | CStatus3 |
100000 | 11054.4 | 0.000297 | 0 | 0 | 0.000331 | 0 | 0 | 0.000368 | 0 | 0 |
100000 | 10963.28 | 0.000297 | 0 | 0 | 0.000331 | 0 | 0 | 0.000368 | 0 | 0 |
100000 | 11652.98 | 0.000431 | 0 | 0 | 0.000446 | 0 | 0 | 0.000498 | 0 | 0 |
200000 | 23408.25 | 0.00148 | 0 | 0 | 0.001582 | 0 | 0 | 0.001655 | 0 | 0 |
120000 | 14283.6 | 0.001615 | 0 | 0 | 0.001707 | 0 | 0 | 0.001766 | 0 | 0 |
120000 | 14323.5 | 0.001615 | 0 | 0 | 0.001707 | 0 | 0 | 0.001766 | 0 | 0 |
100000 | 11643.45 | 0.004211 | 0 | 0 | 0.004503 | 0 | 0 | 0.004886 | 0 | 0 |
100000 | 10947.15 | 0.000512 | 0 | 0 | 0.000531 | 0 | 0 | 0.000609 | 0 | 0 |
100000 | 10851.9 | 0.000512 | 0 | 0 | 0.000531 | 0 | 0 | 0.000609 | 0 | 0 |
200000 | 5258.85 | 0.000578 | 0 | 0 | 0.000621 | 0 | 0 | 0.000679 | 0 | 0 |
The above is the a small example of infile, I reduce 1000 rows to 10 rows, and 20 time-indexed variables to 3 time-indexed variabls. Expected output will be
as above with 20 replaced by 3, one summary record with varibles A_1, ...., F_1, A_2, ..., F_2, A_3,..., F_3
Can you include the output you'd want?
From your SQL code you also refer to cstatus_(k-1) - but K starts at 1?
Actually my infile contains a column called cstatus0 (or cstatus_0), which is missing in the small sample file above. The output is the output from proc sql. I don't know what output you are refering to.
The results you'd like to see based on the input provided. The SQL you provided won't work with the data provided.
Have the battle in solving a problem is defining it, so both you and someone else can understand it.
Hi,
I think it quite eassy by using %DO Loop within proc sql...But i am able to understand the Select caluse in your Proc Sql...However, if you want perform repeated action on so many variables in SAS than use %DO Loop within Macro Definition...
%macro test;
proc sql feedback;
create table outfile as
select
%do i = 1 %to 20;
sum((cstatus_&i. = 0)*VarF) format=comma25. as A_&i.,
sum((cstatus_&i. = 0)*VarN) format=comma25. as B_&i.,
sum((cstatus_&i. = 0)*m_&i.) format=comma25. as C_&i.,
sum(calculated C_&i. *VarN) as D_&i.,
sum(status_&i. = 1) as E_&i.,
sum((status_&i. = 1)*VarN) as F_&i.
%end;
from infile;
quit;
%mend;
%test;
Hope it works....
Thanks,
Urvish
I think an array in a data step is probably better and easier to maintain.
Hi Reeza,
Thank you for the response. I would like to see your data step way to solve the same problem. Like what Urvish did, you just show the code and main idea, don't worry about small details.
Yes Reeza it is possible by using SAS Arrays in a data step...
Thanks,
Urvish
Hi Urvish,
Thanks for your code which works. I used a similar code as yours before, but I got an error message. Then I thought that proc sql may not allow loops. Finally I found that error was caused by using sum() function to calculated variable, which was not allowed in sql. I guess this might be the reason that Reeza said my code did not work. Once I replaced the calculated variable by the expression used to calculate it, and used your looping, then it worked. Another thing I recognized is that your code should be modified by the following:
%macro test;
proc sql feedback;
create table outfile as
select
%do i = 1 %to 19;
sum((cstatus_&i. = 0)*VarF) format=comma25. as A_&i.,
sum((cstatus_&i. = 0)*VarN) format=comma25. as B_&i.,
sum((cstatus_&i. = 0)*m_&i.) format=comma25. as C_&i.,
sum(calculated C_&i. *VarN) as D_&i.,
sum(status_&i. = 1) as E_&i.,
sum((status_&i. = 1)*VarN) as F_&i.,
%end;
sum((cstatus_&20. = 0)*VarF) format=comma25. as A_&20.,
sum((cstatus_&20. = 0)*VarN) format=comma25. as B_&20.,
sum((cstatus_&20. = 0)*m_&20.) format=comma25. as C_&20.,
sum(calculated C_&20. *VarN) as D_&20.,
sum(status_&20. = 1) as E_&20.,
sum((status_&20. = 1)*VarN) as F_&20.
from infile;
quit;
%mend;
%test;
since proc sql need to use comma to separate variables. I don't know why you use period after each &var. I removed them, it also worked. Thanks again for your help.
Hi...
Ya you are right...Actually when i saw your Question posted in this forum, i was at work...and i do not have SAS at my local so can't copy and paste your sample data to my server...So i thought it works but it's good that you god idea about how to get output...
Happy to discuss with you...
Thanks,
Urvish
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.