DATA Step, Macro, Functions and more

How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Reply
Contributor
Posts: 50

How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Hello Great SAS Gurus,

I have a longitudinal data set infile, 1000 rows of records, 63 columns (variables). Within the 63 variables, 3 varibles are fixed (not time dependent), called VarF and VarN, and cstatus_0. Other 60 variables are time-indexed, they are cstatus_k, status_k, m_k, for k = 1 to 20 and k is the year index. cstastus_k and status_k variables are simply indicator variables. My goal is quite simple, to summarize this data set to get one record of summarization consisting of time-indexed variables. This could be achieved through proc sql, data step, or proc means. Say, if I would like to do this by proc sql

proc sql;

  create table outfile as

  select   

    sum((cstatus_(k-1)=0)*VarF) format=comma25. as A_k,

    sum((cstatus_(k-1)=0)*VarN) format=comma25. as B_k,

    sum((cstatus_(k-1)=0)*m_k) format=comma25. as C_k,

    sum(calculated C_k *VarN) as D_k,

    sum(status_k=1) as E_k,

    sum((status_k=1)*VarN) as F_k

  from infile;

quit;

I actually want the above summary be done for k=1 to 20 so that I can have a summary file outfile consisting one record including following variables like

A_1, B_1, C_1, D_1, E_1, F_1, A_2, B_2, C_2, D_2, E_2, F_2, ...., A_20, B_20, C_20, D_20, E_20, F_20

I prefer not to transpose my infile to a long format file (ie, create a column to record the year index). Are there any efficient ways to achive my goal through abbreviated variable list, arrays, macro variables, looping with proc sql, or data step, or other summary procedures like proc means? Thanks.

Super User
Posts: 19,820

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Can you post a small example of your data and expected output.

Too much text for a programmer/math person like me Smiley Wink

Contributor
Posts: 50

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

 

VarF

 

VarN

 

M1

 

Status1

 

CStatus1

 

M2

 

Status2

 

CStatus2

 

M3

 

Status3

 

CStatus3

 

100000

 

11054.4

 

0.000297

 

0

 

0

 

0.000331

 

0

 

0

 

0.000368

 

0

 

0

 

100000

 

10963.28

 

0.000297

 

0

 

0

 

0.000331

 

0

 

0

 

0.000368

 

0

 

0

 

100000

 

11652.98

 

0.000431

 

0

 

0

 

0.000446

 

0

 

0

 

0.000498

 

0

 

0

 

200000

 

23408.25

 

0.00148

 

0

 

0

 

0.001582

 

0

 

0

 

0.001655

 

0

 

0

 

120000

 

14283.6

 

0.001615

 

0

 

0

 

0.001707

 

0

 

0

 

0.001766

 

0

 

0

 

120000

 

14323.5

 

0.001615

 

0

 

0

 

0.001707

 

0

 

0

 

0.001766

 

0

 

0

 

100000

 

11643.45

 

0.004211

 

0

 

0

 

0.004503

 

0

 

0

 

0.004886

 

0

 

0

 

100000

 

10947.15

 

0.000512

 

0

 

0

 

0.000531

 

0

 

0

 

0.000609

 

0

 

0

 

100000

 

10851.9

 

0.000512

 

0

 

0

 

0.000531

 

0

 

0

 

0.000609

 

0

 

0

 

200000

 

5258.85

 

0.000578

 

0

 

0

 

0.000621

 

0

 

0

 

0.000679

 

0

 

0

The above is the a small example of infile, I reduce 1000 rows to 10 rows, and 20 time-indexed variables to 3 time-indexed variabls. Expected output will be

as above with 20 replaced by 3, one summary record with varibles A_1, ...., F_1, A_2, ..., F_2, A_3,..., F_3

Super User
Posts: 19,820

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Can you include the output you'd want?

From your SQL code you also refer to cstatus_(k-1) - but K starts at 1?

Contributor
Posts: 50

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Actually my infile contains a column called cstatus0 (or cstatus_0), which is missing in the small sample file above.  The output is the output from proc sql. I don't know what output you are refering to.

Super User
Posts: 19,820

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

The results you'd like to see based on the input provided.  The SQL you provided won't work with the data provided. 

Have the battle in solving a problem is defining it, so both you and someone else can understand it.

Regular Contributor
Posts: 195

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Hi,

I think it quite eassy by using %DO Loop within proc sql...But i am able to understand the Select caluse in your Proc Sql...However, if you want perform repeated action on so many variables in SAS than use %DO Loop within Macro Definition...

%macro test;

        proc sql feedback;

               create table outfile as

               select

                        %do i = 1 %to 20;

                               sum((cstatus_&i. = 0)*VarF) format=comma25. as A_&i.,

                               sum((cstatus_&i. = 0)*VarN) format=comma25. as B_&i.,

                               sum((cstatus_&i. = 0)*m_&i.) format=comma25. as C_&i.,

                               sum(calculated C_&i. *VarN) as D_&i.,

                               sum(status_&i. = 1) as E_&i.,

                               sum((status_&i. = 1)*VarN) as F_&i.

                        %end;

               from infile;

         quit;

%mend;


%test;

Hope it works....

Thanks,

Urvish

Super User
Posts: 19,820

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Posted in reply to UrvishShah

I think an array in a data step is probably better and easier to maintain.

Contributor
Posts: 50

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Hi Reeza,

Thank you for the response. I would like to see your data step way to solve the same problem. Like what Urvish did, you just show the code and main idea, don't worry about small details.

Regular Contributor
Posts: 195

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Yes Reeza it is possible by using SAS Arrays in a data step...

Thanks,

Urvish


Contributor
Posts: 50

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Posted in reply to UrvishShah

Hi Urvish,

Thanks for your code which works. I used a similar code as yours before, but I got an error message. Then I thought that proc sql may not allow loops. Finally I found that error was caused by using sum() function to calculated variable, which was not allowed in sql. I guess this might be the reason that Reeza said my code did not work. Once I replaced the calculated variable by the expression used to calculate it, and used your looping, then it worked. Another thing I recognized is that your code should be modified by the following:

%macro test;

        proc sql feedback;

               create table outfile as

               select

                        %do i = 1 %to 19;

                               sum((cstatus_&i. = 0)*VarF) format=comma25. as A_&i.,

                               sum((cstatus_&i. = 0)*VarN) format=comma25. as B_&i.,

                               sum((cstatus_&i. = 0)*m_&i.) format=comma25. as C_&i.,

                               sum(calculated C_&i. *VarN) as D_&i.,

                               sum(status_&i. = 1) as E_&i.,

                               sum((status_&i. = 1)*VarN) as F_&i.,

                        %end;

                             sum((cstatus_&20. = 0)*VarF) format=comma25. as A_&20.,  

                               sum((cstatus_&20. = 0)*VarN) format=comma25. as B_&20.,

                               sum((cstatus_&20. = 0)*m_&20.) format=comma25. as C_&20.,

                               sum(calculated C_&20. *VarN) as D_&20.,

                               sum(status_&20. = 1) as E_&20.,

                               sum((status_&20. = 1)*VarN) as F_&20.

               from infile;

         quit;

%mend;


%test;

since proc sql need to use comma to separate variables. I don't know why you use period after each &var. I removed them, it also worked. Thanks again for your help.

Regular Contributor
Posts: 195

Re: How to summarize dynamically many time-indexed variables in proc SQL, data step, or proc means ?

Hi...

Ya you are right...Actually when i saw your Question posted in this forum, i was at work...and i do not have SAS at my local so can't copy and paste your sample data to my server...So i thought it works but it's good that you god idea about how to get output...

Happy to discuss with you...

Thanks,

Urvish

Ask a Question
Discussion stats
  • 11 replies
  • 325 views
  • 0 likes
  • 3 in conversation