Help using Base SAS procedures

combined dataset with variance from different files

Reply
N/A
Posts: 0

combined dataset with variance from different files

I want to calculate the variance and the sum of the squared deviations expressed in average observation unit (sd) (basically variance with mean in stead of df as divisor) for different files and then combine all these measures into one file which contains all the variances and one file that contains all the sd

so for instance from the next three files

code....a1.....a2......a3
1.........1.......0.2.....0.4
1.........0.8....0.2.....0.4
1.........0.8....0.2.....0.3
1.........0.9....0.2.....0.4

code...a1.....a2.....a3.....a4
2........1.......0.2....0.4....0.2
2........0.8....0.1....0.4....0.1
2........0.9....0.2....0.4....0.1
2........0.9....0.2....0.4....0.1

code....a2.....a3.....a5
3.........0.2....0.4....0.5
3.........0.2....0.3....0.6
3.........0.3....0.3....0.5
3.........0.2....0.4....0.5

I want to create two new files, one with the variances and one with sd, so


title=variance

code...a1.........a2.........a3.........a4.........a5....
1........0.009.....0.000.....0.003........................
2........0.007.....0.003.....0.000....0.003............
3.....................0.003.....0.003................0.002

and

title=sd

code...a1.........a2.........a3.........a4.........a5....
1........0.031.....0.000.....0.020........................
2........0.022.....0.043.....0.000....0.060............
3.....................0.033.....0.029................0.014

I tried to do this with proc means, but that only supports variance and in vardef= mean is not a valid option. Another problem that i am facing is the fact that i have about 5000 files which all have a different number of variables, which is the reason why manual is not an option.

I am a total SAS rookie and I am struggling with it for a while now, so any suggestions would be very welcome
PROC Star
Posts: 1,760

Re: combined dataset with variance from different files

Posted in reply to deleted_user
Not the simplest thing for a beginner, but not overly complex either.
This should do what you want.
[pre]
data RR1;
input CODE A1 A2 A3;
cards;
1 1 0.2 0.4
1 0.8 0.2 0.4
1 0.8 0.2 0.3
1 0.9 0.2 0.4
data RR2;
input CODE A1 A2 A3 A4;
cards;
2 1 0.2 0.4 0.2
2 0.8 0.1 0.4 0.1
2 0.9 0.2 0.4 0.1
2 0.9 0.2 0.4 0.1
data RR3;
input CODE A2 A3 A5;
cards;
3 0.2 0.4 0.5
3 0.2 0.3 0.6
3 0.3 0.3 0.5
3 0.2 0.4 0.5
run;

data T/view=T; * group all tables into one;
set
%macro t;
%do i=1 %to 3;
RR&i
%end;
%mend;
%t
;
proc summary data=T nway noprint; * derive variances, means and freqs;
class CODE;
var A1-A5;
output out=T_SUM var= mean= /autoname;
run;

data VARIANCE
SD;
set T_SUM;
array _A(5); * create an array to store final columns;
do I=1 to 5; * store variances into array;
_A[ I]=input(VVALUEX('A'||compress(put(I,1.))||'_VAR'),32.);
end;
output VARIANCE; * output variances;
do I=1 to 5; * derive new value=variance*df/mean;
_A[ I]=input(VVALUEX('A'||compress(put(I,1.))||'_VAR'),32.)
*(_FREQ_-1)
/input(VVALUEX('A'||compress(put(I,1.))||'_MEAN'),32.);
end;
output SD; * output squared deviations;
keep CODE _A: ; * only keep code and array elements in output tables;
run;

[/pre]

You'll hopefully learn a few things by tweaking this code to suit your exact needs.
N/A
Posts: 0

Re: combined dataset with variance from different files

Posted in reply to deleted_user
@ Chris, thanks for your reply, the tweaking helped indeed!

I tried your solution but the problem with my real data set was that I do not know the names of the columns and I did not want them to store in one table since it would become to big. So what i did was that I derived variances in the separate files, create an empty table with all the column names by using sashelp.vcolumn and then use proc append:

data RR1;
input CODE A1 A2 A3;
cards;
1 1 0.2 0.4
1 0.8 0.2 0.4
1 0.8 0.2 0.3
1 0.9 0.2 0.4
data RR2;
input CODE A1 A2 A3 A4;
cards;
2 1 0.2 0.4 0.2
2 0.8 0.1 0.4 0.1
2 0.9 0.2 0.4 0.1
2 0.9 0.2 0.4 0.1
data RR3;
input CODE A2 A3 A5;
cards;
3 0.2 0.4 0.5
3 0.2 0.3 0.6
3 0.3 0.3 0.5
3 0.2 0.4 0.5
run;

%macro var(number);
%do i=1 %to &number;

proc summary data=work.RR&i nway noprint; /*derive variance for all A, for each RR*/
class CODE;
var A:;
output out=work.RR&i.var(drop=_TYPE_ _FREQ_) var= /autoname;
run;

%end;
%mend var;
%var (3);
run;

proc sql; /*get a list of all collumns in the different tables*/
select name into :list
separated by ' '
from sashelp.vcolumn
where libname='WORK' and name not like 'A_';
quit;

data VAR; /*create grand table with all the collumn names*/
set work.RR1var;
input &list;
run;


%macro all(numb); /*append all variances into the grand table*/
%do i=1 %to &numb;

proc append base=work.VAR data=work.RR&i.var force; /*use force for non matching collumns*/
run;

%end;
%mend all;
%all(3);
run;


The only problem I still have to solve is how to get squared deviations, so sd=variance*df/mean. (besides, in my real data set the FREG did not work because there were blanks but that is easy to solve by getting N from the proc summary).

Does anyone have a suggestion or comment on my code? Message was edited by: H. Klein
N/A
Posts: 0

Re: combined dataset with variance from different files

Posted in reply to deleted_user
let me explain my problem:

The columns in my real data set look like BROAN_00000041_00056646 and in the table obtained with proc summary column names like BROAN_00000041_00056646_VAR (and the same with MEAN and N in stead of VAR).

The code from the solution given by Chris adjusted to my situation looks like this

data bb1sd
set work.bb1var;
array BROAN (*) BROAN:;
do i=1 to dim(BROAN);
BROAN(i)=input(VVALUEX('BROAN'||compress(put(I,1.))||'_VAR'),32.)
*(input(VVALUEX('BROAN'||compress(put(I,1.))||'_N'),32.)-1)
/input(VVALUEX('BROAN'||compress(put(I,1.))||'_MEAN'),32.);
end;
output bb1sd;
keep CODE BROAN ;
run;

in the log the following notes appear:

argument to function VValuex is not a known variable name:
invalid argument to vvaluex
invalid numeric data, _N_ etc

The problem I think is in the array, but the column names and thus the VAR, MEAN and N column names are unknown but of the form mentioned above, so I do not see another way to get them into the SD=VAR*(N-1)/MEAN formula.

Another problem I can think of is how to name them in the input/put statement since it is an array.

In brief:
I just do not know how to perform the SD=VAR*(N-1)/MEAN formula where the column names have the form of BROAN_00000041_00056646_VAR (and the same with MEAN and N in stead of VAR).

After searching Help in SAS, the sas site and google I am getting a bit discouraged so, does anybody have an idea?
N/A
Posts: 0

Re: combined dataset with variance from different files

Posted in reply to deleted_user
problem solved:
(example on table S4TIJDELIJK_5GROUP)

proc sql;
select name into :list
separated by ' '
from sashelp.vcolumn
where libname='WORK' and memname='S4TIJDELIJK_5GROUP' and name like 'BROAN___________________Var';
quit;

proc sql;
select name into :listn
separated by ' '
from sashelp.vcolumn
where libname='WORK' and memname='S4TIJDELIJK_5GROUP' and name like 'BROAN___________________N';
quit;

proc sql;
select name into :listm
separated by ' '
from sashelp.vcolumn
where libname='WORK' and memname='S4TIJDELIJK_5GROUP' and name like 'BROAN___________________Mean';
quit;

data s4_5group;
set work.s4tijdelijk_5group;
array BROAN {*} &list;
do i=1 to dim(BROAN);
array BROANN {*} &listn;
do i=1 to dim(BROANN);
array BROANM {*} &listm;
do i=1 to dim(BROANM);
BROAN{i}=BROAN{i}*(BROANN{i}-1)/BROANM{i};
end;
end;
end;
keep CUSIP
&list;
run;



However, in the where memname= statement I like to invoke a macro to loop trough different tables. The problem is that memname='S4TIJDELIJK_&i.GROUP' does not work because SAS does not automaticly replace &i with a number because of the quotation marks, but these quotation marks are needed in a sashelp.vcolumn query.

Does anybody know how to invoke &i when the argument is between quotation marks? Message was edited by: H. Klein
Super Contributor
Super Contributor
Posts: 3,174

Re: combined dataset with variance from different files

Posted in reply to deleted_user
Instead, use double-quotation marks so the macro variable is resolved.

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: combined dataset with variance from different files

Posted in reply to deleted_user
@scott barry

It worked, thanks
Ask a Question
Discussion stats
  • 6 replies
  • 134 views
  • 0 likes
  • 3 in conversation