Seeking a more efficient way of looking at distribution for multiple character vars w same values (SAS Procedures)

Walternate — Fri, 11 Oct 2019 19:56:59 GMT

Hi,

I have 3 data files and I want to look at the distribution of a set of variables. This is a set of 15 character variables that exists across all 3 files with the same names and the same set of 20 possible values.

To do this by hand, I'd have to run a freq on the set of variables within each of the 3 files and check all 45 distributions by hand.

I'm hoping there's a more efficient way to approach this, but I'm not sure what it is. The fewer outputs that need to be checked, the better!

Thanks!

Re: Seeking a more efficient way of looking at distribution for multiple character vars w same value

Reeza — Fri, 11 Oct 2019 19:58:41 GMT

Combine them via a view and then run PROC FREQ on the one table.

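/* Stack the three tables in a view; INDSNAME= records which table each row came from */
data have / view=have;
    set t1 (keep=charVariable)
        t2 (keep=charVariable)
        t3 (keep=charVariable) indsname=source;
    dsn = source;  /* INDSNAME= variables are not written out, so copy the value */
run;

/* One crosstab: values of charVariable by source table */
proc freq data=have;
    tables charVariable*dsn;
run;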

Re: Seeking a more efficient way of looking at distribution for multiple character vars w same value

PGStats — Fri, 11 Oct 2019 20:56:39 GMT

Building on @Reeza's proposal, I would suggest:

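%let charVariables = var1 var2 var3 var4; /* Adjust list */

data have / view=have;
    set t1 t2 t3 indsname=source;
    dsn = source;
run;

/* OUT= keeps only the last requested table (and OUTPCT applies only to
   crosstabs), so capture all one-way tables with ODS OUTPUT instead */
ods output OneWayFreqs=freqs;
proc freq data=have;
    by dsn notsorted;
    tables &charVariables.;
run;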
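
If the goal is fewer tables to read, one possible variation is to reshape the stacked data to long form, so that each variable yields a single value-by-source crosstab (15 tables instead of 45). This is only a sketch: the view name long, the variables varname and value, and the $200 length are assumptions for illustration, and it relies on &charVariables. and the tables t1, t2, t3 from above.

```sas
/* Sketch only: LONG, VARNAME, VALUE and the $200 length are assumed names/sizes */
data long / view=long;
    length dsn $41 varname $32 value $200;
    set t1 t2 t3 indsname=source;
    array cv {*} &charVariables.;     /* the character variables to compare */
    dsn = source;                     /* source table, e.g. WORK.T1 */
    do i = 1 to dim(cv);
        varname = vname(cv{i});       /* name of the current variable */
        value   = cv{i};              /* its value on this row */
        output;                       /* one output row per variable per input row */
    end;
    keep dsn varname value;
run;

/* One value-by-source crosstab per variable; column percents give the
   distribution of values within each source table */
proc freq data=long;
    tables varname*value*dsn / norow nopercent;
run;
```

Comparing the column percentages side by side within each table shows quickly whether a value is over- or under-represented in one file.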

Re: Seeking a more efficient way of looking at distribution for multiple character vars w same value

ballardw — Fri, 11 Oct 2019 23:17:30 GMT

It would help to describe the question(s) you have about your distribution(s), and roughly how many records are involved. If you have 20 possible values and only 100 records, there probably isn't much to find.

With the suggested PROC FREQ solutions, I would likely add the CHISQ option to the TABLES statement.

Adapting @PGStats's code:

proc freq data=have;
    tables dsn*(&charVariables.) / chisq;
run;

This gives you, for each variable, a chi-square test of whether its distribution is similar across the data sources.
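
To avoid reading every crosstab, the test results could be collected into one data set and printed as a single summary. A minimal sketch, assuming the view have and the macro variable &charVariables. defined earlier in the thread; chisq_all is a made-up data set name.

```sas
/* Sketch only: CHISQ_ALL is a hypothetical data set name */
ods output ChiSq=chisq_all;           /* capture the chi-square tables from PROC FREQ */
proc freq data=have;
    tables dsn*(&charVariables.) / chisq;
run;

/* One row per variable: Pearson chi-square statistic and p-value */
proc print data=chisq_all noobs;
    where Statistic = 'Chi-Square';
    var Table DF Value Prob;
run;
```

Variables with a small Prob are the ones worth opening the full crosstab for. With 20 possible values, sparse cells may trigger PROC FREQ's warning about low expected counts, so treat those p-values with caution.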