Hi,
I have 3 data files and I want to look at the distribution of a set of variables: 15 character variables that exist in all 3 files, with the same names and the same 20 possible values.
Doing this manually, I'd have to run PROC FREQ on those variables in each of the 3 files and check all 45 distributions.
I'm hoping there's a more efficient way to approach this, but I'm not sure what it is. The fewer outputs that need to be checked, the better!
Thanks!
Combine the three files with a view and then run PROC FREQ on that single table.
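data have /view=have;
    /* INDSNAME= stores the source data set name in a temporary variable */
    set t1 (keep = charVariable) t2 (keep = charVariable) t3 (keep = charVariable) indsname = source;
    dsn = source; /* copy it to a regular variable so it is written to the view */
run;
proc freq data=have;
    /* one crosstab: the distribution of charVariable within each source file */
    table charVariable*dsn;
run;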
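To cover all 15 variables in one step, one option (a sketch, not part of the reply above) is to reshape the combined data into a long layout with one row per record and variable, then request a single three-way table. PROC FREQ prints one value-by-dsn table per variable, so you check 15 tables with the 3 files side by side instead of 45 separate ones. The names var1 to var15, the view name long, and the $200 length for value are hypothetical assumptions; adjust them to your data.

```sas
/* Hypothetical names: replace with the real 15 character variables */
%let charVariables = var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15;

data long / view=long;
    set t1 t2 t3 indsname=source;
    /* assumed lengths: $200 must be at least as long as the longest value */
    length dsn $41 varname $32 value $200;
    array cv{*} &charVariables.;   /* all listed variables must be character */
    dsn = source;
    /* write one output row per variable for each input record */
    do i = 1 to dim(cv);
        varname = vname(cv{i});
        value   = cv{i};
        output;
    end;
    keep dsn varname value;
run;

/* one value*dsn table per varname; column percents give each file's distribution */
proc freq data=long;
    tables varname*value*dsn / missing norow nopercent;
run;
```

The MISSING option counts blank values as a level, which can itself reveal differences between the files; drop it if you only care about non-missing values.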
Building on @Reeza's proposal, I would suggest:
%let charVariables=var1 var2 var3 var4; /* Adjust list */
data have /view=have;
set t1 t2 t3 indsname = source;
dsn = source;
run;
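proc freq data = have;
    by dsn notsorted;
    /* OUT= saves only the last table requested, so capture every one-way table with ODS OUTPUT */
    ods output OneWayFreqs=freqs;
    table &charVariables.;
run;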
It may help to describe the question(s) you have about your distribution(s).
It would also help to know roughly how many records are involved: with 20 possible values and only 100 records, there probably isn't much to find.
With the suggested PROC FREQ solutions, I would also add the CHISQ option to the TABLES statement.
From @PGStats
proc freq data = have;
    table dsn*( &charVariables. ) / chisq;
run;
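This gives a chi-square test, for each variable, of whether its distribution differs across the data sources. With 20 possible values, watch for the PROC FREQ warning about cells with expected counts below 5; when it appears, the chi-square test may not be valid.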
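If the goal is as few outputs as possible, the chi-square results can also be collected with ODS OUTPUT and printed as one table with a row per variable. This is a sketch assuming the have view (with dsn) and the &charVariables. list from the replies above; chisq_all is a hypothetical data set name. ChiSq is the ODS table PROC FREQ produces for the CHISQ option, but it is worth confirming the column names in your SAS release with PROC CONTENTS.

```sas
/* save the chi-square statistics table from every dsn*variable crosstab */
ods output ChiSq=chisq_all;

proc freq data=have;
    tables dsn*(&charVariables.) / chisq;
run;

/* keep only the Pearson chi-square row: one line per variable */
proc print data=chisq_all noobs;
    where Statistic = 'Chi-Square';
    var Table DF Value Prob;
run;
```

Variables with a small Prob value are the ones whose distributions differ across the three files and deserve a closer look.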