Walternate
Obsidian | Level 7

Hi,

 

I have 3 data files and I want to look at the distribution of a set of variables. This is a set of 15 character variables that exists across all 3 files with the same names and the same set of 20 possible values. 

 

To do this by hand, I'd have to run a freq on the set of variables within each of the 3 files and check all 45 distributions by hand. 

 

I'm hoping there's a more efficient way to approach this, but I'm not sure what it is. The fewer outputs that need to be checked, the better!

 

Thanks!

Reeza
Super User

Combine them via a view and then run PROC FREQ on the one table.

 

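data have / view=have;
  /* Stack the three tables; INDSNAME= records which table each row came from */
  set t1 (keep = charVariable) t2 (keep = charVariable) t3 (keep = charVariable) indsname = source;
  dsn = source; /* the INDSNAME= variable is not written out, so copy it */
run;

proc freq data=have;
  /* Cross-tabulate the variable's values by source data set */
  table charVariable*dsn;
run;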

PGStats
Opal | Level 21

Building on @Reeza's proposal, I would suggest:

 

%let charVariables=var1 var2 var3 var4; /* Adjust list */

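data have / view=have;
  set t1 t2 t3 indsname = source;
  dsn = source; /* keep the source data set name; SOURCE itself is not saved */
run;

proc freq data = have;
  by dsn notsorted; /* one set of tables per source data set */
  /* Note: with several variables listed, OUT= saves only the last table request */
  table &charVariables. / out=freqs outpct;
run;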
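
If you want the frequencies for every variable in one data set rather than only the last one, one option is to capture the ODS table instead of relying on OUT=. This is a minimal sketch, assuming the view HAVE and the macro variable &charVariables. defined above; the data set name freqs_all is a made-up name for illustration.

```sas
/* Sketch only: freqs_all is a hypothetical output name. */
ods exclude all;                    /* optional: suppress the printed tables */
ods output OneWayFreqs=freqs_all;   /* capture every one-way table */

proc freq data = have;
  by dsn notsorted;
  table &charVariables.;
run;

ods exclude none;                   /* restore normal output */
```

The resulting data set should have one row per source, variable, and value, with the variable identified in the Table column, so all 45 distributions can be reviewed or compared in a single table.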
PG
ballardw
Super User

It may help to describe the question(s) you have about your distribution(s), and roughly how many records are involved. If you have 20 possible values and only 100 records, there probably isn't much to find.

 

With the suggested PROC FREQ solutions, I would likely add the CHISQ option to the TABLES statement. Adapting the code from @PGStats:

 

proc freq data = have;
table   dsn *( &charVariables. ) / chisq;
run;

This gives you, for each variable, a statistical test of whether its distribution is similar across the data sources.
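
To cut down on the output that needs checking, the test results could be collected into one data set and printed as a short summary. This is a rough sketch, assuming the view HAVE and &charVariables. from the earlier replies; chisq_all is a hypothetical data set name.

```sas
/* Sketch only: chisq_all is a hypothetical output name. */
ods exclude all;                /* optional: suppress the crosstab listings */
ods output ChiSq=chisq_all;     /* capture the chi-square tables */

proc freq data = have;
  table dsn*(&charVariables.) / chisq;
run;

ods exclude none;

/* One row per variable: the Pearson chi-square statistic and its p-value */
proc print data = chisq_all noobs;
  where Statistic = 'Chi-Square';
  var Table DF Value Prob;
run;
```

With 20 possible values and 3 sources, sparse cells can make the chi-square approximation unreliable, so a small p-value is best treated as a prompt to look at that variable's table rather than as a conclusion.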
