Walternate
Obsidian | Level 7

Hi,

I have 3 data files and I want to look at the distribution of a set of variables. This is a set of 15 character variables that exist in all 3 files, with the same names and the same set of 20 possible values.

To do this by hand, I'd have to run PROC FREQ on the set of variables within each of the 3 files and check all 45 distributions.

I'm hoping there's a more efficient way to approach this, but I'm not sure what it is. The fewer outputs I need to check, the better!

Thanks!

Reeza
Super User

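Combine them via a view and then run PROC FREQ on the one table.

/* Stack the three files; INDSNAME= captures the source data set name */
data have /view=have;
set t1 (keep = charVariable) t2 (keep = charVariable) t3 (keep = charVariable) indsname = source;
dsn = source; /* the INDSNAME= variable is not written out, so copy it */
run;

/* Values of the variable crossed with the source data set */
proc freq data=have;
table charVariable*dsn;
run;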


PGStats
Opal | Level 21

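Building on @Reeza's proposal, I would suggest:

%let charVariables=var1 var2 var3 var4; /* Adjust list */

data have /view=have;
set t1 t2 t3 indsname = source;
dsn = source; /* name of the data set each row came from */
run;

/* OUT= keeps only the last table request, so collect all one-way tables with ODS OUTPUT */
ods output OneWayFreqs=freqs;
proc freq data = have;
by dsn notsorted; /* rows arrive grouped by source, so NOTSORTED is enough */
table &charVariables.;
run;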
PG
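A minimal sketch of one way to cut the 45 listings down to 15 (one per variable, with all three sources side by side): stack the files as above, reshape to one row per record per variable, and let the variable name act as the stratum of a three-way PROC FREQ table. The sample data, the data set names t1-t3, and the variable names var1-var3 are hypothetical placeholders, and the 200-character length for the reshaped value is an assumption to adjust to your data.

```sas
/* Hypothetical sample data standing in for your three files */
data t1 t2 t3;
  length var1-var3 $ 8;
  input var1 $ var2 $ var3 $;
  output t1;
  if _n_ <= 3 then output t2;
  if _n_ >= 2 then output t3;
  datalines;
A B C
A C C
B B A
C A B
;
run;

%let charVariables = var1 var2 var3;  /* list all 15 variables here */

/* Stack the files and keep the source data set name */
data have / view=have;
  length dsn $ 41;
  set t1 t2 t3 indsname=source;
  dsn = source;
run;

/* Reshape to long form: one row per record per variable */
data long / view=long;
  set have;
  length varname $ 32 value $ 200;  /* assumed maximum value length */
  array cv{*} &charVariables.;
  do i = 1 to dim(cv);
    varname = vname(cv{i});
    value = cv{i};
    output;
  end;
  keep dsn varname value;
run;

/* One value-by-source table per variable; column percents compare the files */
proc freq data=long;
  tables varname*value*dsn / norow nopercent out=freqs_long outpct;
run;
```

Because the whole request is a single table, OUT= captures every variable, and FREQS_LONG holds the counts and percentages for all 15 variables in one data set you can scan or sort.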
ballardw
Super User

It may help to describe the question(s) you have about your distribution(s).

It would also help to know roughly how many records are involved. If you have 20 possible values and only 100 records, there probably isn't much to find.

With the suggested PROC FREQ solutions, I would likely add the CHISQ option to the TABLES statement.

Adapting the code from @PGStats:

proc freq data = have;
table dsn*( &charVariables. ) / chisq;
run;

This gives you a chi-square test, for each variable, of whether its distribution of values differs across the source data sets.
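If the goal is to check as few outputs as possible, one option is to route the 15 chi-square tables into a single data set with ODS OUTPUT and list only the Pearson chi-square rows, smallest p-value first. This is a sketch: the sample data, t1-t3 and var1-var3 are hypothetical placeholders, and with real data PROC FREQ may warn that many cells have small expected counts, which weakens the test.

```sas
/* Hypothetical sample data standing in for your three files */
data t1 t2 t3;
  length var1-var3 $ 8;
  input var1 $ var2 $ var3 $;
  output t1;
  if _n_ <= 3 then output t2;
  if _n_ >= 2 then output t3;
  datalines;
A B C
A C C
B B A
C A B
;
run;

%let charVariables = var1 var2 var3;  /* list all 15 variables here */

data have / view=have;
  length dsn $ 41;
  set t1 t2 t3 indsname=source;
  dsn = source;
run;

/* All ChiSq tables from this step are appended into one data set */
ods output ChiSq=chisq_all;
proc freq data=have;
  tables dsn*(&charVariables.) / chisq;
run;

/* Keep the Pearson chi-square rows, smallest p-value first */
proc sort data=chisq_all out=pearson;
  where Statistic = 'Chi-Square';
  by Prob;
run;

proc print data=pearson noobs;
  var Table Value DF Prob;
run;
```

The printed crosstabs still appear, but PEARSON gives one row per variable, so the variables whose distributions differ most across files rise to the top.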
