BookmarkSubscribeRSS Feed
juliajulia
Obsidian | Level 7

i have a dataset with about 80 variables and half million records. i want to do a frequency check on each variable and compare the frequencies of each variable with the previous year's frequencies to see if there are any abnormal numbers with the previous year. i used the proc freq statement to get all the variables' frequencies and then i did the same thing to get the previous years frequencies. then i copy/paste both results to an excel file and compare them one by one. it is time consuming. so i am wondering if anybody know an easy way to do this task in sas. Example of the sas code that i used as following:

 

proc freq data=data2019;
table age diagage birthyear race raceeth hisp gender housing povertylevel
povlevpercent insurance risk................./missing;
run;

 

proc freq data=data2020;
table age diagage birthyear race raceeth hisp gender housing povertylevel
povlevpercent insurance risk................./missing;
run;

 

5 REPLIES 5
tarheel13
Rhodochrosite | Level 12
How about you just output the datasets from proc freq and do proc compare?
Reeza
Super User

1. Combine your data sets into one, with a Year to identify which records come from which data set.

 

data combined;
set data2019 (in=in2019) data2020 (in=in2020);
if in2019 then year=2019;
else year=2020;
run;

2. Run a PROC FREQ by year for each variable and save it to a data set

proc freq data=combined noprint;
table year*(age diagage birthyear race raceeth hisp gender housing ....) / missing out=want;
ods output onewayFreqs=want2
run;

3. Transpose (PROC TRANSPOSE) so you have variable counts and each year, ie: (could also do this via PROC TABULATE but only for display purposes)

Variable   2019 2020
Age
<5              5        7
5 to 10        8       10
10 to 15      10     12
....
...
...

Fully worked example:

data combined;
set sashelp.heart (in=in2019) sashelp.heart (in=in2020 obs=140);
if in2019 then year=2019;
else year=2020;
run;

ods select none;
proc freq data=combined ;
table year*(ageAtStart BP_STATUS ) / missprint ;
ods output crossTabFreqs=want;
run;
ods select all;

data long;
	length variable $32. variable_value $50.;
	set want;
	where not missing(year);
	Variable=scan(table, 2, '*');
	Variable_Value=strip(trim(vvaluex(variable)));
	presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
	if missing(variable_value) or variable_value = "." then variable_value = "MISSING";
	keep year variable  variable_value frequency percent presentation;
	label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
by variable variable_value;
run;

proc transpose data=long out=wide (drop = _:);
by variable variable_value;
id year;
var frequency;
run;

@juliajulia wrote:

i have a dataset with about 80 variables and half million records. i want to do a frequency check on each variable and compare the frequencies of each variable with the previous year's frequencies to see if there are any abnormal numbers with the previous year. i used the proc freq statement to get all the variables' frequencies and then i did the same thing to get the previous years frequencies. then i copy/paste both results to an excel file and compare them one by one. it is time consuming. so i am wondering if anybody know an easy way to do this task in sas. Example of the sas code that i used as following:

 

proc freq data=data2019;
table age diagage birthyear race raceeth hisp gender housing povertylevel
povlevpercent insurance risk................./missing;
run;

 

proc freq data=data2020;
table age diagage birthyear race raceeth hisp gender housing povertylevel
povlevpercent insurance risk................./missing;
run;

 


 

Reeza
Super User
Re-thinking, there's probably a simpler way to do this via PROC REPORT.....with a COMPUTE and ACROSS statement once the data is combined.
ballardw
Super User

You may need think some of this through a bit more.

For instance, one of your variables is Birthyear. I would  be very surprised if a data set named Data2019 had any 2020 values while there could be some in the Data2020.

 

What sort of value is PovertyLevel? If it is one of the cut off limits for a specific categorical range of the Federal Poverty Guidelines then the boundary values change every year.

 

Personally I would probably start by combining the two data sets making sure there is a variable that captures which set a particular record is from. Then use that variable such as

 

proc freq data=Combined;
table ( age diagage birthyear race raceeth hisp gender housing povertylevel
povlevpercent insurance risk.................) * datayear /missing norowpct nopct;
run;

This will create data with the year as a column heading and you can see things next to each other.

 

It would likely be helpful if you have any variables that are continuous such as Height, Weight, Income, (and likely ages as well) to create groups, such as 5-year or 10-year age ranges, 10 or 20 pound/ kg weight bands, or BMI categories.

juliajulia
Obsidian | Level 7

Thank you very much. I only compare the distribution of the variables but not the values of the variables because the values are different from year to year. So i think i just combine the two years datasets and use the year as a crosstab to check the frequency. It seems served my purpose. 

Table of race/ethnicity by year
(Race/Ethnicity) year
2018 2019 Total
.
3332
0.62
5685
1.00
9017
 
American Indian/Alaska Native
2681
0.50
3070
0.54
5751
 
Asian
7592
1.42
8259
1.45
15851
 
Black/African American
249832
46.81
261734
46.10
511566
 
Hispanic/Latino
122848
23.02
131225
23.11
254073
 
Native Hawaiian/Pacific Islander
924
0.17
1058
0.19
1982
 
White
139667
26.17
149804
26.38
289471
 
Multiple races
6882
1.29
6968
1.23
13850
 
Total
533758
567803
1101561

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1660 views
  • 3 likes
  • 4 in conversation