Dears,
I have a big data set containing more than 100 variables, am trying to have a statistical table representing Each Variable (Blank cells count and Blank cells percentage and an amount(another variable in the data set) that reflect the amount of blank cells of each variable.
Data Sample
Socks | T-shirts | Shirts | Cost |
Large | 20 | ||
X large | Small | 40 | |
Small | Large | 60 |
Data output desired :
Variables | Blanks Count | Blanks % | Amount representing blank data | Amount % |
Socks | 1 | 33% | 40 | 33% |
T-shirts | 2 | 67% | 80 | 67% |
Shirts | 1 | 33% | 20 | 17% |
Either having it as a table representing all variables. Please not that i am a beginner on SAS enterprise guide and am not good with coding. my plan was to do it by the query builder or by statistics tables.
Please read https://communities.sas.com/t5/SAS-Support-Communities/How-to-ask-a-good-question-and-receive-a-fast... and help us so that we can help you.
Sorry for the inconveniences. I have updated the question for better understanding.
Thank you for your reply.
/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;
@drawerder wrote:
Dears,
I have a big data set containing more than 100 variables, am trying to have a statistical table representing Each Variable (Blank cells count and Blank cells percentage and an amount(another variable in the data set) that reflect the amount of blank cells of each variable
Thanks in advance.
@drawerder wrote:
Dears,
I have a big data set containing more than 100 variables, am trying to have a statistical table representing Each Variable (Blank cells count and Blank cells percentage and an amount(another variable in the data set) that reflect the amount of blank cells of each variable
Thanks in advance.
Hint: Create a small data set with 3 or 4 variables such that you can determine what the results for that set should look like. Provide the data and the desired result.
Is the result supposed to be a report that people read or a data set used by other processes?
I updated the post, Kindly check it
Thank you for your comment.
Socks Column has 3 cells with (Large,Blank,Small) so there are 2 non blanks and 1 blank. the amount 40 is the cost associated with the blank.
Your code already has variables from the data (CLass), after i entered the code and changed data name. all summaries and tables are representing (Height, sex, Age).
If you want to do this in EG yourself, you need to create a custom format to show MISSING/NON-MISSING, as I did above (see the comments). You can do this via the GUI interface to create a user format.
Then you apply the formats to your data set - not sure how in EG you'll need to google that. One option would be to add it in via Query Builder or via the assign column properties options.
Then run a One Way Frequency task on it twice, once with all the variables you want summarized.
Then again, with using the AMOUNT as the Weight variable. Then merge the two results together.
EDIT: Search on here and someone posted an answer that would solve this is less steps by using a transpose task first and then doing a one way frequency on that instead. I don't have EG to test that though. You may also want to try the Characterize Data task, it may have it all done for you but I don't recall the options and don't have EG anymore to check.
Here's a code version. First run it exactly as posted and see the results that are generated. If that's what you want then change the section in red, specifically replace CLASS to be the variable name of interest and WANT with the name of the output data set you want.
Coding is always more efficient, since I've now written this program in a generic fashion, the next time I need to do this I change the input data set name and it works 🙂 More likely though, I can copy that first section which is a useful program for me. There are other ways but this works quite well.
data class; set sashelp.class; if age=14 then call missing(height, weight, sex); if name='Alfred' then call missing(sex, age, height); label age="Fancy Age Label"; amount=rand('integer', 10, 100); run; *set input data set name; %let INPUT_DSN = class; *change these lines only - input data set name; %let OUTPUT_DSN = want; *output data set name; *create format for missing; proc format; value $ missfmt ' '="Missing" other="Not Missing"; value nmissfmt .="Missing" other="Not Missing"; run; *Proc freq to count missing/non missing; ods select none; *turns off the output so the results do not get too messy; ods table onewayfreqs=temp1; proc freq data=&INPUT_DSN.; table _all_ / missing out=temp_Check; format _numeric_ nmissfmt. _character_ $missfmt.; run; *clean up output to reduce clutter; data long1; length variable $32. variable_value $50.; set temp1; *find name of variable being summarized; Variable=scan(table, 2); *getting value from correct variable - using variable name as lookup retrieved via VVALUEX() function; *note this returns a character value; Variable_Value=strip(trim(vvaluex(variable))); *create a presentation variable that is N(##.#%) format; presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")"); *keep only variables of interest; keep variable variable_value frequency percent presentation; run; *sort for merging; proc sort data=long1; by variable; run; ods select all; *Proc freq to count missing/non missing; ods select none; *turns off the output so the results do not get too messy; ods table onewayfreqs=temp2; proc freq data=&INPUT_DSN.; table _all_ / missing; weight amount; format _numeric_ nmissfmt. _character_ $missfmt.; run; ods select all; *clean up output to reduce clutter; data long2; length variable $32. variable_value $50.; set temp2; *find name of variable being summarized; Variable=scan(table, 2); *getting value from correct variable - using variable name as lookup; *note this is a character value; Variable_Value=strip(trim(vvaluex(variable))); *create a presentation variable that is N(##.#%) format; presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")"); *keep only variables of interest; keep variable variable_value frequency percent presentation; run; *sort for merging; proc sort data=long2; by variable; run; data want; *merge but rename variables first to make them represent values you wanted; *note that if you merge without renaming it's problematic because the variables will overwrite each other; merge long1 (rename= (frequency = BlankCounts percent = BlankPct presentation=BlankPres)) long2 (rename= (frequency =AmountBlank percent =AmountBlankPct presentation=AmountBlankPres )); by variable variable_value; *commented out where clause that would limit it to missing only; where variable_value='Missing'; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.