I'm building on a post from Rick Wicklin's The DO Loop blog:
https://blogs.sas.com/content/iml/2017/11/29/visualize-patterns-missing-values.html
I want to show the missvar value (a comma-separated list of the variables that form a group because they share missing values) in the chart, right next to the bars.
I can use the TIP= option to show the value on mouse hover.
But my graph goes to a PDF destination, so this is not a valid choice: as far as I know, PDF output does not support tooltips.
I've tried to overlay a text plot. The values are shown, but then the bar chart disappears.
data WORK.MISS2;
infile datalines dsd truncover;
input Group:4. diff_kms_Miss:$1. diff_days_Miss:$1. acum_numfac_Miss:$1. incurred_costs_Miss:$1. total_kms_est_Miss:$1. MESESBUNDLE_Miss:$1. MESESLONG_Miss:$1. ANTIGMESES_Miss:$1. KMULTPTA_Miss:$1. NUMCUOTAS_Miss:$1. SUM_IMPMENSUALCLI_Miss:$1. age_starting_longdrive_Miss:$1. KM_CONTRATO_Miss:$1. KM_VO_Miss:$1. RESTO_MESES_Miss:$1. SUBEST_OP_Miss:$1. power_in_kw_Miss:$1. cnt_facturas_Miss:$1. max_diff_kms_Miss:$1. result_Miss:$1. result_t0_Miss:$1. _age_contract_Miss:$1. _age_car_Miss:$1. deviation_kms_Miss:$1. last_entry_Miss:$1. months_paying_since_payment_Miss:$1. _deviation_kms_Miss:$1. result_x_10000kms_Miss:$1. result_x_12months_Miss:$1. Freq:8. Percent:8.2 Pattern:$29. missvar:$500. NumMiss:32. st:32. i:32. pos:32.;
datalines4;
1,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,294911,66.83,00000000000000000000000000000,,0,1,,
2,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,2246,0.51,00000000000000000000000000001,result_x_12months_Miss,1,29,2,29
3,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,32,0.01,00000000000000000000000000010,result_x_10000kms_Miss,1,28,2,28
4,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,.,43,0.01,00000000000000000000000000011,"result_x_10000kms_Miss, result_x_12months_Miss",2,29,3,29
5,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,9862,2.23,00000000000000000000001000000,_age_car_Miss,1,23,2,23
6,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,152,0.03,00000000000000000000001000001,"_age_car_Miss, result_x_12months_Miss",2,29,3,29
7,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,.,X,8,0.00,00000000000000000000001000010,"_age_car_Miss, result_x_10000kms_Miss",2,28,3,28
8,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,.,.,15,0.00,00000000000000000000001000011,"_age_car_Miss, result_x_10000kms_Miss, result_x_12months_Miss",3,29,4,29
9,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,X,X,99940,22.65,00000000000000001000000000000,power_in_kw_Miss,1,17,2,17
10,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,X,.,1120,0.25,00000000000000001000000000001,"power_in_kw_Miss, result_x_12months_Miss",2,29,3,29
11,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,.,X,16,0.00,00000000000000001000000000010,"power_in_kw_Miss, result_x_10000kms_Miss",2,28,3,28
12,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,.,.,44,0.01,00000000000000001000000000011,"power_in_kw_Miss, result_x_10000kms_Miss, result_x_12months_Miss",3,29,4,29
13,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,X,X,X,X,X,X,32428,7.35,00000000000000001000001000000,"power_in_kw_Miss, _age_car_Miss",2,23,3,23
14,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,X,X,X,X,X,.,402,0.09,00000000000000001000001000001,"power_in_kw_Miss, _age_car_Miss, result_x_12months_Miss",3,29,4,29
15,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,X,X,X,X,.,X,19,0.00,00000000000000001000001000010,"power_in_kw_Miss, _age_car_Miss, result_x_10000kms_Miss",3,28,4,28
16,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,X,X,X,X,.,.,61,0.01,00000000000000001000001000011,"power_in_kw_Miss, _age_car_Miss, result_x_10000kms_Miss, result_x_12months_Miss",4,29,5,29
17,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,.,X,X,.,.,X,3,0.00,00001000000000000000000100110,"total_kms_est_Miss, deviation_kms_Miss, _deviation_kms_Miss, result_x_10000kms_Miss",4,28,5,28
18,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,X,.,X,X,.,.,X,1,0.00,00001000000000001000000100110,"total_kms_est_Miss, power_in_kw_Miss, deviation_kms_Miss, _deviation_kms_Miss, result_x_10000kms_Miss",5,28,6,28
19,X,X,X,X,.,X,X,X,X,X,X,X,X,X,X,X,.,X,X,X,X,X,.,.,X,X,.,.,X,2,0.00,00001000000000001000001100110,"total_kms_est_Miss, power_in_kw_Miss, _age_car_Miss, deviation_kms_Miss, _deviation_kms_Miss, result_x_10000kms_Miss",6,28,7,28
;;;;
title "Pattern of Missing Values";
ods graphics / imagemap noborder;
proc sgplot data=Miss2;
   hbarparm category=Pattern response=Freq / tip=(missvar) dataskin=sheen
            fillattrs=(color=bigb) datalabel baseline=1;
   yaxistable NumMiss / valuejustify=left label="Num Miss"
              valueattrs=GraphValueText labelattrs=GraphLabelText;
   yaxis labelposition=top;
   xaxis grid type=log logbase=10 label="Frequency (log10 scale)";
run;
title;
I'm getting closer using a yaxistable.
But I can only get a good result when I use a width of 40cm, which is by far too wide for my portrait PDF output.
If I shrink it to 17cm, the bars collapse.
And I would like to split the missvar values at "," but I can't find the option. Can the escapechar technique help?
title "Pattern of Missing Values";
ods graphics / imagemap noborder width=40cm;
proc sgplot data=Miss2;
   hbarparm category=Pattern response=Freq / tip=(missvar) dataskin=sheen
            fillattrs=(color=bigb) datalabel baseline=1;
   yaxistable NumMiss / valuejustify=left label="Num Miss"
              valueattrs=GraphValueText labelattrs=GraphLabelText;
   yaxistable missvar;
   yaxis labelposition=top;
   xaxis grid type=log logbase=10 label="Frequency (log10 scale)"
         offsetmax=0.4 offsetmin=0.01;
run;
title;
I think that my design is flawed.
It's too much text to place in that chart.
The binary representation of the values pretends to avoid exactly this.
Playing around with hbarbasic, I can actually use the text plot with the split option.
But it looks weird.
@acordes wrote:
The binary representation of the values pretends to avoid exactly this.
You might explain what that is pretending to avoid. A 29-character label is 29 characters. If each position with a one is supposed to correspond to a variable name (?), then it is only useful when it is obvious what each position means, and in this case it really isn't: there are too many positions for readers to internalize quickly. So I submit that you can gain space by dropping the pattern and using the label text as the category, with the SPLITCHAR option on the yaxis statement. You may want to use a split character other than the comma, and include it in the category value, so you get better control of how long the text is and how many rows appear.
What exactly are you attempting to show with the graph, as in the main idea? I am seeing variables that have units such as Months, KW and Km in the names. Perhaps you could gain some clarity/control by grouping some of those. Maybe Sgpanel and Panelby.
I might also think about making the text of the variable names mixed case so that it reads more nicely.
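A minimal sketch of that suggestion, assuming a SAS 9.4 maintenance release in which the YAXIS statement accepts FITPOLICY= and SPLITCHAR= for a discrete axis. The data set Miss3 and the variable MissLabel are made-up names for this example, and the 17cm by 20cm size is only a starting guess for a portrait page:

```sas
/* Miss3 and MissLabel are hypothetical names used only in this sketch */
data Miss3;
   set Miss2;
   length MissLabel $500;
   /* Group 1 has an empty missvar, so give it a readable category value */
   if missing(missvar) then MissLabel = "(none missing)";
   else MissLabel = missvar;
run;

title "Pattern of Missing Values";
ods graphics / noborder width=17cm height=20cm;
proc sgplot data=Miss3;
   /* The variable list replaces the 29-character pattern as the category */
   hbarparm category=MissLabel response=Freq / fillattrs=(color=bigb)
            datalabel baseline=1;
   /* Break each tick value at every comma so long lists wrap onto several rows */
   yaxis discreteorder=data display=(nolabel)
         fitpolicy=splitalways splitchar=",";
   xaxis grid type=log logbase=10 label="Frequency (log10 scale)";
run;
title;
```

If the comma splits the text into too many rows, a different separator placed only where a break is wanted (and named in SPLITCHAR=) gives finer control, as mentioned above.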
You're guessing right: the 1's mark the variables that have missing values, and the patterns are the different combinations of variables with missing values. This comes from PROC MI.
I could replace the binary string by the missing variable names, stored as a comma separated string in the missvar variable.
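For readers who want to build such a string themselves, here is a rough sketch of one way to do it from the indicator columns of Miss2, assuming that "X" marks a non-missing variable as in the data step above. The names MissNames, missvar2 and j are invented for this example, and this is not necessarily how missvar was originally created:

```sas
/* MissNames, missvar2 and j are hypothetical names used only in this sketch */
data MissNames;
   set Miss2;
   /* All 29 indicator columns, in their order in the data set */
   array m {*} diff_kms_Miss -- result_x_12months_Miss;
   length missvar2 $500;
   do j = 1 to dim(m);
      /* Anything other than "X" is treated as a missing-value indicator */
      if m{j} ne "X" then missvar2 = catx(", ", missvar2, vname(m{j}));
   end;
   drop j;
run;
```

CATX skips blank arguments, so the first name is added without a leading separator.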
As I wrote before, I like the TIP option best, but it's not available for PDF.
And I'm doing all this as part of an analytical project report that acts as a container for the project's mission, collected data, ETL, data quality, modelling, and scoring.
The resulting PDF output is quite OK; now I'm working on the minor problems that remain.