Hello,
We receive daily data. Each day's data contain missing values for the variables alc, odn, and accounting, which we have to send back to the customer to validate and populate. We want to look at each day's data and see how the number of missing values for those variables changes from day to day. The reason is to understand the differences and see how we can improve the process to reduce missing values. I am looking for a statistical procedure that shows the daily trend.
Thank you!
Mauri
Please provide samples of the data and tell us what results you would expect from the example data. Probably the best would be a single data set that has a DAY variable with values 1, 2, 3, ....
Thank you for the quick reply. The daily SAS data set has 50,000 observations and 76 variables. One of the variables is called batch_id (for example, 20190518-0000000001). The odn, ipac, and accounting fields contain a lot of missing values. We want to look at the trend of missing values on a daily basis from these daily files. Please see 10 observations of the data set.
It would be nice to see a graphical view as well.
Well, that's not really what I wanted, but here is some sample data that I invented. I suggest you use PROC MEANS with the NMISS option and perform a BY-group analysis of the missing values for each day. You can then generate a report or (even better) create a graph of the daily activity.
data Have;
input daily_batch alc odn accounting;
datalines;
1 . . 1
1 . . 7
1 . 2 1
1 . 3 .
1 . . 8
2 4 . 1
2 0 . 7
2 . 2 1
2 . 3 .
2 . 4 8
3 0 . .
3 . 0 7
3 2 2 1
3 . 3 2
3 . . 2
;
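/* count the missing values of each variable within each batch */
/* (BY-group processing requires the data to be sorted by daily_batch) */
proc means data=Have NMISS noprint;
by daily_batch;
var alc odn accounting;
output out=Want NMISS=;
run;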
proc print data=Want;
var daily_batch alc odn accounting;
run;
proc sgplot data=Want;
series x=daily_batch y=alc / markers curvelabel;
series x=daily_batch y=odn / markers curvelabel;
series x=daily_batch y=accounting / markers curvelabel;
run;
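If the number of rows varies from day to day, raw counts can be hard to compare, so it may help to plot the percentage of missing values instead. The following is only a sketch built on the invented Have data above; the data set names Counts and Pct and the *_nmiss and *_pct variable names are made up for illustration. It relies on the _FREQ_ variable that PROC MEANS writes to the output data set, which holds the number of observations in each BY group.

```sas
/* sort first in case the batches are not already in order */
proc sort data=Have;
by daily_batch;
run;

/* count missing values per batch, using explicit output names */
proc means data=Have noprint;
by daily_batch;
var alc odn accounting;
output out=Counts nmiss=alc_nmiss odn_nmiss accounting_nmiss;
run;

/* convert counts to percentages; _FREQ_ = rows in the BY group */
data Pct;
set Counts;
alc_pct = 100 * alc_nmiss / _freq_;
odn_pct = 100 * odn_nmiss / _freq_;
accounting_pct = 100 * accounting_nmiss / _freq_;
run;

proc sgplot data=Pct;
series x=daily_batch y=alc_pct / markers curvelabel;
series x=daily_batch y=odn_pct / markers curvelabel;
series x=daily_batch y=accounting_pct / markers curvelabel;
yaxis label="Percent missing";
run;
```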
Sorry. My mistake. The alc, odn, and accounting are character variables and not numeric.
Okay, so convert the character variables to numeric indicator variables and then use my original solution.
/* the view renames the character variables and creates numeric */
/* variables that are missing exactly when the character value is blank */
data Two / view=Two;
set Have(rename=(alc=char_alc odn=char_odn accounting=char_accounting));
alc = ifn(char_alc=" ", ., 1);
odn = ifn(char_odn=" ", ., 1);
accounting = ifn(char_accounting=" ", ., 1);
run;
proc means data=Two NMISS noprint;
by daily_batch;
var alc odn accounting;
output out=Want NMISS=;
run;
proc print data=Want;
var daily_batch alc odn accounting;
run;
proc sgplot data=Want;
series x=daily_batch y=alc / markers curvelabel;
series x=daily_batch y=odn / markers curvelabel;
series x=daily_batch y=accounting / markers curvelabel;
run;
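For the real daily files, the day can probably be taken from batch_id itself. The sketch below assumes that batch_id is a character variable whose first eight characters are a yyyymmdd date (as the example value 20190518-0000000001 suggests) and that the daily files have been combined into one data set, here given the hypothetical name Daily_raw. The names Flags, batch_date, and the *_miss variables are also made up. The MISSING function returns 1 for a missing value and 0 otherwise, for both character and numeric variables, so summing the flags gives the count of missing values. The CLASS statement is used instead of BY so that the data do not need to be sorted.

```sas
/* Daily_raw is a hypothetical name for the combined daily data */
data Flags / view=Flags;
set Daily_raw;
/* assumes batch_id starts with a yyyymmdd date */
batch_date = input(substr(batch_id, 1, 8), yymmdd8.);
format batch_date date9.;
/* 1 if the value is missing, 0 otherwise */
alc_miss = missing(alc);
odn_miss = missing(odn);
accounting_miss = missing(accounting);
keep batch_date alc_miss odn_miss accounting_miss;
run;

/* sum the flags for each date; NWAY keeps only the per-date rows */
proc means data=Flags noprint nway;
class batch_date;
var alc_miss odn_miss accounting_miss;
output out=Want sum=alc odn accounting;
run;

proc sgplot data=Want;
series x=batch_date y=alc / markers curvelabel;
series x=batch_date y=odn / markers curvelabel;
series x=batch_date y=accounting / markers curvelabel;
yaxis label="Number of missing values";
run;
```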