08-23-2016 01:35 PM
I'm working with a dataset that looks similar to the following:
I would like to impute missing material data through the following condition:
if there are more than one identical construction_date and the material from at least two of those dates match, then make it so that all other identical missing construction_date have the same material.
From the little I know, I would first have to sort my construction_date and material variables, and then I get lost from that point on.
proc sort data=data;
by construction_date material;
Note that I am a very new to coding language, and this is a large dataset.
Would appreciate any and all help!
08-23-2016 02:58 PM
your data contains 4 records with constructionb_date = 1991:
2 of then with material = cement, 1 with wood and one missing value.
what output and how many records do you expect for this date 1991 ?
08-23-2016 03:09 PM
My dataset contains over 10,000 dates, and so there will be a variety of date repititions. 1991 will be one of the many repeating dates, and I'm hoping to fill in the missing material variable with the most common material for that construction_date.
Does that answer your question?
08-23-2016 03:15 PM
what will be your preferences if you have N times material A and same number of times material B -
what would you put to replace missing values?
the situation needs analyze before defining the algoritm to bo done by SAS
08-23-2016 03:37 PM
in such case you need:
1) count frequency of each material per construction_date
2) select those dates with one most common material
3) update your data where material is missing using 2ns step output
to count frequencies you may use severeal methods:
either count by SQL, group by construction_date and material
or use proc FREQ with OUTPUT statement.
or use data step and retain counters
Would you like to struggle it by yourself?
08-23-2016 05:15 PM