BookmarkSubscribeRSS Feed
Shinypants
Fluorite | Level 6

Hi,

 

 I'm working with a dataset that looks similar to the following:

 

Construction_date     Material

1991                           Cement

1972                           Wood

1991                           Cement

1991                          

2002                           Wood

1983                           Wood

1991                           Wood

 

I would like to impute missing material data through the following condition:

 

if there are more than one identical construction_date and the material from at least two of those dates match, then make it so that all other identical missing construction_date have the same material.

 

From the little I know, I would first have to sort my construction_date and material variables, and then I get lost from that point on.

 

proc sort data=data;
by construction_date material;
run;

 

Note that I am a very new to coding language, and this is a large dataset.

 

Would appreciate any and all help!

9 REPLIES 9
PGStats
Opal | Level 21

What's the imputed material here?

 

Construction_date     Material

1991                           Cement

1972                           Wood

1991                           Cement

1991                          ?

2002                           Wood

1983                           Wood

1991                           Wood

1991                           Wood

PG
Shmuel
Garnet | Level 18

 

your data contains 4 records with constructionb_date = 1991:

   2 of then with material = cement, 1 with wood and one missing value.

 

what output and how many records do you expect for this date 1991 ? 

 

Shinypants
Fluorite | Level 6

My dataset contains over 10,000 dates, and so there will be a variety of date repititions. 1991 will be one of the many repeating dates, and I'm hoping to fill in the missing material variable with the most common material for that construction_date.

 

Does that answer your question?

Shmuel
Garnet | Level 18

what will be your preferences if you have N times material A and same number of times material B -

what would you put to replace missing values?

 

the situation needs analyze before defining the algoritm to bo done by SAS

Shinypants
Fluorite | Level 6

Good question! In that case I would just leave the missing as missing.

Shmuel
Garnet | Level 18

in such case you need:

1) count frequency of each material per construction_date

2) select those dates with one most common material

3) update your data where material is missing using 2ns step output

 

to count frequencies you may use severeal methods:

either count by SQL, group by construction_date and material

or use proc FREQ with OUTPUT statement.

or use data step and retain counters

 

Would you like to struggle it by yourself? 

 

Shinypants
Fluorite | Level 6

I would love to have some immediate help! 

 

Thank you!

Shinypants
Fluorite | Level 6

I do have at least a thousand dates though, and was hoping for a more automated process if possible?

Shmuel
Garnet | Level 18

run the attached program and follow the steps, as I designed previously.

 

try to understand each step what it is doing, how and why.

 

I hope you will be able to addapt it to your needs.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1223 views
  • 1 like
  • 3 in conversation