Hi Edited @novinosrin I am having trouble to retain the non-missing value in replace for the following missing value. The dataset with missings is as of the following: session_id name time
1 pd_confirm_resp_for_quest_adr_yes_ind_time 22.7 1 pd_confirm_next_time 23.7
1 pd_load_time 13.3 1 pd_confirm_lang_entry_ind . 1 pd_confirm_lang_exit_ind . 1 pd_confirm_close_browser_time 24
1 pd_login_resp_user_id_time 2.1
1 pd_login_login_time 12.2 1 pd_login_lang_entry_ind . 1 pd_login_lang_exit_ind . 1 pd_login_close_browser_time 36.8 1 pd_respondent_p_first_name_time 40.3 1 pd_respondent_p_middle_name_time 41.6 1 pd_respondent_p_last_name_time 35.8 1 pd_respondent_load_time 35.8 1 pd_respondent_lang_entry_ind . 1 pd_respondent_lang_exit_ind . 1 pd_respondent_closemodal_time 137.6 1 pd_respondent_close_browser_time 140 1 pd_respondent_backmodal_time 75 1 pd_verifyaddress_next_time 34.6 1 pd_verifyaddress_load_time 24.7 1 pd_verifyaddress_lang_entry_ind . 1 pd_verifyaddress_lang_exit_ind . 1 pd_verifyaddress_h_occ_yes_ind_time 33.7 1 pd_verifyaddress_close_browser_time 35.1 For each name contains string "_sessionstart_time" or "_load_time" or "_xxx_time" there will always be a time value, and I hope to use only "_sessionstart_time" or "_load_time" time values to fill in the missing values. If "_load_time" and "_sessionstart_time" exist for the same "yyy_xxx" entries (ex: "entry_login") then we will take "_sessionstart_time". Desired output: As a reminder, for simplicity, the sample dataset only has one session_id, but in reality, there are many session_id each defines a respondent. And names end with _ind is not the only factor that determines missing time value. Missing time value is determined by name not ending with _time. For example, names end with _test or _xxx other than _time have missing time values. Notice pd_login_lang_entry_ind and pd_login_lang_exit_ind should have missing time values because this group (login_lang) of names does not have _sessionstart_time or _load_time entries. session_id name time
1 pd_confirm_resp_for_quest_adr_yes_ind_time 22.7 1 pd_confirm_next_time 23.7
1 pd_confirm_load_time 13.3 1 pd_confirm_lang_entry_ind 13.3 1 pd_confirm_lang_exit_ind 13.3 1 pd_confirm_close_browser_time 24
1 pd_login_resp_user_id_time 2.1
1 pd_login_login_time 12.2 1 pd_login_lang_entry_ind . 1 pd_login_lang_exit_ind . 1 pd_login_close_browser_time 36.8 1 pd_respondent_p_first_name_time 40.3 1 pd_respondent_p_middle_name_time 41.6 1 pd_respondent_p_last_name_time 35.8 1 pd_respondent_load_time 35.8 1 pd_respondent_lang_entry_ind 35.8 1 pd_respondent_lang_exit_ind 35.8 1 pd_respondent_closemodal_time 137.6 1 pd_respondent_close_browser_time 140 1 pd_respondent_backmodal_time 75 1 pd_verifyaddress_next_time 34.6 1 pd_verifyaddress_load_time 24.7 1 pd_verifyaddress_lang_entry_ind 24.7 1 pd_verifyaddress_lang_exit_ind 24.7 1 pd_verifyaddress_h_occ_yes_ind_time 33.7 1 pd_verifyaddress_close_browser_time 35.1 The code I have so far does not produce the desired output, but the outcome is very close: data have;
set have;
t1 = scan(strip(name), 2, '_');
t3 = scan(strip(name), 3, '_');
run;
proc sort data = have; by session_id t1 descending t3;run;
data want;
drop temp;
set have;
by session_id t1 descending t3 value;
retain temp;
if first.session_id or first.t1 or first.t3 then temp=.;
if t3 = "load" or t3 = "sessionstart" and not missing(time) then temp=time;
else if missing(time) then time = temp;
run; Code output: session_id name time
1 pd_confirm_resp_for_quest_adr_yes_ind_time 22.7 1 pd_confirm_next_time 23.7
1 pd_confirm_load_time 13.3 1 pd_confirm_lang_entry_ind 13.3 1 pd_confirm_lang_exit_ind 13.3 1 pd_confirm_close_browser_time 24
1 pd_login_resp_user_id_time 2.1
1 pd_login_login_time 12.2 1 pd_login_lang_entry_ind 35.8 1 pd_login_lang_exit_ind 35.8 1 pd_login_close_browser_time 36.8 1 pd_respondent_p_first_name_time 40.3 1 pd_respondent_p_middle_name_time 41.6 1 pd_respondent_p_last_name_time 35.8 1 pd_respondent_load_time 35.8 1 pd_respondent_lang_entry_ind 35.8 1 pd_respondent_lang_exit_ind 35.8 1 pd_respondent_closemodal_time 137.6 1 pd_respondent_close_browser_time 140 1 pd_respondent_backmodal_time 75 1 pd_verifyaddress_next_time 34.6 1 pd_verifyaddress_load_time 24.7 1 pd_verifyaddress_lang_entry_ind 24.7 1 pd_verifyaddress_lang_exit_ind 24.7 1 pd_verifyaddress_h_occ_yes_ind_time 33.7 1 pd_verifyaddress_close_browser_time 35.1 The code does not work on pd_login_lang_entry_ind and pd_login_lang_exit_ind instead of being missing, they took the value of pd_respondent_load_time. Grateful for any help!
... View more