BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10
@novinosrin This is a para-dataset. Each session_id represents a user, and the user is jumping from one screen to another, a screen name is defined by the first or the first+second word after pd_. Therefore, the answer would be no for each session there will be multiple load_time and start_time since the user is switching screens. Thank you for your patience.
lydiawawa
Lapis Lazuli | Level 10
@novinosrin I changed sample dataset to imitate the actual name values.
novinosrin
Tourmaline | Level 20
data have;
input session_id         name  : $12.       time;
cards;
    1            pd_load           12
    1            entry_load        13
    1            entry_start       12
    1            pd_time           13
    1            entry_time        9
    1            entry_text         .
    1            pd_text            .
    1            pd_device          .
    2            pd_device          .
    2            pd_text            .
    2            pd_time            7
    2            pd_load            11 
    3            pd_text            .
    3            pd_load            10
    3            entry_start        4
    3            entry_text         .
	;

proc sql;
create table want(drop=t1 t2) as
select a.session_id,a.name,t1,ifn(a.time=.,b.time,a.time)as time ,t2
from have a left join (select session_id,scan(name,1,'_') as t1,scan(name,-1,'_') as t2,time from have where scan(name,-1,'_') in ('start','load')) b
on a.session_id=b.session_id and scan(a.name,1,'_')=t1
group by a.session_id,a.name,t1
having max(t2)=t2;
quit;
novinosrin
Tourmaline | Level 20

Hi @lydiawawa 

 

Alright, I think  if my understanding is correct, the following should solve the problem

 

data have;
input session_id         name  : $50.       time;
cards;
    1            pd_login_load_time                     12
    1            pd_entry_confirm_load_time             13
    1            pd_entry_confirm_start_time            12
    1            pd_save_time                           13
    1            pd_entry_login_time                     9
    1            pd_entry_confirm_text                  12
    1            pd_login_text                          12
    1            pd_login_device                        12
    2            pd_confirm_device                      11
    2            pd_confirm_text                        11
    2            pd_login_time                           7
    2            pd_confirm_load_time                   11 
    3            pd_login_text                          10
    3            pd_login_load_time                     10
    3            pd_entry_confirm_start_time             4
    3            pd_entry_confirm_text                   4
    4            pd_HH_dash_lang_entry_ind              5
    4            pd_HH_dash_load_time                   5
    5            pd_dob_close_browser_time              6
    5            pd_dob_lang_entry_ind                  23
    5            pd_dob_lang_exit_ind                   23
    5            pd_dob_load_time                       23
    5            pd_dob_next_time                       11
    5            pd_home_lang_entry_ind                 16
    5            pd_home_lang_exit_ind                  16
    5            pd_home_lang_load_time                 16
    5            pd_home_lang_next_time                 18
	;

proc sort data=have out=_have;
by session_id descending time;
run;

data want;
if _n_=1 then do;
dcl hash H (multidata:'y') ;
   h.definekey  ("session_id") ;
   h.definedata ("t1","ti") ;
   h.definedone () ;
end;
do until(last.session_id);
set _have end=lr;
by session_id;
if time>. then 
	do;
		k=findw(name,'load','_');
		k1=findw(name,'start','_');
		ti=time;
		if k>0 then t1='load';
		else if k1>0 then t1='start';
		if sum(k,k1)>0 then rc=h.add() ;
   end;
else if time=. then do;
    do rc=h.find() by 0 while(rc=0);
    if t1='start' then start=ti;
     else if t1='load' then load=ti;
      rc=h.find_next();
     end;
time=Coalesce(start,load);
end;
output;
end;
keep session_id name time;
run;

And for some reason, I feel silly that somewhere I overlooked and didn't comprehend properly. Really have to work on my communication and comprehension skills.

 

Let me know

 

PS 

1. No need to find patterns

2. We are going with session_id has keys

3. Populate non missings with start,load in hash object

4. Look up using session_id as key

5. Fetch the time

 

and we are done!

novinosrin
Tourmaline | Level 20

@lydiawawa   Did the above work for you?

lydiawawa
Lapis Lazuli | Level 10
@novinosrin I made the updates, and I think you should be more clear now. Let me know of any questions. Thank you!
lydiawawa
Lapis Lazuli | Level 10
Not quiet.. @novinosrin I will attempt to retype the sample data and attach of a snap of the actual dataset by a single sesion_id. I will notify you by tonight.. thank you for all the help.
novinosrin
Tourmaline | Level 20

HI @lydiawawa   Just had a look at your edited sample. 

 

I can't spot "_sessionstart_time"

 

So for your sample, session we go with _load_time?

 

lydiawawa
Lapis Lazuli | Level 10
Hi @novinosrin usually for each screen (pd_xxx_yy_etc) there will be a sessionstart_time or load_time. However, they could appear together (rare) and when they do we will be taking sessionstart_time instead of load_time. In this case, session_id 1 does not have sessionstart_time so we take load_time.
novinosrin
Tourmaline | Level 20

Hi @lydiawawa  I followed your code to understand the split of the name that forms the group. I think t3 is perhaps not needed to accomplish what you want.

 

I made some minor adjustments to your code on noticing your comment "The code does not work on pd_login_lang_entry_ind and  pd_login_lang_exit_ind instead of being missing, they took the value of 

pd_respondent_load_time."

 

Please review your code that i adjusted below

 

data have;
input session_id         name  : $70.       time;
rownum=_n_;
cards;
1            pd_confirm_resp_for_quest_adr_yes_ind_time            22.7      
    1            pd_confirm_next_time                                  23.7
    1            pd_confirm_load_time                                         13.3
    1            pd_confirm_lang_entry_ind                             .
    1            pd_confirm_lang_exit_ind                              .
    1            pd_confirm_close_browser_time                         24
    1            pd_login_resp_user_id_time                           2.1
    1            pd_login_login_time                                  12.2
    1            pd_login_lang_entry_ind                              .
    1            pd_login_lang_exit_ind                               .
    1            pd_login_close_browser_time                          36.8
    1            pd_respondent_p_first_name_time                       40.3
    1            pd_respondent_p_middle_name_time                     41.6
    1            pd_respondent_p_last_name_time                       35.8
    1            pd_respondent_load_time                              35.8
    1            pd_respondent_lang_entry_ind                         .
    1            pd_respondent_lang_exit_ind                          .
    1            pd_respondent_closemodal_time                        137.6
    1            pd_respondent_close_browser_time                     140
    1            pd_respondent_backmodal_time                         75
    1            pd_verifyaddress_next_time                           34.6
    1            pd_verifyaddress_load_time                           24.7
    1            pd_verifyaddress_lang_entry_ind                      .
    1            pd_verifyaddress_lang_exit_ind                       .
    1            pd_verifyaddress_h_occ_yes_ind_time                  33.7
    1            pd_verifyaddress_close_browser_time                  35.1
	;

data have; 
set have;
t1 = scan(strip(name), 2, '_');
t3 = scan(strip(name), 3, '_');
run;

proc sort data=have out=_have;
by session_id t1  descending time;
run;

data want;
set _have;
by session_id t1  descending time;
   retain temp ;
    if first.session_id or first.t1 then  temp=.;
    if (t3 = "load" or t3 = "sessionstart") and not missing(time) then  temp=time;
    else if missing(time) then time = temp;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 1411 views
  • 13 likes
  • 3 in conversation