Solved: How to filling in missing by the none-missing value in the middle?

lydiawawa · Posted 05-10-2019 02:56 PM

Hi All,

I have a dataset that carries missing by terms containing _LOAD_TIME, LANG_ENTRY_TIME, LANG_EXIT_TIME and CLOSE_BROWSER_TIME. The flag variable is created from the X value extracted from DASHBOARD_X_START_TIME.

This can be achieved by using:

flag = strip(compress(strip(name),,'kd'));

Which I have done so already. Now I need to populate the missing cells by using the later created none-missing flag value, and I wish to maintain the original order.

The dataset with missings:

Name	Flag
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME

The desired outcome:

Name	Flag
DASHBOARD_LOAD_TIME	1
DASHBOARD_LANG_ENTRY_IND	1
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	1
DASHBOARD_CLOSE_BROWSER_TIME	1
DASHBOARD_LOAD_TIME	2
DASHBOARD_LANG_ENTRY_IND	2
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	2
DASHBOARD_CLOSE_BROWSER_TIME	2
DASHBOARD_LOAD_TIME	3
DASHBOARD_LANG_ENTRY_IND	3
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	3
DASHBOARD_CLOSE_BROWSER_TIME	3

Thank you for any inputs!

novinosrin · Posted 05-10-2019 05:29 PM

Hello @lydiawawa



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;

View solution in original post

Reeza · Posted 05-10-2019 03:28 PM

How do you know those colour coded groups go together. Is the order guaranteed or is there another way to obtain the groups?

lydiawawa · Posted 05-10-2019 03:40 PM

The order is guaranteed, meaning the none-missing flag is always in the middle flanked by DASHBOARD_LOAD_TIME, DASHBOARD_LANG_ENTRY_IND and DASHBOARD_LANG_EXT_IND, DASHBOARD_CLOSE_BROWSER_TIME. You do not have to obtain the groups, the dataset has already been shaped that way that is why I requested to maintain the original order. Thanks!

lydiawawa · Posted 05-10-2019 03:48 PM

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)

Reeza · Posted 05-10-2019 04:51 PM

@lydiawawa wrote:

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)

But SAS can't know that or that those observations belong to a group so we need to make it easier to do that by adding a grouping variable. Then order won't matter because you can always resort.

Shmuel · Posted 05-10-2019 04:10 PM

Step-1: I would add two serial numbers for sorting:

1) a serial number per group:

1 DASHBOARD_LOAD_TIME	 
1 DASHBOARD_LANG_ENTRY_IND	 
1 DASHBOARD_1_START_TIME	
1 DASHBOARD_LANG_EXT_IND	 
1 DASHBOARD_CLOSE_BROWSER_TIME
2 DASHBOARD_LOAD_TIME 
2 DASHBOARD_LANG_ENTRY_IND 
2 DASHBOARD_1_START_TIME 
2 DASHBOARD_LANG_EXT_IND 
2 DASHBOARD_CLOSE_BROWSER_TIME

2) a serial number inside the dataset, to keep original order.

the code to do it:

data temp;
 set have;
       retain serial1 1;
       if index(variable,'LOAD_TIME') > 0 then serial1 + 1;
       serial2 = _N_;
run;

Step-2: sort by serial1 descending flag

Step-3: Retain the non missing flag (1st obs in the group) to assign when it is missing

Step-4: Sort by serial2, back to original order and drop useless variables.

Reeza · Posted 05-10-2019 04:50 PM

For steps 2-4 you could replace that with a single SQL query. I agree with adding the index first of course.

novinosrin · Posted 05-10-2019 05:29 PM

Hello @lydiawawa



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;

Patrick · Posted 05-11-2019 12:52 AM

@lydiawawa

The following code should return what you want IF your source data complies with the following requirements:

- A new group starts always with value "DASHBOARD_LOAD_TIME"

- Each group has a row with value "DASHBOARD_<digits>_START_TIME"

data have;
  infile cards truncover;
  input Name :$30.;
  cards;
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_1_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_2_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME  
DASHBOARD_3_START_TIME
;
run;

data want(drop=_:);
  length id start_time_num 8;
  retain start_time_num;
  id=_n_;
  set have;
  if name='DASHBOARD_LOAD_TIME' then
    do;
      call missing(start_time_num);
      do _point=_n_+1 to _nobs;
        set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
        if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then
          do;
            start_time_num=input(scan(_name,2,'_'),?? best32.);
            leave;
          end;
      end;
    end;
run;

How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Re: How to filling in missing by the none-missing value in the middle?

Registration is open

SAS Training: Just a Click Away