BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

Hi All,

 

I have a dataset that carries missing by terms containing _LOAD_TIME, LANG_ENTRY_TIME, LANG_EXIT_TIME and CLOSE_BROWSER_TIME. The flag variable is created from the X value extracted from DASHBOARD_X_START_TIME.

This can be achieved by using:

 

flag = strip(compress(strip(name),,'kd'));

 

Which I have done so already. Now I need to populate the missing cells by using the later created none-missing flag value, and I wish to maintain the original order.

 

The dataset with missings:

 

 

NameFlag
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_1_START_TIME1
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_2_START_TIME2
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_3_START_TIME3
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 

 

 

The desired outcome:

 

NameFlag
DASHBOARD_LOAD_TIME1
DASHBOARD_LANG_ENTRY_IND1
DASHBOARD_1_START_TIME1
DASHBOARD_LANG_EXT_IND1
DASHBOARD_CLOSE_BROWSER_TIME1
DASHBOARD_LOAD_TIME2
DASHBOARD_LANG_ENTRY_IND2
DASHBOARD_2_START_TIME2
DASHBOARD_LANG_EXT_IND2
DASHBOARD_CLOSE_BROWSER_TIME2
DASHBOARD_LOAD_TIME3
DASHBOARD_LANG_ENTRY_IND3
DASHBOARD_3_START_TIME3
DASHBOARD_LANG_EXT_IND3
DASHBOARD_CLOSE_BROWSER_TIME3

 

Thank you for any inputs!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hello @lydiawawa 

 



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;

View solution in original post

8 REPLIES 8
Reeza
Super User
How do you know those colour coded groups go together. Is the order guaranteed or is there another way to obtain the groups?
lydiawawa
Lapis Lazuli | Level 10
The order is guaranteed, meaning the none-missing flag is always in the middle flanked by DASHBOARD_LOAD_TIME, DASHBOARD_LANG_ENTRY_IND and DASHBOARD_LANG_EXT_IND, DASHBOARD_CLOSE_BROWSER_TIME. You do not have to obtain the groups, the dataset has already been shaped that way that is why I requested to maintain the original order. Thanks!
lydiawawa
Lapis Lazuli | Level 10

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)

Reeza
Super User

@lydiawawa wrote:

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)


But SAS can't know that or that those observations belong to a group so we need to make it easier to do that by adding a grouping variable. Then order won't matter because you can always resort. 

 

 

Shmuel
Garnet | Level 18

Step-1:  I would add two serial numbers for sorting:

1) a serial number per group:

1 DASHBOARD_LOAD_TIME	 
1 DASHBOARD_LANG_ENTRY_IND	 
1 DASHBOARD_1_START_TIME	
1 DASHBOARD_LANG_EXT_IND	 
1 DASHBOARD_CLOSE_BROWSER_TIME
2 DASHBOARD_LOAD_TIME
2 DASHBOARD_LANG_ENTRY_IND
2 DASHBOARD_1_START_TIME
2 DASHBOARD_LANG_EXT_IND
2 DASHBOARD_CLOSE_BROWSER_TIME

2) a serial number inside the dataset, to keep original order.

 

the code to do it:

data temp;
 set have;
       retain serial1 1;
       if index(variable,'LOAD_TIME') > 0 then serial1 + 1;
       serial2 = _N_;
run;

 

Step-2: sort by serial1 descending flag

Step-3: Retain the non missing flag (1st obs in the group) to assign when it is missing

Step-4: Sort by serial2, back to original order and drop useless variables.

Reeza
Super User
For steps 2-4 you could replace that with a single SQL query. I agree with adding the index first of course.
novinosrin
Tourmaline | Level 20

Hello @lydiawawa 

 



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;
Patrick
Opal | Level 21

@lydiawawa 

The following code should return what you want IF your source data complies with the following requirements:

- A new group starts always with value "DASHBOARD_LOAD_TIME"

- Each group has a row with value "DASHBOARD_<digits>_START_TIME"

 

data have;
  infile cards truncover;
  input Name :$30.;
  cards;
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_1_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_2_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME  
DASHBOARD_3_START_TIME
;
run;

data want(drop=_:);
  length id start_time_num 8;
  retain start_time_num;
  id=_n_;
  set have;
  if name='DASHBOARD_LOAD_TIME' then
    do;
      call missing(start_time_num);
      do _point=_n_+1 to _nobs;
        set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
        if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then
          do;
            start_time_num=input(scan(_name,2,'_'),?? best32.);
            leave;
          end;
      end;
    end;
run;

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1571 views
  • 5 likes
  • 5 in conversation