BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

Hi All,

 

I have a dataset that carries missing by terms containing _LOAD_TIME, LANG_ENTRY_TIME, LANG_EXIT_TIME and CLOSE_BROWSER_TIME. The flag variable is created from the X value extracted from DASHBOARD_X_START_TIME.

This can be achieved by using:

 

flag = strip(compress(strip(name),,'kd'));

 

Which I have done so already. Now I need to populate the missing cells by using the later created none-missing flag value, and I wish to maintain the original order.

 

The dataset with missings:

 

 

NameFlag
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_1_START_TIME1
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_2_START_TIME2
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 
DASHBOARD_LOAD_TIME 
DASHBOARD_LANG_ENTRY_IND 
DASHBOARD_3_START_TIME3
DASHBOARD_LANG_EXT_IND 
DASHBOARD_CLOSE_BROWSER_TIME 

 

 

The desired outcome:

 

NameFlag
DASHBOARD_LOAD_TIME1
DASHBOARD_LANG_ENTRY_IND1
DASHBOARD_1_START_TIME1
DASHBOARD_LANG_EXT_IND1
DASHBOARD_CLOSE_BROWSER_TIME1
DASHBOARD_LOAD_TIME2
DASHBOARD_LANG_ENTRY_IND2
DASHBOARD_2_START_TIME2
DASHBOARD_LANG_EXT_IND2
DASHBOARD_CLOSE_BROWSER_TIME2
DASHBOARD_LOAD_TIME3
DASHBOARD_LANG_ENTRY_IND3
DASHBOARD_3_START_TIME3
DASHBOARD_LANG_EXT_IND3
DASHBOARD_CLOSE_BROWSER_TIME3

 

Thank you for any inputs!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hello @lydiawawa 

 



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;

View solution in original post

8 REPLIES 8
Reeza
Super User
How do you know those colour coded groups go together. Is the order guaranteed or is there another way to obtain the groups?
lydiawawa
Lapis Lazuli | Level 10
The order is guaranteed, meaning the none-missing flag is always in the middle flanked by DASHBOARD_LOAD_TIME, DASHBOARD_LANG_ENTRY_IND and DASHBOARD_LANG_EXT_IND, DASHBOARD_CLOSE_BROWSER_TIME. You do not have to obtain the groups, the dataset has already been shaped that way that is why I requested to maintain the original order. Thanks!
lydiawawa
Lapis Lazuli | Level 10

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)

Reeza
Super User

@lydiawawa wrote:

But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)


But SAS can't know that or that those observations belong to a group so we need to make it easier to do that by adding a grouping variable. Then order won't matter because you can always resort. 

 

 

Shmuel
Garnet | Level 18

Step-1:  I would add two serial numbers for sorting:

1) a serial number per group:

1 DASHBOARD_LOAD_TIME	 
1 DASHBOARD_LANG_ENTRY_IND	 
1 DASHBOARD_1_START_TIME	
1 DASHBOARD_LANG_EXT_IND	 
1 DASHBOARD_CLOSE_BROWSER_TIME
2 DASHBOARD_LOAD_TIME
2 DASHBOARD_LANG_ENTRY_IND
2 DASHBOARD_1_START_TIME
2 DASHBOARD_LANG_EXT_IND
2 DASHBOARD_CLOSE_BROWSER_TIME

2) a serial number inside the dataset, to keep original order.

 

the code to do it:

data temp;
 set have;
       retain serial1 1;
       if index(variable,'LOAD_TIME') > 0 then serial1 + 1;
       serial2 = _N_;
run;

 

Step-2: sort by serial1 descending flag

Step-3: Retain the non missing flag (1st obs in the group) to assign when it is missing

Step-4: Sort by serial2, back to original order and drop useless variables.

Reeza
Super User
For steps 2-4 you could replace that with a single SQL query. I agree with adding the index first of course.
novinosrin
Tourmaline | Level 20

Hello @lydiawawa 

 



data have;
infile cards truncover;
input Name :$30.	Flag ;
cards;
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_1_START_TIME	1
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_2_START_TIME	2
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	 
DASHBOARD_LOAD_TIME	 
DASHBOARD_LANG_ENTRY_IND	 
DASHBOARD_3_START_TIME	3
DASHBOARD_LANG_EXT_IND	 
DASHBOARD_CLOSE_BROWSER_TIME	
; 
 
data temp;
set have;
if 	name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;

proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp 
group by grp 
order by n;
quit;
Patrick
Opal | Level 21

@lydiawawa 

The following code should return what you want IF your source data complies with the following requirements:

- A new group starts always with value "DASHBOARD_LOAD_TIME"

- Each group has a row with value "DASHBOARD_<digits>_START_TIME"

 

data have;
  infile cards truncover;
  input Name :$30.;
  cards;
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_1_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_2_START_TIME
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME   
DASHBOARD_LOAD_TIME  
DASHBOARD_LANG_ENTRY_IND   
DASHBOARD_LANG_EXT_IND   
DASHBOARD_CLOSE_BROWSER_TIME  
DASHBOARD_3_START_TIME
;
run;

data want(drop=_:);
  length id start_time_num 8;
  retain start_time_num;
  id=_n_;
  set have;
  if name='DASHBOARD_LOAD_TIME' then
    do;
      call missing(start_time_num);
      do _point=_n_+1 to _nobs;
        set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
        if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then
          do;
            start_time_num=input(scan(_name,2,'_'),?? best32.);
            leave;
          end;
      end;
    end;
run;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 763 views
  • 5 likes
  • 5 in conversation