Hi All,
I have a dataset that carries missing by terms containing _LOAD_TIME, LANG_ENTRY_TIME, LANG_EXIT_TIME and CLOSE_BROWSER_TIME. The flag variable is created from the X value extracted from DASHBOARD_X_START_TIME.
This can be achieved by using:
flag = strip(compress(strip(name),,'kd'));
Which I have done so already. Now I need to populate the missing cells by using the later created none-missing flag value, and I wish to maintain the original order.
The dataset with missings:
Name | Flag |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_1_START_TIME | 1 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME | |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_2_START_TIME | 2 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME | |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_3_START_TIME | 3 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME |
The desired outcome:
Name | Flag |
DASHBOARD_LOAD_TIME | 1 |
DASHBOARD_LANG_ENTRY_IND | 1 |
DASHBOARD_1_START_TIME | 1 |
DASHBOARD_LANG_EXT_IND | 1 |
DASHBOARD_CLOSE_BROWSER_TIME | 1 |
DASHBOARD_LOAD_TIME | 2 |
DASHBOARD_LANG_ENTRY_IND | 2 |
DASHBOARD_2_START_TIME | 2 |
DASHBOARD_LANG_EXT_IND | 2 |
DASHBOARD_CLOSE_BROWSER_TIME | 2 |
DASHBOARD_LOAD_TIME | 3 |
DASHBOARD_LANG_ENTRY_IND | 3 |
DASHBOARD_3_START_TIME | 3 |
DASHBOARD_LANG_EXT_IND | 3 |
DASHBOARD_CLOSE_BROWSER_TIME | 3 |
Thank you for any inputs!
Hello @lydiawawa
data have;
infile cards truncover;
input Name :$30. Flag ;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME 1
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME 2
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_3_START_TIME 3
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
;
data temp;
set have;
if name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;
proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp
group by grp
order by n;
quit;
But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)
@lydiawawa wrote:
But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)
But SAS can't know that or that those observations belong to a group so we need to make it easier to do that by adding a grouping variable. Then order won't matter because you can always resort.
Step-1: I would add two serial numbers for sorting:
1) a serial number per group:
1 DASHBOARD_LOAD_TIME 1 DASHBOARD_LANG_ENTRY_IND 1 DASHBOARD_1_START_TIME 1 DASHBOARD_LANG_EXT_IND 1 DASHBOARD_CLOSE_BROWSER_TIME
2 DASHBOARD_LOAD_TIME
2 DASHBOARD_LANG_ENTRY_IND
2 DASHBOARD_1_START_TIME
2 DASHBOARD_LANG_EXT_IND
2 DASHBOARD_CLOSE_BROWSER_TIME
2) a serial number inside the dataset, to keep original order.
the code to do it:
data temp;
set have;
retain serial1 1;
if index(variable,'LOAD_TIME') > 0 then serial1 + 1;
serial2 = _N_;
run;
Step-2: sort by serial1 descending flag
Step-3: Retain the non missing flag (1st obs in the group) to assign when it is missing
Step-4: Sort by serial2, back to original order and drop useless variables.
Hello @lydiawawa
data have;
infile cards truncover;
input Name :$30. Flag ;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME 1
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME 2
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_3_START_TIME 3
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
;
data temp;
set have;
if name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;
proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp
group by grp
order by n;
quit;
The following code should return what you want IF your source data complies with the following requirements:
- A new group starts always with value "DASHBOARD_LOAD_TIME"
- Each group has a row with value "DASHBOARD_<digits>
_START_TIME
"
data have;
infile cards truncover;
input Name :$30.;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_3_START_TIME
;
run;
data want(drop=_:);
length id start_time_num 8;
retain start_time_num;
id=_n_;
set have;
if name='DASHBOARD_LOAD_TIME' then
do;
call missing(start_time_num);
do _point=_n_+1 to _nobs;
set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then
do;
start_time_num=input(scan(_name,2,'_'),?? best32.);
leave;
end;
end;
end;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.