Hi All,
I have a dataset that carries missing by terms containing _LOAD_TIME, LANG_ENTRY_TIME, LANG_EXIT_TIME and CLOSE_BROWSER_TIME. The flag variable is created from the X value extracted from DASHBOARD_X_START_TIME.
This can be achieved by using:
flag = strip(compress(strip(name),,'kd'));
Which I have done so already. Now I need to populate the missing cells by using the later created none-missing flag value, and I wish to maintain the original order.
The dataset with missings:
Name | Flag |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_1_START_TIME | 1 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME | |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_2_START_TIME | 2 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME | |
DASHBOARD_LOAD_TIME | |
DASHBOARD_LANG_ENTRY_IND | |
DASHBOARD_3_START_TIME | 3 |
DASHBOARD_LANG_EXT_IND | |
DASHBOARD_CLOSE_BROWSER_TIME |
The desired outcome:
Name | Flag |
DASHBOARD_LOAD_TIME | 1 |
DASHBOARD_LANG_ENTRY_IND | 1 |
DASHBOARD_1_START_TIME | 1 |
DASHBOARD_LANG_EXT_IND | 1 |
DASHBOARD_CLOSE_BROWSER_TIME | 1 |
DASHBOARD_LOAD_TIME | 2 |
DASHBOARD_LANG_ENTRY_IND | 2 |
DASHBOARD_2_START_TIME | 2 |
DASHBOARD_LANG_EXT_IND | 2 |
DASHBOARD_CLOSE_BROWSER_TIME | 2 |
DASHBOARD_LOAD_TIME | 3 |
DASHBOARD_LANG_ENTRY_IND | 3 |
DASHBOARD_3_START_TIME | 3 |
DASHBOARD_LANG_EXT_IND | 3 |
DASHBOARD_CLOSE_BROWSER_TIME | 3 |
Thank you for any inputs!
Hello @lydiawawa
data have;
infile cards truncover;
input Name :$30. Flag ;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME 1
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME 2
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_3_START_TIME 3
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
;
data temp;
set have;
if name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;
proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp
group by grp
order by n;
quit;
But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)
@lydiawawa wrote:
But I think the challenge is how to populate the missing so that they take the middle none-missing flag. After filling in the missing we will be able to use flag to indicate that those are in a group. Even though the original structure has already shaped that way, having a flag variable is to help us to detect the actual duplicates, and to avoid treating none-dups as dups. (unduplication is irrelevant to this post, I just need to create the flag variable)
But SAS can't know that or that those observations belong to a group so we need to make it easier to do that by adding a grouping variable. Then order won't matter because you can always resort.
Step-1: I would add two serial numbers for sorting:
1) a serial number per group:
1 DASHBOARD_LOAD_TIME 1 DASHBOARD_LANG_ENTRY_IND 1 DASHBOARD_1_START_TIME 1 DASHBOARD_LANG_EXT_IND 1 DASHBOARD_CLOSE_BROWSER_TIME
2 DASHBOARD_LOAD_TIME
2 DASHBOARD_LANG_ENTRY_IND
2 DASHBOARD_1_START_TIME
2 DASHBOARD_LANG_EXT_IND
2 DASHBOARD_CLOSE_BROWSER_TIME
2) a serial number inside the dataset, to keep original order.
the code to do it:
data temp;
set have;
retain serial1 1;
if index(variable,'LOAD_TIME') > 0 then serial1 + 1;
serial2 = _N_;
run;
Step-2: sort by serial1 descending flag
Step-3: Retain the non missing flag (1st obs in the group) to assign when it is missing
Step-4: Sort by serial2, back to original order and drop useless variables.
Hello @lydiawawa
data have;
infile cards truncover;
input Name :$30. Flag ;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME 1
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME 2
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_3_START_TIME 3
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
;
data temp;
set have;
if name='DASHBOARD_LOAD_TIME' then grp+1;
n+1;
run;
proc sql;
create table want(drop=n) as
select name, grp,n, max(flag) as flag
from temp
group by grp
order by n;
quit;
The following code should return what you want IF your source data complies with the following requirements:
- A new group starts always with value "DASHBOARD_LOAD_TIME"
- Each group has a row with value "DASHBOARD_<digits>
_START_TIME
"
data have;
infile cards truncover;
input Name :$30.;
cards;
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_1_START_TIME
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_2_START_TIME
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_LOAD_TIME
DASHBOARD_LANG_ENTRY_IND
DASHBOARD_LANG_EXT_IND
DASHBOARD_CLOSE_BROWSER_TIME
DASHBOARD_3_START_TIME
;
run;
data want(drop=_:);
length id start_time_num 8;
retain start_time_num;
id=_n_;
set have;
if name='DASHBOARD_LOAD_TIME' then
do;
call missing(start_time_num);
do _point=_n_+1 to _nobs;
set have(keep=name rename=(name=_name)) point=_point nobs=_nobs;
if prxmatch('/DASHBOARD_\d+_START_TIME/oi',_name)>0 then
do;
start_time_num=input(scan(_name,2,'_'),?? best32.);
leave;
end;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.