Hi Friends, I want to subset my dataset when the type variable starts from Xin and ends with Xin or blank value. How should I write the code? Thanks.
data test;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 1 other
2 2 xin
2 3 .
2 4 other
3 1 .
3 2 xin
3 3 .
3 4 .
;
run;
data want;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 2 xin
2 3 .
3 2 xin
3 3 .
3 4 .
;
run;
Not certain I understand your request fully. You seem to indicate you want all observations between and including the obs where type="xin", but then for ID=2 you don't pick up observation 2 4 other, yet there is an opening xin (2 2 xin) and no closing xin.
data have ;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 1 other
2 2 xin
2 3 .
2 4 other
3 1 .
3 2 xin
3 3 .
3 4 .
;
run;
data want ;
drop flag ;
retain flag 0 ;
set have ;
by id ;
/* reset flag if this is the first occurrence of ID */
if first.id then
flag=0 ;
/* if type="xin" flip flag (0|1) and output */
if type="xin" then do ;
flag=mod(flag+1,2) ;
output ;
end ;
/* if type ne "xin" and flag=1 and output */
else if flag=1 then
output ;
run ;
@superbibi, I'm not sure that I fully understand your requirement "starts from Xin and ends with Xin".
I have assumed you only want a type of "xin" (lower case) or missing.
data test;
input id visit type : $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 1 other
2 2 xin
2 3 .
2 4 other
3 1 .
3 2 xin
3 3 .
3 4 .
;
run;
data want;
set test;
where missing(type) or type eq 'xin';
run;
Kind regards,
Amir.
Not sure if I understood your requirement. But, for what it is worth -
data want;
array k[10] $ k1-k10;
do _n_=1 by 1 until(last.id);
set test;
by id;
k[_n_]=type;
end;
st=whichc('xin',of k1-k10);
do _n_=1 to _n_ ;
set test;
if visit >= st then do;
if type in ('xin','') then output;
end;
end;
drop k: st;
run;
Thank you very much for the help. It works well in this sample dataset, I will see if it goes the same in the full dataset.
Thank you again for the solution. I just realized there is another situation for id2, the "type" variable might have "other xin blank other blank " sequence and I do not want any records after the second "other" as shown in the new code. It means for id 2, visit before the first "xin" and all visits equal to or larger than 4 are supposed to be dropped. What is your suggestion? Thanks.
data test;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 1 other
2 2 xin
2 3 .
2 4 other
2 5 .
3 1 .
3 2 xin
3 3 .
3 4 .
;
run;
data want;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 2 xin
2 3 .
3 2 xin
3 3 .
3 4 .
;
run;
"
data test;
input id visit type $ ;
datalines;
1 1 xin
1 2 .
1 3 xin
2 1 other
2 2 xin
2 3 .
2 4 other
3 1 .
3 2 xin
3 3 .
3 4 .
;
run;
data temp;
set test;
by id;
n+1;
if type='xin' or missing(type);
run;
data temp1;
set temp;
by id;
if type='xin' or (last.id and missing(type));
run;
data temp2;
set temp1;
by id;
if first.id then seq=0;
seq+1;
if mod(seq,2)=1 then group+1;
run;
data temp3;
merge temp2 temp2(keep=group n rename=(group=_group n=_n) firstobs=2);
output;
if group=_group then do;
do i=n+1 to _n-1;
n=i;output;
end;
end;
keep n;
run;
data want;
if _n_=1 then do;
if 0 then set temp3;
declare hash h(dataset:'temp3');
h.definekey('n');
h.definedone();
end;
set test;
if h.check(key:_n_)=0;
drop n;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.