Hi guys,
I'm trying to create the indicator variable N_CNT which indicates the number of distinct levels that seq_date had occurred. Hope image below helps. The last column framed in red is what I need. The code creates a mock data and how I failed.
I'll appreciate your time and correcting my code for me, if possible?
Thank you in advance.
DATA HAVE;
INPUT PAT_ID DATE_DIAGNOSIS SER_DATE_DX CODE WGT_DX DATE_DIFF;
CARDS;
1 17007 17006 197 3 1
1 17007 17006 153 1 1
1 17007 17006 197 3 1
1 17007 17013 197 3 6
1 17007 17013 153 1 6
1 17007 17028 197 3 21
1 17007 17028 153 1 21
1 17007 17028 197 3 21
1 17007 17031 153 1 24
1 17007 17032 153 1 25
1 17007 17045 153 1 38
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18219 235 3 3
2 18216 18219 235 3 3
2 18216 18219 153 1 3
2 18216 18220 153 1 4
2 18216 18223 153 1 7
2 18216 18224 153 1 8
2 18216 18225 153 1 9
2 18216 18226 153 1 10
2 18216 18241 154 1 25
2 18216 18242 154 1 26
;
PROC SORT DATA=HAVE;
BY PAT_ID SER_DATE_DX;
RUN;
DATA HAVE1; SET HAVE;
BY PAT_ID SER_DATE_DX;
IF FIRST.PAT_ID THEN SEQ_DATE=1; ELSE IF FIRST.SER_DATE_DX THEN SEQ_DATE+1;
N_CNT=LAST.SEQ_DATE;
RUN;
DATA HAVE2; SET HAVE1;
BY PAT_ID SER_DATE_DX;
IF LAST.PAT_ID AND LAST.SER_DATE_DX THEN N_CNT=LAST.SEQ_DATE;
RUN;
This gives you what you want. Changed a little in your code and replaced the last data step with an SQL Procedure.
data have;
input pat_id date_diagnosis ser_date_dx code wgt_dx date_diff;
cards;
1 17007 17006 197 3 1
1 17007 17006 153 1 1
1 17007 17006 197 3 1
1 17007 17013 197 3 6
1 17007 17013 153 1 6
1 17007 17028 197 3 21
1 17007 17028 153 1 21
1 17007 17028 197 3 21
1 17007 17031 153 1 24
1 17007 17032 153 1 25
1 17007 17045 153 1 38
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18219 235 3 3
2 18216 18219 235 3 3
2 18216 18219 153 1 3
2 18216 18220 153 1 4
2 18216 18223 153 1 7
2 18216 18224 153 1 8
2 18216 18225 153 1 9
2 18216 18226 153 1 10
2 18216 18241 154 1 25
2 18216 18242 154 1 26
;
proc sort data=have;
by pat_id ser_date_dx;
run;
data have1; set have;
by pat_id ser_date_dx;
if first.pat_id then seq_date=1;
else if first.ser_date_dx then seq_date+1;
run;
proc sql;
create table want as
select *,
count(distinct seq_date) as N_cnt
from have1
group by pat_id;
quit;
This gives you what you want. Changed a little in your code and replaced the last data step with an SQL Procedure.
data have;
input pat_id date_diagnosis ser_date_dx code wgt_dx date_diff;
cards;
1 17007 17006 197 3 1
1 17007 17006 153 1 1
1 17007 17006 197 3 1
1 17007 17013 197 3 6
1 17007 17013 153 1 6
1 17007 17028 197 3 21
1 17007 17028 153 1 21
1 17007 17028 197 3 21
1 17007 17031 153 1 24
1 17007 17032 153 1 25
1 17007 17045 153 1 38
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18219 235 3 3
2 18216 18219 235 3 3
2 18216 18219 153 1 3
2 18216 18220 153 1 4
2 18216 18223 153 1 7
2 18216 18224 153 1 8
2 18216 18225 153 1 9
2 18216 18226 153 1 10
2 18216 18241 154 1 25
2 18216 18242 154 1 26
;
proc sort data=have;
by pat_id ser_date_dx;
run;
data have1; set have;
by pat_id ser_date_dx;
if first.pat_id then seq_date=1;
else if first.ser_date_dx then seq_date+1;
run;
proc sql;
create table want as
select *,
count(distinct seq_date) as N_cnt
from have1
group by pat_id;
quit;
DATA HAVE;
INPUT PAT_ID DATE_DIAGNOSIS SER_DATE_DX CODE WGT_DX DATE_DIFF;
CARDS;
1 17007 17006 197 3 1
1 17007 17006 153 1 1
1 17007 17006 197 3 1
1 17007 17013 197 3 6
1 17007 17013 153 1 6
1 17007 17028 197 3 21
1 17007 17028 153 1 21
1 17007 17028 197 3 21
1 17007 17031 153 1 24
1 17007 17032 153 1 25
1 17007 17045 153 1 38
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18216 153 1 0
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18217 211 6 1
2 18216 18217 153 1 1
2 18216 18219 235 3 3
2 18216 18219 235 3 3
2 18216 18219 153 1 3
2 18216 18220 153 1 4
2 18216 18223 153 1 7
2 18216 18224 153 1 8
2 18216 18225 153 1 9
2 18216 18226 153 1 10
2 18216 18241 154 1 25
2 18216 18242 154 1 26
;
data want;
do until(last.pat_id);
set have;
by pat_id ser_date_dx;
if first.ser_date_dx then seq_date=sum(seq_date,1);
end;
ncount=seq_date;
seq_date=0;
do until(last.pat_id);
set have ;
by pat_id ser_date_dx;
if first.ser_date_dx then seq_date=sum(seq_date,1);
output;
end;
run;
Great solution but less intuitive with the level of my SAS proficiency. what is this approach? hashtag? sorry for being clueless
Basically uses a construct called DOW loop. DOW stands for DO Loop and Whitlock honoring the greatest SAS user Ian Whitlock ever to have used SAS. Ian showed the use of such loops for various scenarios. All you need to do is google "SAS DOW loop"
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.