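data inter (keep=id diag);
   set have;
   array diags {*} dx1-dx10;
   /* flag the visit if any diagnosis column holds one of the target codes */
   diag = 0;
   do i = 1 to dim(diags);
      if diags{i} in (72632, 72631, 71922, 71923) then diag = 1;
   end;
run;

proc sort data=inter;
   by id;
run;

data result (keep=id hasdiag visitcount) totalcount (keep=total_patients);
   set inter end=done;
   by id; /* required: without it, first.id and last.id are never set */
   retain total_patients 0 visitcount hasdiag;
   /* reset the per-patient counters at the start of each id group */
   if first.id then do;
      visitcount = 0;
      hasdiag = 0;
   end;
   /* count the visits that carry one of the target diagnoses */
   if diag then do;
      visitcount + 1;
      hasdiag = 1;
   end;
   /* one output row per patient; add the patient to the total if flagged */
   if last.id then do;
      output result;
      total_patients + hasdiag;
   end;
   /* after the last observation, write the single-row total */
   if done then output totalcount;
run;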
If your initial dataset is already sorted by id, you can omit the sort step.
Edited to fix some typos.
I tried to use your code, but received the following error:
244 data le.leDxTest2 (keep=id hasdiag visitcount diag YEAR eupide DISCHNO ADMDT DISDT ADMWK
244! DISCHWK ADMHR DISHR AGE1 AGE DOB SEX ETHNIC Insurance RACE_M RACE DISP REIMB HOSPCO
244! FACNAME PATST PATCO PATCITY RESIND
245 SDDISCH HSA REG ZIP OPER OPTIME1 OPTIME TOTCHG TOTANCHG TOTCHG1 TOTANCHG1
245! SEQ CNT DX01 DX02 DX03 DX04 DX05 DX06 DX07 DX08 DX09 DX10 HCPCCD1 HCPCCD2 HCPCCD3
246 HCPCCD4 HCPCCD5 HCPCCD6 HCPCCD7 HCPCM1 HCPCM2 HCPCM3 HCPCM4 HCPCM5 HCPCM6
246! HCPCM7 HCPC2M1 HCPC2M2 HCPC2M3 HCPC2M4 HCPC2M5 HCPC2M6 HCPC2M7 DISCHNO1 HCPCCD8
247 HCPCCD9 HCPCCD10 HCPCCD11 HCPCCD12 HCPCCD13 HCPCCD14 HCPCCD15 HCPCCD16
247! HCPCCD17 HCPCCD18 HCPCM8 HCPCM9 HCPCM10 HCPCM11 HCPCM12 HCPCM13 HCPCM14 HCPCM15 HCPCM16
248 HCPCM17 HCPCM18 HCPC2M8 HCPC2M9 HCPC2M10 HCPC2M11 HCPC2M12 HCPC2M13
248! HCPC2M14 HCPC2M15 HCPC2M16 HCPC2M17 HCPC2M18 Side Discharge Died CPI visit) totalcount
248! (keep=total_patients);
249 set le.leDxTest1 end=done;
250 retain total_patients 0 visitcount hasdiag;
251 if first.eupide
252 then do;
253 visitcount = 0;
254 hasdiag = 0;
255 end;
256 if diag then do;
257 visitcount + 1;
258 hasdiag = 1;
259 end;
260 if last.eupide then do;
261 output result;
------
455
ERROR 455-185: Data set was not specified on the DATA statement.
262 total_patients + hasdiag;
263 end;
264 if done then output totalcount;
265 run;
WARNING: The variable id in the DROP, KEEP, or RENAME list has never been referenced.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set LE.LEDXTEST2 may be incomplete. When this step was stopped there were 0
observations and 111 variables.
WARNING: Data set LE.LEDXTEST2 was not replaced because this step was stopped.
WARNING: The data set WORK.TOTALCOUNT may be incomplete. When this step was stopped there
were 0 observations and 1 variables.
WARNING: Data set WORK.TOTALCOUNT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.01 seconds
Do you know what is causing this?
Thank you!
On log line 244, your code is:
data le.leDxTest2 (keep=id ...
The ERROR message is very clear:
261 output result; <<<<< should be output le.leDxTest2 <<<<<<<
------
455
ERROR 455-185: Data set was not specified on the DATA statement
After log line 742, you got these notes:
NOTE: Variable first.eupide is uninitialized.
NOTE: Variable last.eupide is uninitialized.
In order to check first/last occurrence, you need a BY statement.
Your code should be:
data le.LeDxKB1 (keep = ....);
   set le.leDxKB end=done;
   by eupide;
   .... your code ...
run;
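To see what the BY statement provides, here is a minimal sketch on made-up data (the dataset names visits and flags are hypothetical). It copies the automatic FIRST./LAST. flags into regular variables, because the automatic ones are not written to the output dataset:

```sas
/* Hypothetical sample data: eupide is the patient id, diag the visit flag */
data visits;
   input eupide diag;
   datalines;
1 0
1 1
2 1
2 1
3 0
;
run;

/* The input must already be sorted (or grouped) by eupide */
data flags;
   set visits;
   by eupide;                 /* creates first.eupide and last.eupide */
   is_first = first.eupide;   /* 1 on the first row of each eupide group */
   is_last  = last.eupide;    /* 1 on the last row of each eupide group */
run;

proc print data=flags;
run;
```

Patient 3 has a single visit, so both flags should be 1 on that row. If you delete the BY line, the log should show the same kind of "uninitialized" notes quoted above.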
That fixed the problem. Thank you @Shmuel!
Also keep in mind that RETAIN has no effect on variables that come from the input dataset(s): they are overwritten in the PDV every time an observation is read. So you might want to streamline your code there.
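A small sketch of that point, using hypothetical datasets demo_in and demo_out: x is read from the input, so the RETAIN on it does nothing, while running is not in the input and keeps its value from row to row.

```sas
/* Hypothetical input: x comes from the input dataset */
data demo_in;
   input x;
   datalines;
1
2
;
run;

data demo_out;
   set demo_in;
   retain x;        /* redundant: SET replaces x every time it reads a row */
   x = x * 10;      /* this change is overwritten when the next row is read */
   running + x;     /* not in the input: the sum statement retains it, starting at 0 */
run;
```

demo_out should hold x = 10 and 20 with running = 10 and 30: the multiplied x never carries over to the next row.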
You also have a lot of variables that appear in all of the long lists. Putting them into a macro variable would also make the code more readable.
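A minimal sketch of the macro-variable idea, using the SASHELP.CLASS sample table as stand-in data (the name keepvars is just an example). In your code, the repeated DX and HCPC lists would go into the %LET instead:

```sas
/* Define the list once; &keepvars is replaced by its text wherever it is used */
%let keepvars = Name Sex Age;

data work.class_subset (keep=&keepvars);
   set sashelp.class;
run;

proc print data=work.class_subset;
   var &keepvars;
run;
```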
Never underestimate the benefits of proper code formatting. I've seen lots of semantic errors in code posted here that would have stood out like a beacon if proper indentation had made the blocks easy to identify visually.
Thank you @Kurt_Bremser! I will try to rework my code to improve it; I'm still learning at this point. Do you have any good sources on how to create macros and how to line everything up so typos are easy to spot?
Hi David, I highly recommend Michele Burlew's book SAS Macro Programming Made Easy, as well as Art Carpenter's Complete Guide to the SAS Macro Language (I'd start with Burlew's; Carpenter's is a massive book that can be very intimidating). As for formatting, if you're using SAS Studio or SAS University Edition, there's a formatting button that will do it for you. If you're using EG or Base SAS, my general rules of thumb are (mostly for SQL, but they apply to other PROCs as well):
1) When listing variables, no more than 5 per line;
2) If I'm doing conditional logic or sums/averages, I put those on separate lines;
3) A new line after every semicolon;
4) Lots and lots of comments (what I'm doing, options I've used, changes I've made, etc.).
So for me, this is an example of "good" code:
PROC SQL;
   select var_A, var_B, var_C, var_D,
          max(var_E) as E,
          sum(var_F) as F
      from work.test
      group by var_A, var_B, var_C, var_D
      order by var_A, var_B, var_C, var_D;
quit;
Sure, it'll work if you put everything on one line, but maintenance, finding variables, and (as you've seen) debugging become a nightmare. I highly recommend going to www.lexjansen.com (a repository of papers from the SAS regional conferences) and searching for papers on the topics you're interested in; you'll quickly see different formatting rules in use, but they all share these basic concepts. Good luck,
Chris
Hi @Kurt_Bremser.
I am trying to learn by following your code. Could you tell me about total_patients? There was no mention of the variable earlier in the code... Does total create a command implicitly, just like sum?
Thanks!
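total_patients is named in the RETAIN statement, which in the same statement initializes it to 0 at the start of the data step; the sum statement total_patients + hasdiag; then adds each patient's flag to it. The name itself has no special meaning; everything is coded explicitly.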
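A minimal sketch with made-up data, showing that the name is arbitrary (patients and any_name_at_all are hypothetical names):

```sas
/* Hypothetical data: one 0/1 flag per patient */
data patients;
   input hasdiag;
   datalines;
1
0
1
;
run;

data _null_;
   set patients end=done;
   retain any_name_at_all 0;   /* the name is arbitrary; RETAIN gives it the initial value 0 */
   any_name_at_all + hasdiag;  /* sum statement: add this row's flag to the running total */
   if done then put 'Total patients with the diagnosis: ' any_name_at_all;
run;
```

This should write "Total patients with the diagnosis: 2" to the log. Strictly speaking, a sum statement already retains its variable and starts it at 0, so the explicit RETAIN mainly makes that behavior visible.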