BookmarkSubscribeRSS Feed
wheddingsjr
Pyrite | Level 9

Hi all. As a new employee I inherited the following code, but for some reason it runs without errors but the only exclusion that works is the first one (Excluding age <18). The rest do not make the exclusions but the code still runs. Any suggestions would be greatly appreciated.

 

/** Excluding age <18 */

data data_1a;

set inpt_hosp_1;

if age <18 then delete;

run;

 

/**Exclude index admissions for patients who died during the admission (Patient

Discharge Status Code indicating �Expired� (20)):*/

 

data data_1b;

set data_1a;

if disch_stat_cd='20' then delete;

run;

 

/**Exclude index admissions for patients discharged Against Medical Advice (AMA)

(Patient Discharge Status Code indicating �Left Against Medical Advice� (07)):**/

 

 

data data_1c;

set data_1b;

if disch_stat_cd='7' then delete;

run;

 

/**Exclude index admissions for patients transferred to another acute care hospital during

the index admission (Patient Discharge Status Code indicating transfer to an acute care

facility (02, 05, 09, 30, 43, 66, 69)):**/

data data_1d;

set data_1c;

if disch_stat_cd in ('02', '05', '09', '30', '43', '66', '69') then delete;

run;

 

 

/** Exclude index admissions for patients discharged with a planned readmission (Patient

Discharge Status Code indicating discharge with a planned readmission (81-95)):**/

 

data data_1e;

set data_1d;

if disch_stat_cd in ('81','82','83','84','85','86','87','88','89','90','91','92','93','94','95') then delete;

run;
18 REPLIES 18
Patrick
Opal | Level 21

Check if the codes the logic excludes exist in the data at all.

proc freq data=inpt_hosp_1(where=(age <18));
  table disch_stat_cd;
run;

Another reason for the code not doing what you expect could be that variable disch_stat_cd is numeric and not character. If that's the case then you will find Notes in the SAS Log as below.

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

You can also execute below to get the list of variables and their types.

proc contents data=inpt_hosp_1;
run;quit;

And if variable disch_stat_cd in numeric then you need to remove the quotes around the numbers in your code for a numerical comparison.

if disch_stat_cd=20 then delete;
wheddingsjr
Pyrite | Level 9

Thanks Patrick, I definitely know that the codes exist in the data because I wrote this code so that I can see that the data exists.

proc sql;
create table Delete20 as
select distinct *
from data_1a
where disch_stat_cd= '20'
;quit;

I did not get the note:

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

I also executed the code you suggested to see if the variable was not numeric and it isn't.

Patrick
Opal | Level 21

If you execute below code exactly "as is" then the count in cnt_disch_stat_cd will contain the number of rows you will have less in data1_b as compared to data_1a.

Are you telling me that's not the case? If so then please prove it by posting the SAS execution log.

%let cnt_disch_stat_cd=0;
proc sql;
  select count(*) into :cnt_disch_stat_cd trimmed
  from data_1a
  where disch_stat_cd= '20'
  ;
quit;
%put &=cnt_disch_stat_cd;

data data_1b;
  set data_1a;
  if disch_stat_cd='20' then delete;
run;

Whitespace characters like leading blanks or control characters could lead to the comparison never being true - but that would apply for both the SQL Where clause and the data step IF condition.

If variable cnt_disch_stat_cd should only contain digits then something like below would remove any other non-digit character.

cnt_disch_stat_cd=compress(cnt_disch_stat_cd,,'kd');
wheddingsjr
Pyrite | Level 9

Patrick, let me tell you what I am looking for as an end result. I am trying to get the "Den" number based on the member_id. So for 1A through 1Fif those codes exist they should be deleted so that they don't end up in the "DEN" count, but as you can see from my code, I check to see if those codes in every step are exists. Then I run a count of each of those steps. As you can see from the log, the lines are reducing, which means those codes are being deleted. Therefore, when I run the 

 

data den_final;
set data_1f;
by mrn;
gap = dif(admit_dt); 
if first.mrn then gap =.;
IF ADMIT_DT NE .  AND gap LE 0 THEN do;
Logic_dt = FIRST_SRV_dt;
ID = cats(mrn, logic_dt); /* as encounter */

end ;
run;
proc sql;
create table den as select
count(distinct member_id) as pt_inpt
, count(distinct int_clm_num) as clm_inpt
from den_final
;quit;

 I should get a certain number because there ARE deletions. But if I replace 1F in the above I get the same number when I use 1B, or 1C, etc., which lets me know that for whatever reasons, those deletions are not being deducted from the total. I hope that makes sense. I attached the log because its rather long.

qatman28
Obsidian | Level 7
It looks like your final data set is meant to be one record per patient. When you are subsetting your records, is "disch_stat_cd" the same for all related records for a person? If not, then you are only removing some of the records, so for example if someone has disch_stat_cd=20 for the first record and =missing for a different record, then the subsequent record would still be kept and the patient would still end up in the last file.
wheddingsjr
Pyrite | Level 9

Thanks for the response Qatman28, when I run this code:

proc sql;
create table Delete20 as 
select distinct *
from data_1a
where disch_stat_cd='20'
;quit;
proc sql;
create table Delete20_CNT as select
count(distinct member_id) as pt_inpt
, count(distinct int_clm_num) as clm_inpt
from Delete20
;quit;
data data_1b;
set data_1a;
if ' then delete;
run;

The first thing I want to do is get a listing of all the disch_stat_cd='20 lines. I have that list and there are 20 lines, all the same member id. I then do a count of the member id, which should be 1since all the lines are the same member id. So I know there should be at least 1 deletion in the next dataset when it runs. But for some reason, that number/line is still included in my final number. 

qatman28
Obsidian | Level 7

Maybe try

if compress(disch_stat_cd) = '20';

etc. If the original has any spaces or other strange characters, then the comparisons might not work properly. Very often I find that e.g. ='7' does not work since the actual value is '7 ' with a space.

 

I rarely use PROC SQL so I won't comment on that part. But I would troubleshoot first by checking the data_1f set:

 

data data_1f check;
set inpt_hosp_1;
if age < 18 then flag = 1;
else select (disch_stat_cd);
   when (‘20’) flag = 2;
   when (‘7’) flag = 3;
   when (‘02’,’05’,’09’,’30’,’43’,’66’,’69’) flag = 4;
   when ('81','82','83','84','85','86','87','88','89','90','91','92','93','94','95') flag = 5;
   otherwise flag = 0;
end;
output check;
if flag = 0 then output data_1f;
run;

The number of records should be the same as what you are getting in data_1f.  If not then there is an issue likely with how you are setting the flags.  If it is the same, then there is something going on with your summary data step.  

wheddingsjr
Pyrite | Level 9

Thanks again Qatman28, the numbers are not the same. The 1F number should be what the the check number is which includes the deletions. Now I just have to figure out why those lines are not deleting even though it says it is in the log.

wheddingsjr
Pyrite | Level 9
I misspoke, the 1F data is the correct amount. The check number is the total amount before the deletions happen. So if 1F is correct, meaning it deleted the lines, why does the final tally come out as if they weren't deleted? So its the issue is the way in which the flags were set up
wheddingsjr
Pyrite | Level 9

By the way, all the flags in the check run are 0

qatman28
Obsidian | Level 7
Yes all of the people in data_1f will have flag=0. If the people in check all have flag=0 then there is something else going on - either the variable names or formats are off, or these records were already removed from the data so there is nothing left to exclude (and therefore the number of records would be the same at each step). I would also check to make sure you haven't overwritten your original data set while testing.

If this is still not working then it would be useful to see a PROC CONTENTS, PROC FREQ of age and disch_stat_cd, and PROC PRINT of a small set of records (variables of interest here only) to parse what is going on.
wheddingsjr
Pyrite | Level 9

Thanks Qatman. I was wrong again, all of the 1A (<18 step) had the correct flag (1). That dataset works, its 1B through 1F that seems to have the issues. And you are correct, after 1A the number of records stay the same. Very weird. I will try your recommendations to see what I get. You have been a great deal of help and I truly appreciate all your efforts.

Patrick
Opal | Level 21

As far as I understand you've got multiple observations per member in your source data. Your selection logic then removes all rows where you find a certain code.

If this code for a member is not always the same then you won't remove all member records this way.

 

Have a look at below. Only the data step with the "alternate logic" will remove all member records for members with a code in you exclusion list. 

Which logic do you need?

/* sample data */
data data_1a;
  do member_id=1,2;
    disch_stat_cd='20';
    int_clm_num+1;
    output;
  end;

  member_id=2;
  disch_stat_cd='99';
  int_clm_num+1;
  output;
  stop;
run;

/* current logic only deleting all rows with disch_stat_cd='20' */
proc sql;
  create table Delete20 as 
  select distinct *
  from data_1a
  where disch_stat_cd='20'
  ;
quit;

proc sql;
  create table Delete20_CNT as select
    count(distinct member_id) as pt_inpt
    , count(distinct int_clm_num) as clm_inpt
  from Delete20
  ;
quit;

data data_1b;
  set data_1a;
  if disch_stat_cd='20' then delete;
run;

/* alternate logic deleting all rows belonging to a member
   if at least one row for this member has disch_stat_cd='20' */
data data_1b;
  if _n_=1 then
    do;
      dcl hash h1(dataset:"data_1a(where=(disch_stat_cd='20'))");
      h1.defineKey('member_id');
      h1.defineDone();
    end;
  set data_1a;
  if h1.check() = 0 then delete;
run;
wheddingsjr
Pyrite | Level 9

Hi Patrick

 

Thanks for responding. In the current logic, the first 2 steps of each exclusion are informational only and I don't expect to lose any records. That's just so that I can see the members in that particular exclusion and to get a count of the distinctive members so that I know what to expect to be removed. The third step is what's supposed to do the exclusion and it does. But when I run the summary, its as if its ignoring the exclusions that have already happened. Example, the Inpt_Hosp_1 table has 4,470 records. After step 1A where it excludes those < 18 I have 4,055 records. I can see that all the exclusions are working because the last table, 1F only has 4,002 records, just the ways its supposed to after the exclusions. However, when I run my summary based on the 4,002 records in table 1F, the summary seems to be using the records from table 1A with the 4,055 records instead of the 1F with 4,002.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 1776 views
  • 0 likes
  • 3 in conversation