BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

Hi jwillis - I made the changes (code below) and get the same error.  The value of some diagnosis fields is "E9949" and that makes SAS unhappy - I get "NOTE: Invalid numeric data, 'E9' , at line 989 column 8" a bunch of times.  It doesn't prevent the datasets from outputting.

proc format;

value $ dxcodes

'78060' = 1

'7784' = 2

'4659' = 3

'07999' = 4

'46611' = 5

'77989' = 6

'5990' = 7

'53081' = 8

'27651' = 9

'77189' = 10

'46619' = 11

'79982' = 12

'77931' = 13

'7746' = 14

'V290' = 15

'77182' = 16

'7755' = 17

'V070' = 18

'6910' = 19

'78791' = 20;

other = 99;

run;

data want1 want2 want3 want4 want5 want6 want7 want8 want9 want10 want11 want12 want13 want14 want15 want16 want17 want18 want19 want20;

set main;

array dx $ dx_code1-dx_code41;

flag=0;

do i=1 to dim(dx);

  if put(dx(i), $dxcodes.)= 1 then output want1;

  if put(dx(i), $dxcodes.)= 2 then output want2;

  if put(dx(i), $dxcodes.)= 3 then output want3;

  if put(dx(i), $dxcodes.)= 4 then output want4;

  if put(dx(i), $dxcodes.)= 5 then output want5;

  if put(dx(i), $dxcodes.)= 6 then output want6;

  if put(dx(i), $dxcodes.)= 7 then output want7;

  if put(dx(i), $dxcodes.)= 8 then output want8;

  if put(dx(i), $dxcodes.)= 9 then output want9;

  if put(dx(i), $dxcodes.)= 10 then output want10;

  if put(dx(i), $dxcodes.)= 11 then output want11;

  if put(dx(i), $dxcodes.)= 12 then output want12;

  if put(dx(i), $dxcodes.)= 13 then output want13;

  if put(dx(i), $dxcodes.)= 14 then output want14;

  if put(dx(i), $dxcodes.)= 15 then output want15;

  if put(dx(i), $dxcodes.)= 16 then output want16;

  if put(dx(i), $dxcodes.)= 17 then output want17;

  if put(dx(i), $dxcodes.)= 18 then output want18;

  if put(dx(i), $dxcodes.)= 19 then output want19;

  if put(dx(i), $dxcodes.)= 20 then output want20;

  end;

run;

jwillis
Quartz | Level 8

Smiley Happy  My programming instructor always told me "It's your data, not your code".  Smiley Happy

When I read DX codes from an EXCEL spreadsheet I would frequently receive errors where "E" DX codes followed by an number were interpreted by EXCEL as numbers in scientific notation. First, Please check your data source for any scientific notation. Second, Please make the format values as character numbers '1', '2','3','4'.   Lastly, Please change the if statements to place quotes around the numbers. Let me know what happens then.  Once the E codes are processing, we can address which rows go to which output file and how many rows you actually want written.  One of the threads suggest to me that you only want one row for a DX code written to an output table as opposed to my suggestion that writes every DX match row to and output table.

proc format;

value $ dxcodes

'78060' = '1'

'7784'  = '2'

'4659'  = '3'

'07999' = '4'

'46611' = '5'

'77989' = '6'

'5990' = '7'

'53081' = '8'

'27651' = '9'

'77189' = '10'

'46619' = '11'

'79982' = '12'

'77931' = '13'

'7746' = '14'

'V290' = '15'

'77182' = 16

'7755' = '17'

'V070' = '18'

'6910' = '19'

'78791' = '20'

other = '99';

run;

  if put(dx(i), $dxcodes.)= '1' then output want1;

  if put(dx(i), $dxcodes.)= '2' then output want2;

  if put(dx(i), $dxcodes.)= '3' then output want3;

  if put(dx(i), $dxcodes.)= '4' then output want4;

  if put(dx(i), $dxcodes.)= '5' then output want5;

  if put(dx(i), $dxcodes.)= '6' then output want6;

  if put(dx(i), $dxcodes.)= '7' then output want7;

  if put(dx(i), $dxcodes.)= '8' then output want8;

  if put(dx(i), $dxcodes.)= '9' then output want9;

  if put(dx(i), $dxcodes.)= '10' then output want10;

  if put(dx(i), $dxcodes.)= '11' then output want11;

  if put(dx(i), $dxcodes.)= '12' then output want12;

  if put(dx(i), $dxcodes.)= '13' then output want13;

  if put(dx(i), $dxcodes.)= '14' then output want14;

  if put(dx(i), $dxcodes.)= '15' then output want15;

  if put(dx(i), $dxcodes.)= '16' then output want16;

  if put(dx(i), $dxcodes.)= '17' then output want17;

  if put(dx(i), $dxcodes.)= '18' then output want18;

  if put(dx(i), $dxcodes.)= '19' then output want19;

  if put(dx(i), $dxcodes.)= '20' then output want20;

chuakp
Obsidian | Level 7

Hi, making that change got rid of the error messages.  To clarify, I want dataset "want1" to contain unique observations for which at least one of the 41 diagnosis codes was "78060."  [There really shouldn't be more than one diagnosis code that contains this code, but there appear to be some data quality issues, leading to a situation in which this is in fact the case for some observations]. 

jwillis
Quartz | Level 8

Hi chuakp;

I've attached some untested code based on my memory of something I desgned years ago.  The code is rudimentary and has many opportunities for typo errors on my part and improvement and enhancement on your part.  I would make this concept code work before you tried to finesse it out.  This concept code should allow you room to expand it if you need more than 20 output datasets or if you need to read more than 41 diagnosis fields.  I see a good opportunity to convert the code to a macro should the code work.

chuakp
Obsidian | Level 7

Thanks, this code does work.  I will definitely use a macro to make it more efficient. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, you could try the below, one thing I would check is what should happen if the code appears in more than one variable, say col1 and col2 both have 46611, should that row output twice or once?  The below outputs twice, however if you want it only once then change the if with the embedded output, to a select/when as KurtBremser has suggested.  The main perk with the below code is that it generates the code based on the data, so if you have lots of codes in a dataset, and lots of variables, the code doesn't need to change you can use that to generate the code (in this example I put the list in the do loop, however you could as easy have it in a dataset and put that I as a set).

data test;
  attrib code1 code2 code3 code4 format=$20.;
  code1='7784'; code4='46611'; output;
  code2='7784'; code4='46611'; output;
  code1='7784'; code2='5990'; output;
run;

proc sql;
  select  distinct NAME
  into    :VAR_LIST separated by '","'
  from    SASHELP.VCOLUMN
  where   LIBNAME="WORK"
      and MEMNAME="TEST";
quit;

data temp;
  do i= '78060','7784','4659','07999','46611','77989','5990';
    call execute('data code_'||strip(i)||';
                    set work.test;');
    do j="&VAR_LIST.";
      call execute('if '||strip(j)||'="'||strip(i)||'" then output;');
    end;
    call execute('run;');
  end;
run;

chuakp
Obsidian | Level 7

Hi RW9, diagnosis codes should not appear in more than one of the 41 diagnosis code fields.  Even if they did, I would only want the row to output once. Still, Kurt's code and my code are giving slightly different numbers.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Just change the section:

data temp;
  do i= '78060','7784','4659','07999','46611','77989','5990';
    call execute('data code_'||strip(i)||';
                    set work.test;');
    do j="&VAR_LIST.";
      call execute('if '||strip(j)||'="'||strip(i)||'" then output;');
    end;
    call execute('run;');
  end;
run;

To something like:

data temp;
  do i= '78060','7784','4659','07999','46611','77989','5990';
    call execute('data code_'||strip(i)||';
                    set work.test;

                         select('||strip(j)');
    do j="&VAR_LIST.";
      call execute('when '||strip(i) then mark=1;');
    end;
    call execute('end;where mark=1;run;');
  end;
run;

Kurt_Bremser
Super User

How about this:

data

  diags_78060

  .....

  diags_78791

;

set patients;

array a_dx dx_code1-dx_code41;

do i = 1 to dim(a_dx);

  select (a_dx(i));

    when ('78060') output diags_78060;

    .....

    when ('78791') output diags_78791;

    otherwise;

  end;

end;

run;

chuakp
Obsidian | Level 7

Hi Kurt, the code you provided above gives me fewer observations for each output dataset than the following code that I have been using: 

data diag1; set main;

array dx dx_code1-dx_code41;

flag=0;

do i=1 to dim(dx);

  if dx(i) in: ('78060') then flag=1;

end;

if flag>0 then output;

drop i flag;

run;

I ended up putting the above code into a macro and typing 20 macro lines, changing the output dataset and the diagnosis code so that I get SAS to search the 41 diagnosis code fields for the first diagnosis code and output it to data set 1, then search the 41 diagnosis code fields for the second diagnoss code and output it to data set 2.  It seems inefficient but I guess it will work.

Reeza
Super User

Are you familiar with BY processing?

You usually don't need to separate the diagnosis into separate files, unless its for final output.

Your code uses the in: (in with colon) where 78060 will also match 780605 or 78060.3 for example, while Kurt's will only match exactly 78060.

chuakp
Obsidian | Level 7

Thanks Reeza - since I'm only searching for one code, I changed to an equals sign instead of (in with colon).  Even after these change, the number of observations do not change in my output datasets.  Since diagnosis codes are five digits maximum (e.g., there is no 78060.5 value), I would not expecting that making this change would make a difference for five-digit codes like 78060.  In theory, a four-digit code like 7784 could be problematic if it matched both 77841 and 07784 (although empirically when I change to an equals sign, the number of observations don't change either). 

The reason I am wanting 20 output datasets is that I then am going to run proc freq's on each of them individually.

chuakp
Obsidian | Level 7

Kurt, you and Reeza make a good point.  But that does not explain the discrepancy between the number of observations between the code I am now using (30933 observations) and the code you suggested (31082 observations). 

data diag1; set main;

array dx dx_code1-dx_code41;

flag=0;

do i=1 to dim(dx);

  if dx(i) = '78060' then flag=1;

end;

if flag>0 then output;

drop i flag;

run;

Tom
Super User Tom
Super User

So if for some reason the same DX code appeared multiple times in the array then the code with the SELECT will output the same observation multiple times.

But the code that sets a single flag variable will only output an observation once.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 31 replies
  • 1900 views
  • 3 likes
  • 9 in conversation