Searching Large Data Set for Specific Codes - Simplify Coding

Reply
Learner
Posts: 1

Searching Large Data Set for Specific Codes - Simplify Coding

I would like to search a very large SAS dataset (hospital discharge) for specific ICD-10 codes. I have the list of ICD-10 codes I would like to search for in excel. Is there a simple way to go about doing this? I previously used the following code:

 

data SA2009 out2;
set ED2009;
if DX1 in ('2910', '2911', '2912', '2913', '2914', '2915', '29181', '29182', '29189')

then output SA2009;
else output out2;
Run;

 

The problem is that I had hundreds of codes included in the parentheses. I pasted these in, but then had to had the single quotations and adjust spacing, which took a significant amount of time. 

 

I would like to be able to reference the excel doc with all the codes or write code that would not require I manually type each diagnosis code in. 

 

Any suggestions are appreciated!

Attachment
SAS Super FREQ
Posts: 9,184

Re: Searching Large Data Set for Specific Codes - Simplify Coding

Posted in reply to erinmurray

Hi:

  Consider this example...SASHELP.CLASS has 19 rows, where the ages vary from 11 to 16. If I have this file (created here with DATALINES but could have been created from importing an EXCEL list):

data need_ages;
  infile datalines;
  input age;
return;
datalines;
13
15
16
;
run;
proc sort data=sashelp.class out=class;
  by age;
run;

proc sort data=need_ages;
  by age;
run;

Now the file need_ages has the 3 values for AGE that I want to extract from SASHELP.CLASS. My copy of SASHELP.CLASS is not sorted by AGE, so I needed to sort it first.

 

Then, I do a merge to essentially "split" the file into 2 separate files -- one with the observations that have the age I want and the other that are the rows for other ages that I don't want.

 

Here's the MERGE code:


data got_ages not_these;
  merge need_ages(in=need) class(in=allrec);
  by age;
  if need=1 and allrec=1 then output got_ages;
  else if allrec = 1 and need=0 then output not_these;
run;

proc print data=got_ages;
  title '1) Only wanted ages 13, 15 and 16';
run;

proc print data=not_these;
  title '2) Did NOT want these rows because they are not in the ages list';
run;

And here are the results:

concept_of_merge.png

 

You could also do this with an SQL join, which might be easier. but you only get 1 file for each SQL step. The SQL has the advantage of not needing to sort the files. This example also uses WORK.NEED_AGES:

 


proc sql;
  create table work.sq_join_age as
  select b.name, b.age, b.sex, b.height, b.weight
  from need_ages a, sashelp.class b
  where a.age = b.age
  order by b.age;
quit;

proc print data=sq_join_age;
  title '3) Using an SQL join';
run;

But the downside is that if you need a dataset with the nonmatches, you have to run a second SQL step.

 

Hope this helps.

 

Cynthia

Super Contributor
Posts: 317

Re: Searching Large Data Set for Specific Codes - Simplify Coding

Posted in reply to Cynthia_sas

This is not a reply to the question. Seek to know how to render the PROC PRINT output in the form Cynthia_sas has done. I used to cut and paste the Proc print's output here, which happens to be mere text.

 

 

SAS Super FREQ
Posts: 9,184

Re: Searching Large Data Set for Specific Codes - Simplify Coding

[ Edited ]

Hi:
I used the Photos button under Reply (note, if you choose "Quick Reply" you do not have the opportunity to post either code or pictures).

To make the screen shots, sometimes I use FullShot, sometimes GreenShot and sometimes the Windows Snipping tool and sometimes just Prnt Scr. To add arrows, comments, etc to images, I alternate between using FullShot and pasting the image in PowerPoint and adding arrows and text with PPT (which requires usually taking another screen shot of the final PPT file).

Cynthia

Super User
Posts: 6,355

Re: Searching Large Data Set for Specific Codes - Simplify Coding

Posted in reply to erinmurray

Do you really need the data set OUT2?  If not, you can do this in one step:

 

proc sql;

create table SA2009 as

select * from ED2009

where dx in (select values from excel_list);

quit;

 

You have to modify the interior SELECT clause, using the real name of the data set and variable holding the list of codes from Excel.

 

Whether this is helpful or not, the techniques that @Cynthia_sas posted are required learning if you want to program in SAS.

PROC Star
Posts: 217

Re: Searching Large Data Set for Specific Codes - Simplify Coding

Posted in reply to erinmurray

As long as your total IN clause does not go above 64K, and assuming you can read your Excel file into a SAS data set, something like this should do:

proc sql noprint;
  select quote(trim(code),"'") into :codes separated by ',' from <your Excel data>;
quit;

data SA2009 out2;
  set ED2009;
  if DX1 in(&codes) then....

I put the "'" in the quote function in case some of your codes contain "&" or "%", which would be interpreted as macro tokens if the codes were in double quotes. 

Ask a Question
Discussion stats
  • 5 replies
  • 120 views
  • 3 likes
  • 5 in conversation