I have two datasets
1. Contains records for all the surgeries/procedures I want to identify (variable of interest CODE, there are 40+ observations/records = 40+ different procedure codes)
2. Adminstrative records, where for each record/observation there are up to 50 procedure variables (eg PROCEDURE1-PROCEDURE50).
Objective - For each record in dataset 2 (admin records), I want to check all the non-missing values for the procedure variables with the procedure codes from dataset 1 to see if the admin record contains a procedure of interest.
Example - DATATSET 1
CODE | CODE_NODASH |
---|---|
14215-00 | 1421500 |
14215-01 | 1421501 |
30511-00 | 3051100 |
30511-01 | 3051101 |
30511-02 | 3051102 |
30511-03 | 3051103 |
30511-04 | 3051104 |
30511-05 | 3051105 |
30511-06 | 3051106 |
30511-07 | 3051107 |
30511-08 | 3051108 |
Example - DATASET 2
(I have made up this dataset - it contains 13 records with only 10 procedure codes. There should be 3 records in this dataset that contain at least one of the procedure codes listed in the example dataset above, IDs 141781, 143255, 259028)
SAS Output
ID | DT_PROCEDURE | PROCEDURE1 | PROCEDURE2 | PROCEDURE3 | PROCEDURE4 | PROCEDURE5 | PROCEDURE6 | PROCEDURE7 | PROCEDURE8 | PROCEDURE9 | PROCEDURE10 |
---|---|---|---|---|---|---|---|---|---|---|---|
127280 | 28DEC2001 | 95550-03 | |||||||||
127282 | 12NOV2002 | 48239-00 | 47726-00 | 92514-99 | 95550-03 | ||||||
127284 | 26DEC2006 | 90465-01 | |||||||||
127284 | 25NOV2011 | 90465-05 | |||||||||
127286 | 21DEC2016 | 35608-00 | 92514-29 | ||||||||
127288 | 11JAN2007 | 90468-01 | 90472-00 | 92507-99 | |||||||
127302 | 23AUG2012 | 92508-99 | 95550-01 | ||||||||
127303 | 30JUL2002 | 97324-00 | 92514-99 | ||||||||
127303 | 28AUG2008 | 35703-00 | 35630-00 | 92514-19 | |||||||
127303 | 19FEB2010 | 90465-05 | 92507-99 | ||||||||
141781 | 18MAY2015 | 14215-01 | 92515-99 | ||||||||
143255 | 05DEC2013 | 30511-06 | 92514-49 | 95550-03 | |||||||
259028 | 01NOV2007 | 14215-00 | 92515-29 |
I hope this makes sense! I realise i could use an array for procedures1-procedures10 and then cut-n-paste the desired procedures code into a if-in statement , such as
array codes [*] procedure_code: ;
do i=1 to dim(codes); if codes[i] in ('14215-00' '14215-01' '30511-00' '30511-01' '30511-02' '30511-03' '30511-04' '30511-05' '30511-06' '30511-07' '30511-08' ) then surgery=1; end;
But sometimes the codes are of interest will change. Currently, I can pull the codes from an existing dataset.
I was wondering if there is anyway this can be coded so that I don't have to manually list the codes of interest.
thanks
(and apologies for the centre-justified text. I don't know how to correct it!)
First, post test data in the form of a datastep, not here to type that in.
Second, it is never really, from a programming point of view, a good idea to have transposed data. For instance, if the number of procedures changes, then you need to re-do your array and loop, or write code to try to work out how many. Normalised is a far better structure as the structure does not change with more variables, but observation counts increase. So:
ID | DT_PROCEDURE | PROCEDURE1 |
---|
Could become:
ID DT_PROCEDURE PROCEDURE_NO PROCEDURE
In this way you can still transpose if you need to for an output. You can minimise the data stored, as you don't need the obs*variables total, only the variables*actual data (i.e. row one of your test data, you have 49 blank data items which are totally irrelevant and only there due to the choice in structure), can simplify your issue here because you could then merge data together or use loopkups, and you will simplify your code a lot. There really are no benefits to the Excel (transposed) way of thinking other than as a report for someone to look at - which you can do at the end when all processing is done.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.