05-09-2018 04:58 AM
I have two datasets
1. Contains records for all the surgeries/procedures I want to identify (variable of interest CODE; there are 40+ observations/records = 40+ different procedure codes).
2. Administrative records, where each record/observation has up to 50 procedure variables (e.g. PROCEDURE1-PROCEDURE50).
Objective - For each record in dataset 2 (admin records), I want to check all the non-missing procedure variables against the procedure codes from dataset 1, to see whether the admin record contains a procedure of interest.
Example - DATASET 1
Example - DATASET 2
(I have made up this dataset - it contains 13 records with only 10 procedure variables. There should be 3 records in this dataset that contain at least one of the procedure codes listed in the example dataset above: IDs 141781, 143255, 259028.)
I hope this makes sense! I realise I could use an array for procedure1-procedure10 and then cut-and-paste the desired procedure codes into an IF-IN statement, such as:

array codes [*] procedure_code: ;
do i = 1 to dim(codes);
    if codes[i] in ('14215-00' '14215-01' '30511-00' '30511-01' '30511-02'
                    '30511-03' '30511-04' '30511-05' '30511-06' '30511-07'
                    '30511-08') then surgery = 1;
end;
But sometimes the codes of interest will change. Currently, I can pull the codes from an existing dataset.
I was wondering if there is any way this can be coded so that I don't have to manually list the codes of interest.
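For reference, one common way to avoid hard-coding the list is to pull the codes into a macro variable with PROC SQL and use it in the IN operator. This is only a sketch: the dataset names CODES and ADMIN, the variable CODE, and the ID variable are placeholders standing in for the real datasets described above.

```sas
/* Sketch: build the IN-list from the codes dataset instead of typing it.
   CODES (variable CODE) and ADMIN (PROCEDURE1-PROCEDURE50) are assumed names. */

proc sql noprint;
    select quote(strip(code))          /* wrap each code in quotes */
        into :codelist separated by ' '
        from codes;
quit;

data admin_flagged;
    set admin;
    array procs {*} procedure1-procedure50;
    surgery = 0;
    do i = 1 to dim(procs);
        /* &codelist resolves to e.g. "14215-00" "14215-01" ... */
        if procs[i] in (&codelist.) then surgery = 1;
    end;
    drop i;
run;
```

If the codes dataset changes, the macro variable is rebuilt automatically on the next run, so nothing needs to be pasted by hand.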
(and apologies for the centre-justified text. I don't know how to correct it!)
05-09-2018 06:26 AM
First, post test data in the form of a data step, so that others don't have to type it in by hand.
Second, from a programming point of view, it is rarely a good idea to keep transposed (wide) data. For instance, if the number of procedures changes, you need to redo your array and loop, or write code to work out how many there are. A normalised (long) structure is far better: the structure does not change when more variables are added, only the observation count increases. So:
ID DT_PROCEDURE PROCEDURE_NO PROCEDURE
In this way you can still transpose if you need to for an output. You also minimise the data stored, because you don't need the full obs*variables grid, only the cells that actually hold data (in row one of your test data, 49 of the 50 procedure slots are blank and only exist because of the chosen structure). It simplifies your problem here, because you can then merge the data together or use lookups, and it will simplify your code a lot. There really are no benefits to the Excel (transposed) way of thinking, other than as a report for someone to look at - which you can produce at the end, once all processing is done.
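The advice above can be sketched in two steps: flatten the wide admin data into the long structure (ID, PROCEDURE_NO, PROCEDURE), then flag matches with a simple join against the codes dataset. Again, the names ADMIN, CODES, CODE and ID are placeholders assumed from the description.

```sas
/* Sketch: normalise ADMIN (PROCEDURE1-PROCEDURE50) into one row per
   non-missing procedure, then join to the CODES dataset. Names assumed. */

data admin_long;
    set admin;
    array procs {*} procedure1-procedure50;
    do procedure_no = 1 to dim(procs);
        if not missing(procs[procedure_no]) then do;
            procedure = procs[procedure_no];
            output;                      /* one row per actual procedure */
        end;
    end;
    keep id procedure_no procedure;
run;

proc sql;
    /* one row per admin ID that has at least one procedure of interest */
    create table flagged as
        select distinct a.id, 1 as surgery
        from admin_long as a
        inner join codes as c
            on a.procedure = c.code;
quit;
```

Note that neither step needs to know how many codes are of interest, and only the array bounds would change if the source ever grew beyond 50 procedure variables.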
05-09-2018 09:46 AM