After uploading the data from various sources we realized that patient birth year is wrong.How to update the dataset with new values for patient birth year without effecting the data and without having to upload each file to fix the wrong field.
The identifying key is by the provider.We have 15 providers like:
Aetna,Axium etc etc.So for each provider I need to update the patient birth year.
I tried the following:
update final.spdata_2008_v1 as S set patient_birth_yr = (select r.patient_birth_yr from up_date as R where S.provider=R.provider);
I got this error message:
5417 proc sql;
5418 update final.spdata_2008_v1 as S set patient_birth_yr = (select r.patient_birth_yr from try as R where S.provider=R.provider);
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: Provider='DIPLOMAT PHARMACY' .
I tried this:
update final.spdata_2008_v1 as S set patient_birth_yr = (select distinct r.patient_birth_yr from up_date as R where S.provider=R.provider and s.patient_id=r.patient_id);
Still gettig the error.
Actually Linus, the unique condition only needs to be present on the subquery. It can update multiple records on the target table with a single record from the subquery.
That said, the thing that you must obtain is a unique record returned from the subquery for each record on the target table. We can only guess at the overall structure of your data, so you must figure out the best way to get a single record returned. Distinct may do it, assigning a record number and selecting the first or last one in a partition, are a couple of options.
Try collecting data from your source for a single patient to see the structure of the duplication. You may want to create a temporary dataset to eliminate dupplicates if that is easier for you to clean up.