BookmarkSubscribeRSS Feed
SASPhile
Quartz | Level 8
After uploading the data from various sources we realized that patient birth year is wrong.How to update the dataset with new values for patient birth year without effecting the data and without having to upload each file to fix the wrong field.
The identifying key is by the provider.We have 15 providers like:
Aetna,Axium etc etc.So for each provider I need to update the patient birth year.
7 REPLIES 7
Flip
Fluorite | Level 6
A starting point.

proc sql;
update datatable set birthdate = (select birthdate from source_table where
**match criteria**);
Quit;
SASPhile
Quartz | Level 8
I tried the following:
proc sql;
update final.spdata_2008_v1 as S set patient_birth_yr = (select r.patient_birth_yr from up_date as R where S.provider=R.provider);
Quit;

I got this error message:
5417 proc sql;
5418 update final.spdata_2008_v1 as S set patient_birth_yr = (select r.patient_birth_yr from try as R where S.provider=R.provider);
ERROR: Subquery evaluated to more than one row.
NOTE: Correlation values are: Provider='DIPLOMAT PHARMACY' .
5419 Quit;
Flip
Fluorite | Level 6
Are all birth years for a given provider the same? That would seem strange. If not, you need more identifying info such as a patient identifier. If they are then a 'distinct' on the select might help.

The message is self explanitory. You are trying to update a single value with more than one result. Your subquery must return a single row.
SASPhile
Quartz | Level 8
Flip,
I tried this:
proc sql;
update final.spdata_2008_v1 as S set patient_birth_yr = (select distinct r.patient_birth_yr from up_date as R where S.provider=R.provider and s.patient_id=r.patient_id);
Quit;
Still gettig the error.
LinusH
Tourmaline | Level 20
The combination of provider and patient id must be unique in both tables to make this work, otherwise your subquery may result in with multiple rows.
/Linus
Data never sleeps
SAS_user
Calcite | Level 5
the problem when using distinct is that your subquery still gets more than one row for the same provider and patient_id. That's why distinct will not help.

Simple fix is that you update your where clause with some extra condition or group by statement.

The simliest way could be using min or max function with distinct.
But i don't think that is correct with some business logic.

(select distinct min(r.patient_birth_yr) from up_date as R where S.provider=R.provider and s.patient_id=r.patient_id);
Flip
Fluorite | Level 6
Actually Linus, the unique condition only needs to be present on the subquery. It can update multiple records on the target table with a single record from the subquery.

That said, the thing that you must obtain is a unique record returned from the subquery for each record on the target table. We can only guess at the overall structure of your data, so you must figure out the best way to get a single record returned. Distinct may do it, assigning a record number and selecting the first or last one in a partition, are a couple of options.

Try collecting data from your source for a single patient to see the structure of the duplication. You may want to create a temporary dataset to eliminate dupplicates if that is easier for you to clean up.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2399 views
  • 0 likes
  • 4 in conversation