BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wheddingsjr
Pyrite | Level 9

Hi all

 

I imported an excel sheet into SAS and wanted to add additional information from my database. The idea was to add the column Race to the imported data using a table labeled RACE. I created the code, however, I am getting more lines back then the original import. The RACE table unfortunately has more than one RACE for some individuals so that's why I get more lines back than there should be. The way around that is to have a column called LineNBR which gives a count number of the races assigned to an individual and we just want to take line 1 (which we accept as the definitive race for an individual). When I run the code I get back all LineNBR 1's like I am supposed to, but I am still getting multiple races for a lot of people. I was hoping someone could tell me the reason for that. I am attaching the code I wrote and hoping some one could tell me if they can see anything wrong with what I did. Thanks in advance. (I hope I explained this sufficiently).

 

LIBNAME pat odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Patient;

PROC IMPORT DATAFILE= "\\Cifs2\pop_lab$\P4P\BCBSMA\FY21\Actively Managed\Working Files\HighRisk_01MAY2020_31MAR2021.xlsx"
	OUT= work.cohort REPLACE DBMS= XLSX;
	RUN;

PROC SQL;
CREATE TABLE work.cohort_race AS

SELECT DISTINCT
		a.*
	,R.LineNBR
	,CASE r.PatientRaceDSC
		WHEN 'Asian' THEN 'Asian'
		WHEN 'Black or African American' THEN 'Black or African American'
	    WHEN 'Hispanic or Latino' THEN 'Hispanic or Latino'
	    WHEN 'White or Caucasian' THEN 'White or Caucasian'
		WHEN 'White' THEN 'White or Caucasian'
		WHEN 'Declined' THEN 'Unknown'
		WHEN 'Unavailable' THEN 'Unknown'
		WHEN 'NULL' THEN 'Unknown'
		ELSE 'Other'
	  	END as RACE
FROM work.cohort a 
LEFT JOIN Pat.Identity I ON a.MRN=I.PatientIdentityID	
LEFT JOIN Pat.Race R on I.PatientID = R.PatientID
WHERE 
(R.LineNBR = 1 OR R.LineNBR IS NULL)
Order By a.MRN
;quit;
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

ANY join on a variable can create duplicates if the value is duplicated in one data set. If it is duplicated in both data sets you get multiples. Example if value A is on 3 records in set one and value A is on 2 records in set the join will generally create 6 output records.

data one;
   input x $ y;
datalines;
a 1
a 2
a 3
;

data two;
  input x $ z;
datalines;
a 10
a 20
;

proc sql;
   create table example as
   select a.x, a.y, b.z
   from one as a 
        left join
        two as b
        on a.x = b.x
   ;
quit;

Since you are joining on something that looks like PatientId from 3 different sets, and some patients tend to repeat anything: tests, visits, diagnosis, prescriptions, then I suggest you look closely at your PatientID variables for duplicates within each set and decide if what you are doing makes sense in that scenario.

Perhaps somewhere you want Distinct patients.

 

And depending on your data you may also find that "Race" changes from visit to visit for the same patient. Depending on your record database and how consistent your data entry people are you may find this happening more often than you expect.

I deal with some medical testing. I have people that change race 2 or 3 times in a year. Ethnicity, sex and date of birth as well.

I have had data with the same patient id that change sex and date of birth on the same day (different tests). So there can be very sloppy data entry compounding the multiple visit/ test/ diagnosis/ prescription paradigm.

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Know your data.

 

You have duplicates somewhere. Look at each table individually to find them.

wheddingsjr
Pyrite | Level 9

ChrisNZ, I do know my data. That response was less than helpful and in fact rather rude and condescending. I came here for help not not to be patronized.

ChrisNZ
Tourmaline | Level 20

Sorry you take it personally.

This is a data issue; We do not know your data..

Look at one patient for which there are duplicates in all tables and show us that data if you can't find what the issue is.

 

ballardw
Super User

ANY join on a variable can create duplicates if the value is duplicated in one data set. If it is duplicated in both data sets you get multiples. Example if value A is on 3 records in set one and value A is on 2 records in set the join will generally create 6 output records.

data one;
   input x $ y;
datalines;
a 1
a 2
a 3
;

data two;
  input x $ z;
datalines;
a 10
a 20
;

proc sql;
   create table example as
   select a.x, a.y, b.z
   from one as a 
        left join
        two as b
        on a.x = b.x
   ;
quit;

Since you are joining on something that looks like PatientId from 3 different sets, and some patients tend to repeat anything: tests, visits, diagnosis, prescriptions, then I suggest you look closely at your PatientID variables for duplicates within each set and decide if what you are doing makes sense in that scenario.

Perhaps somewhere you want Distinct patients.

 

And depending on your data you may also find that "Race" changes from visit to visit for the same patient. Depending on your record database and how consistent your data entry people are you may find this happening more often than you expect.

I deal with some medical testing. I have people that change race 2 or 3 times in a year. Ethnicity, sex and date of birth as well.

I have had data with the same patient id that change sex and date of birth on the same day (different tests). So there can be very sloppy data entry compounding the multiple visit/ test/ diagnosis/ prescription paradigm.

wheddingsjr
Pyrite | Level 9
Thanks Ballard. That was what I was looking for. I looked into the data and saw exactly what you are talking about and I wanted another pair of eyes to look at the code since I am fairly new to SAS and I did use the logic that excludes all races that did have not LineNBR 1 but what still seeing multiple races. Its exactly as you said, different visits with inconsistent data entry. Same patients but the race changes from visit to visit. Extremely annoying to say the least

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 797 views
  • 1 like
  • 3 in conversation