Solved: Re: SAS Code Validation

wheddingsjr · Posted 04-15-2021 09:45 PM

Hi all

I imported an excel sheet into SAS and wanted to add additional information from my database. The idea was to add the column Race to the imported data using a table labeled RACE. I created the code, however, I am getting more lines back then the original import. The RACE table unfortunately has more than one RACE for some individuals so that's why I get more lines back than there should be. The way around that is to have a column called LineNBR which gives a count number of the races assigned to an individual and we just want to take line 1 (which we accept as the definitive race for an individual). When I run the code I get back all LineNBR 1's like I am supposed to, but I am still getting multiple races for a lot of people. I was hoping someone could tell me the reason for that. I am attaching the code I wrote and hoping some one could tell me if they can see anything wrong with what I did. Thanks in advance. (I hope I explained this sufficiently).

LIBNAME pat odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Patient;

PROC IMPORT DATAFILE= "\\Cifs2\pop_lab$\P4P\BCBSMA\FY21\Actively Managed\Working Files\HighRisk_01MAY2020_31MAR2021.xlsx"
	OUT= work.cohort REPLACE DBMS= XLSX;
	RUN;

PROC SQL;
CREATE TABLE work.cohort_race AS

SELECT DISTINCT
		a.*
	,R.LineNBR
	,CASE r.PatientRaceDSC
		WHEN 'Asian' THEN 'Asian'
		WHEN 'Black or African American' THEN 'Black or African American'
	    WHEN 'Hispanic or Latino' THEN 'Hispanic or Latino'
	    WHEN 'White or Caucasian' THEN 'White or Caucasian'
		WHEN 'White' THEN 'White or Caucasian'
		WHEN 'Declined' THEN 'Unknown'
		WHEN 'Unavailable' THEN 'Unknown'
		WHEN 'NULL' THEN 'Unknown'
		ELSE 'Other'
	  	END as RACE
FROM work.cohort a 
LEFT JOIN Pat.Identity I ON a.MRN=I.PatientIdentityID	
LEFT JOIN Pat.Race R on I.PatientID = R.PatientID
WHERE 
(R.LineNBR = 1 OR R.LineNBR IS NULL)
Order By a.MRN
;quit;

ballardw · Posted 04-16-2021 11:04 AM

ANY join on a variable can create duplicates if the value is duplicated in one data set. If it is duplicated in both data sets you get multiples. Example if value A is on 3 records in set one and value A is on 2 records in set the join will generally create 6 output records.

data one;
   input x $ y;
datalines;
a 1
a 2
a 3
;

data two;
  input x $ z;
datalines;
a 10
a 20
;

proc sql;
   create table example as
   select a.x, a.y, b.z
   from one as a 
        left join
        two as b
        on a.x = b.x
   ;
quit;

Since you are joining on something that looks like PatientId from 3 different sets, and some patients tend to repeat anything: tests, visits, diagnosis, prescriptions, then I suggest you look closely at your PatientID variables for duplicates within each set and decide if what you are doing makes sense in that scenario.

Perhaps somewhere you want Distinct patients.

And depending on your data you may also find that "Race" changes from visit to visit for the same patient. Depending on your record database and how consistent your data entry people are you may find this happening more often than you expect.

I deal with some medical testing. I have people that change race 2 or 3 times in a year. Ethnicity, sex and date of birth as well.

I have had data with the same patient id that change sex and date of birth on the same day (different tests). So there can be very sloppy data entry compounding the multiple visit/ test/ diagnosis/ prescription paradigm.

View solution in original post

ChrisNZ · Posted 04-15-2021 11:22 PM

Know your data.

You have duplicates somewhere. Look at each table individually to find them.

High-Performance SAS Coding - Third Edition

wheddingsjr · Posted 04-15-2021 11:37 PM

ChrisNZ, I do know my data. That response was less than helpful and in fact rather rude and condescending. I came here for help not not to be patronized.

ChrisNZ · Posted 04-16-2021 12:03 AM

Sorry you take it personally.

This is a data issue; We do not know your data..

Look at one patient for which there are duplicates in all tables and show us that data if you can't find what the issue is.

High-Performance SAS Coding - Third Edition

ballardw · Posted 04-16-2021 11:04 AM

ANY join on a variable can create duplicates if the value is duplicated in one data set. If it is duplicated in both data sets you get multiples. Example if value A is on 3 records in set one and value A is on 2 records in set the join will generally create 6 output records.

data one;
   input x $ y;
datalines;
a 1
a 2
a 3
;

data two;
  input x $ z;
datalines;
a 10
a 20
;

proc sql;
   create table example as
   select a.x, a.y, b.z
   from one as a 
        left join
        two as b
        on a.x = b.x
   ;
quit;

Since you are joining on something that looks like PatientId from 3 different sets, and some patients tend to repeat anything: tests, visits, diagnosis, prescriptions, then I suggest you look closely at your PatientID variables for duplicates within each set and decide if what you are doing makes sense in that scenario.

Perhaps somewhere you want Distinct patients.

And depending on your data you may also find that "Race" changes from visit to visit for the same patient. Depending on your record database and how consistent your data entry people are you may find this happening more often than you expect.

I deal with some medical testing. I have people that change race 2 or 3 times in a year. Ethnicity, sex and date of birth as well.

I have had data with the same patient id that change sex and date of birth on the same day (different tests). So there can be very sloppy data entry compounding the multiple visit/ test/ diagnosis/ prescription paradigm.

wheddingsjr · Posted 04-16-2021 03:16 PM

Thanks Ballard. That was what I was looking for. I looked into the data and saw exactly what you are talking about and I wanted another pair of eyes to look at the code since I am fairly new to SAS and I did use the logic that excludes all races that did have not LineNBR 1 but what still seeing multiple races. Its exactly as you said, different visits with inconsistent data entry. Same patients but the race changes from visit to visit. Extremely annoying to say the least

SAS Code Validation

Re: SAS Code Validation

Re: SAS Code Validation

Re: SAS Code Validation

Re: SAS Code Validation

Re: SAS Code Validation

Re: SAS Code Validation

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away