Hi all,
I am trying to add a variable called height to my cohort dataset by left joining. I have 92 observations in my cohort so I am expecting the final number to be 92 after the join since I am only left joining (adding variables to the existing conditions). I am including both scenarios where height is missing and not missing. I am only getting 64 observations. I only want to see the most recent record of height 30 days prior to the index date. Attached is my code.
Please let me know how can I solve this.
*height most recent height within 30 days prior to the index date; proc sql ; create table _02a_ht as select a.*, b.labcomponent, b.TestResultCleaned,b.TestDate, case when (a.index_date-30) le b.TestDate le a.index_date and b.labcomponent ='Body Height' and b.TestResultCleaned ne . then b.TestResultCleaned else . end as height from _02a_ecog2 as a inner join dfi_panc.vitals as b on a.PatientID=b.PatientID and b.labcomponent ='Body Height' order by a.PatientID,b.TestDate; quit ;/*3463 */ *making sure the testdate is 30 days prior the index date; data _02a_testdate; set _02a_ht ; by PatientID TestDate ; if last.PatientID; where height = . or height ne .; run ;/*64 */
There's no error
I am trying to add a variable called height to my cohort dataset by left joining.
*height most recent height within 30 days prior to the index date; proc sql ; create table _02a_ht as select a.*, b.labcomponent, b.TestResultCleaned,b.TestDate, case when (a.index_date-30) le b.TestDate le a.index_date and b.labcomponent ='Body Height' and b.TestResultCleaned ne . then b.TestResultCleaned else . end as height from _02a_ecog2 as a inner join dfi_panc.vitals as b on a.PatientID=b.PatientID and b.labcomponent ='Body Height' order by a.PatientID,b.TestDate; quit ;/*3463 */ *making sure the testdate is 30 days prior the index date; data _02a_testdate; set _02a_ht ; by PatientID TestDate ; if last.PatientID; where height = . or height ne .; run ;/*64 */
Try changing inner join to left join?
While you may start with 92 in your cohort this bit of code:
inner join dfi_panc.vitals as b
on a.PatientID=b.PatientID and b.labcomponent ='Body Height'
has the potential to remove patientid values either because it does not appear in both data sets OR the only match in set b does not have "Body Height" as the value for labcomponent. Possibly due to spelling differences.
By definition from the online documentation:
An inner join returns only the subset of rows from the first table that matches rows from the second table.
I changed that to a left join but now i have 89 obsv still not 92
WHERE filters out the data. If you remove that do you get the records you want?
Look at the missing records and determine what condition is excluding them....with less than a 100 records you can manually verify it.
@manya92 wrote:
I changed that to a left join but now i have 89 obsv still not 92
Okay, now i have 89 obvs , i want to see which obsv from the prev dataset haev been deleted ? how do i see that?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.