There seems to be a slight confusion. WGT_DX_PARTIAL is calculated when DOD_DIFF does not exist. WGT_DX_FULL is calculated using DOD_DIFF.
Under assumption of DOD_DIFF does not exist, we will use SER_DATE_DX instead DOD_DIFF and do following:
- Choose the earliest 'SER_DATE_DX' (min('SER_DATE_DX')) instead min (DOD_DIFF)
- if there are multiple records for that data, choose the lowest WGT_DX, call is WGT_DX_PARTIAL
- Pick one row per patient when these conditions meet
Does this clarify?
Take a look at these and test it on the sample you posted for just one id, i did for ID=1
/*WGT_DX_FULL*/
proc sql;
create table WGT_DX_FULL as
select distinct *,min(WGT_DX) as WGT_DX_FULL
from
(select *, min(DOD_DIFF) as min_DOD_DIFF
from have1(where=(id=1))
group by id
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(wgt_dx)=wgt_dx;
quit;
/*WGT_DX_PARTIAL*/
proc sql;
create table WGT_DX_PARTIAL as
select distinct *,min(WGT_DX) as WGT_DX_PARTIAL
from
(select *, min(SER_DATE_DX) as min_SER_DATE_DX
from have1(where=(id=1))
group by id
having SER_DATE_DX=min_SER_DATE_DX)
group by id, min_SER_DATE_DX
having min(wgt_dx)=wgt_dx;
quit;
Now, this should help you get the hang of where I am at.
WGT_DX_FULL uses DOD_DIFF
WGT_DX_PARTIAL uses SER_DATE_DX
and we take it from here
I'm running downstairs to test the codes. Will take me 2 min. However, the codes make sense and you're correct on WGT_DX_FULL uses DOD_DIFF and WGT_DX_PARTIAL uses SER_DATE_DX. And I assume, that we will have to merge the resulting datasets from these proc sqls. I will get back to you asap. 2min.
Take your time, I am here till 6pm and right now it is 4:45. Central time
I just ran the codes. Yes, Eurika!!!! Exactly. I'm jumping up and down here. Exactly Novinosrin!
I am starting to feel better. Okay, I am about to leave, catch the train and go home. If anything do post, and we can catch up tomorrow
I figured why. Duplicate patients have ser_date_dx and wgt_dx tied. Distinct not doing its job here?
Distinct is doing it's job, but code is making each of them distinct
Duplicate patients have ser_date_dx and wgt_dx tied -- this is perfectly fine
ONE-TO-ONE NOW. THANKS @novinosrin
/*WGT_DX_FULL*/
proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.MYDATA
group by id
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;
PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/
/*WGT_DX_PARTIAL*/
proc sql;
create table P.WGT_DX_PARTIAL as /*N=6,375*/
select distinct SER_DATE_DX,ID,min(WGT_DX) as WGT_DX_PARTIAL
from
(select DISTINCT *, min(DISTINCT(SER_DATE_DX)) as min_SER_DATE_DX
from P.MYDATA
group by id
having SER_DATE_DX=min_SER_DATE_DX)
group by id, min_SER_DATE_DX
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;
PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/
@novinosrin, Now, I have A and B handled. Moving onto C. I have to do a separate SQL to achieve C.2:
C.2. DIFF_DATE=ABS(DIF('SER_DATE_DX' selected using DOD_DIFF, 'SER_DATE_DX' selected without using DOD_DIFF)
Correct?
A. Without using DOD_DIFF (accomplished):
B. Using DOD_DIFF(accomplished):
C. Calculate measurement error
At the end, each patient has one ERROR value.
Sounds like you want a calc difference between the two dates from the two results?
If yes, straight forward merge and subtract. Make sure you have difference names the date variables to avoid ambiguous reference or alias
Yes, however, to do that subtraction I need SER_DATE_DX in both sql tables. I have SER_DATE_DX in WGT_DX_PARTIAL tables but not in WGT_DX_FULL.
Please let me check with you on this. What if I include SER_DATE_DX in sql for WGT_DX_FULL and then proc nodupkey by ID, DOD_DIFF and WGT_DX_FULL ? then merge and subtract between two SER_DATE_DX came from two sql tables? I'll do it more comfortably if you OK this approach 🙂
proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct SER_DATE_DX, DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.DAYS30_DX_DEF
group by id
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;
I failed to pay attention to the fact you dropped
SER_DATE_DX
from the process that computes WGT_DX_FULL earlier. I don't think you need to drop the date var
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.