Solved: Re: Calculate error with variable known and unknown scenarios using re... - Page 2

Cruise · Posted 11-15-2018 05:26 PM

There seems to be a slight confusion. WGT_DX_PARTIAL is calculated when DOD_DIFF does not exist. WGT_DX_FULL is calculated using DOD_DIFF.

Under assumption of DOD_DIFF does not exist, we will use SER_DATE_DX instead DOD_DIFF and do following:

- Choose the earliest 'SER_DATE_DX' (min('SER_DATE_DX')) instead min (DOD_DIFF)

- if there are multiple records for that data, choose the lowest WGT_DX, call is WGT_DX_PARTIAL
- Pick one row per patient when these conditions meet

Does this clarify?

Cruise · Posted 11-15-2018 05:31 PM

the scenario without DOD_DIFF, I can imagine the exact same approach used with DOD_DIFF but only difference would be to use SER_DATE_DX instead DOD_DIFF because of our assumption of its non-existence.

novinosrin · Posted 11-15-2018 05:37 PM

Take a look at these and test it on the sample you posted for just one id, i did for ID=1

/*WGT_DX_FULL*/

proc sql;
create table WGT_DX_FULL as
select distinct *,min(WGT_DX) as WGT_DX_FULL
from
(select  *, min(DOD_DIFF) as min_DOD_DIFF
from have1(where=(id=1))
group by id
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(wgt_dx)=wgt_dx;
quit;


/*WGT_DX_PARTIAL*/

proc sql;
create table WGT_DX_PARTIAL as
select distinct *,min(WGT_DX) as WGT_DX_PARTIAL
from
(select  *, min(SER_DATE_DX) as min_SER_DATE_DX
from have1(where=(id=1))
group by id
having SER_DATE_DX=min_SER_DATE_DX)
group by id, min_SER_DATE_DX
having min(wgt_dx)=wgt_dx;
quit;

Now, this should help you get the hang of where I am at.

WGT_DX_FULL uses DOD_DIFF

WGT_DX_PARTIAL uses SER_DATE_DX

and we take it from here

Cruise · Posted 11-15-2018 05:42 PM

I'm running downstairs to test the codes. Will take me 2 min. However, the codes make sense and you're correct on WGT_DX_FULL uses DOD_DIFF and WGT_DX_PARTIAL uses SER_DATE_DX. And I assume, that we will have to merge the resulting datasets from these proc sqls. I will get back to you asap. 2min.

novinosrin · Posted 11-15-2018 05:44 PM

Take your time, I am here till 6pm and right now it is 4:45. Central time

ovinosrin · Posted 11-15-2018 05:52 PM

I just ran the codes. Yes, Eurika!!!! Exactly. I'm jumping up and down here. Exactly Novinosrin!

novinosrin · Posted 11-15-2018 06:42 PM

I am starting to feel better. Okay, I am about to leave, catch the train and go home. If anything do post, and we can catch up tomorrow

Cruise · Posted 11-16-2018 09:20 AM

@novinosrin

I figured why. Duplicate patients have ser_date_dx and wgt_dx tied. Distinct not doing its job here?

novinosrin · Posted 11-16-2018 11:25 AM

Distinct is doing it's job, but code is making each of them distinct

Duplicate patients have ser_date_dx and wgt_dx tied -- this is perfectly fine

Cruise · Posted 11-16-2018 11:17 AM

ONE-TO-ONE NOW. THANKS @novinosrin

/*WGT_DX_FULL*/
proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.MYDATA
group by id 
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;

PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/ 

/*WGT_DX_PARTIAL*/
proc sql;
create table P.WGT_DX_PARTIAL as /*N=6,375*/
select distinct SER_DATE_DX,ID,min(WGT_DX) as WGT_DX_PARTIAL
from
(select DISTINCT *, min(DISTINCT(SER_DATE_DX)) as min_SER_DATE_DX
from P.MYDATA
group by id 
having SER_DATE_DX=min_SER_DATE_DX)
group by id, min_SER_DATE_DX
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;
PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/

Cruise · Posted 11-16-2018 11:28 AM

@novinosrin, Now, I have A and B handled. Moving onto C. I have to do a separate SQL to achieve C.2:

C.2. DIFF_DATE=ABS(DIF('SER_DATE_DX' selected using DOD_DIFF, 'SER_DATE_DX' selected without using DOD_DIFF)

Correct?

A. Without using DOD_DIFF (accomplished):

Choose the earliest date of visit
If there are multiple records for that data, choose the lowest WGT_DX, call is WGT_DX_PARTIAL
Pick one row per patient when these conditions meet

B. Using DOD_DIFF(accomplished):

Choose the minimum DOD_DIFF
If there are multiple records for that value of DOD_DIFF, choose the lowestWGT_DX, call it WGT_DX_FULL
Pick one row per patient when these conditions meet

C. Calculate measurement error

DIFF_WGT_DX= WGT_DX_FULL - WGT_DX_PARTIAL (accomplished):
DIFF_DATE=ABS(DIF('SER_DATE_DX' selected using DOD_DIFF, 'SER_DATE_DX' selected without using DOD_DIFF)
ERROR= DIFF_WGT_DX*DIFF_DATE

At the end, each patient has one ERROR value.

novinosrin · Posted 11-16-2018 11:31 AM

Sounds like you want a calc difference between the two dates from the two results?

If yes, straight forward merge and subtract. Make sure you have difference names the date variables to avoid ambiguous reference or alias

Cruise · Posted 11-16-2018 11:37 AM

Yes, however, to do that subtraction I need SER_DATE_DX in both sql tables. I have SER_DATE_DX in WGT_DX_PARTIAL tables but not in WGT_DX_FULL.

Please let me check with you on this. What if I include SER_DATE_DX in sql for WGT_DX_FULL and then proc nodupkey by ID, DOD_DIFF and WGT_DX_FULL ? then merge and subtract between two SER_DATE_DX came from two sql tables? I'll do it more comfortably if you OK this approach 🙂

proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct SER_DATE_DX, DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.DAYS30_DX_DEF
group by id 
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;

novinosrin · Posted 11-16-2018 11:54 AM

I failed to pay attention to the fact you dropped

SER_DATE_DX

from the process that computes WGT_DX_FULL earlier. I don't think you need to drop the date var

Cruise · Posted 11-16-2018 11:56 AM

SQL produced multiples when I add SER_DATE_DX for select variables. The same thing happened when I wanted to keep CODE in the data which created duplicates

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Re: Calculate error with variable known and unknown scenarios using repeated measurement data

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!