BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

There seems to be a slight confusion. WGT_DX_PARTIAL is calculated when DOD_DIFF does not exist. WGT_DX_FULL is calculated using DOD_DIFF. 

 

Under assumption of DOD_DIFF does not exist, we will use SER_DATE_DX instead DOD_DIFF and do following: 


- Choose the earliest 'SER_DATE_DX' (min('SER_DATE_DX')) instead min (DOD_DIFF)

- if there are multiple records for that data, choose the lowest WGT_DX, call is WGT_DX_PARTIAL
- Pick one row per patient when these conditions meet

 

Does this clarify? 

Cruise
Ammonite | Level 13
the scenario without DOD_DIFF, I can imagine the exact same approach used with DOD_DIFF but only difference would be to use SER_DATE_DX instead DOD_DIFF because of our assumption of its non-existence.
novinosrin
Tourmaline | Level 20

Take a look at these and test it on the sample you posted for just one id, i did for ID=1

 

/*WGT_DX_FULL*/

proc sql;
create table WGT_DX_FULL as
select distinct *,min(WGT_DX) as WGT_DX_FULL
from
(select  *, min(DOD_DIFF) as min_DOD_DIFF
from have1(where=(id=1))
group by id
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(wgt_dx)=wgt_dx;
quit;


/*WGT_DX_PARTIAL*/ proc sql; create table WGT_DX_PARTIAL as select distinct *,min(WGT_DX) as WGT_DX_PARTIAL from (select *, min(SER_DATE_DX) as min_SER_DATE_DX from have1(where=(id=1)) group by id having SER_DATE_DX=min_SER_DATE_DX) group by id, min_SER_DATE_DX having min(wgt_dx)=wgt_dx; quit;

 

 

Now, this should help you get the hang of  where I am at. 

WGT_DX_FULL uses DOD_DIFF
WGT_DX_PARTIAL uses SER_DATE_DX

 and we take it from here

Cruise
Ammonite | Level 13

I'm running downstairs to test the codes. Will take me 2 min. However, the codes make sense and you're correct on WGT_DX_FULL uses DOD_DIFF and WGT_DX_PARTIAL uses SER_DATE_DX. And I assume, that we will have to merge the resulting datasets from these proc sqls. I will get back to you asap. 2min.

novinosrin
Tourmaline | Level 20

Take your time, I am here till 6pm and right now it is 4:45. Central time

Cruise
Ammonite | Level 13

I just ran the codes. Yes, Eurika!!!! Exactly. I'm jumping up and down here. Exactly N

novinosrin
Tourmaline | Level 20

I am starting to feel better. Okay, I am about to leave, catch the train and go home. If anything do post, and we can catch up tomorrow

Cruise
Ammonite | Level 13

@novinosrin

I figured why. Duplicate patients have ser_date_dx and wgt_dx tied. Distinct not doing its job here? 

 

tied_serdate.png

novinosrin
Tourmaline | Level 20

Distinct is doing it's job, but code is making each of them distinct 

Duplicate patients have ser_date_dx and wgt_dx tied  -- this is perfectly fine

Cruise
Ammonite | Level 13

ONE-TO-ONE NOW. THANKS @novinosrin

 

/*WGT_DX_FULL*/
proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.MYDATA
group by id 
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;

PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/ 

/*WGT_DX_PARTIAL*/
proc sql;
create table P.WGT_DX_PARTIAL as /*N=6,375*/
select distinct SER_DATE_DX,ID,min(WGT_DX) as WGT_DX_PARTIAL
from
(select DISTINCT *, min(DISTINCT(SER_DATE_DX)) as min_SER_DATE_DX
from P.MYDATA
group by id 
having SER_DATE_DX=min_SER_DATE_DX)
group by id, min_SER_DATE_DX
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;
PROC SQL; SELECT COUNT (DISTINCT ID) FROM P.DAYS30_DX_DEF; QUIT; /*N=6,375 PATIENTS*/ 
Cruise
Ammonite | Level 13

 @novinosrin, Now, I have A and B handled. Moving onto C. I have to do a separate SQL to achieve C.2:

 

C.2. DIFF_DATE=ABS(DIF('SER_DATE_DX' selected using DOD_DIFF, 'SER_DATE_DX' selected without using DOD_DIFF)

Correct? 

 

A. Without using DOD_DIFF (accomplished):

 

  1. Choose the earliest date of visit
  2. If there are multiple records for that data, choose the lowest WGT_DX, call is WGT_DX_PARTIAL
  3. Pick one row per patient when these conditions meet

B. Using DOD_DIFF(accomplished):

 

  1. Choose the minimum DOD_DIFF
  2. If there are multiple records for that value of DOD_DIFF, choose the lowestWGT_DX, call it WGT_DX_FULL
  3. Pick one row per patient when these conditions meet

C. Calculate measurement error 

  1. DIFF_WGT_DX= WGT_DX_FULL - WGT_DX_PARTIAL (accomplished):
  2. DIFF_DATE=ABS(DIF('SER_DATE_DX' selected using DOD_DIFF, 'SER_DATE_DX' selected without using DOD_DIFF)
  3. ERROR= DIFF_WGT_DX*DIFF_DATE 

At the end, each patient has one ERROR value. 

novinosrin
Tourmaline | Level 20

Sounds like you want a calc difference between the two dates from the two results?

If yes, straight forward merge and subtract. Make sure you have difference names the date variables to avoid ambiguous reference or alias

Cruise
Ammonite | Level 13

Yes, however, to do that subtraction I need SER_DATE_DX in both sql tables. I have SER_DATE_DX in WGT_DX_PARTIAL tables but not in WGT_DX_FULL. 

 

Please let me check with you on this. What if I include SER_DATE_DX in sql for WGT_DX_FULL and then proc nodupkey by ID, DOD_DIFF and WGT_DX_FULL ? then merge and subtract between two SER_DATE_DX came from two sql tables? I'll do it more comfortably if you OK this approach 🙂 

 

 

proc sql;
create table P.WGT_DX_FULL as /*N=6,375*/
select distinct SER_DATE_DX, DOD_DIFF,ID,min(WGT_DX) as WGT_DX_FULL
from
(select DISTINCT *, min(DISTINCT(DOD_DIFF)) as min_DOD_DIFF
from P.DAYS30_DX_DEF
group by id 
having DOD_DIFF=min_DOD_DIFF)
group by id, min_DOD_DIFF
having min(DISTINCT(wgt_dx))=wgt_dx;
quit;

 

novinosrin
Tourmaline | Level 20

I failed to pay attention to the fact you dropped 

SER_DATE_DX

from the process that computes  WGT_DX_FULL earlier. I don't think you need to drop the date var

Cruise
Ammonite | Level 13
SQL produced multiples when I add SER_DATE_DX for select variables. The same thing happened when I wanted to keep CODE in the data which created duplicates

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 43 replies
  • 1436 views
  • 18 likes
  • 3 in conversation