BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
smackerz1988
Pyrite | Level 9

Hello,

 

I have this code I'm using to assign a flag to the worst case scenario (High or within range). (R is from a previous data step to group visits where subject is  on treatment) 

proc sql;
 create table flg as
   select distinct usubjid,
                   paramcd,
                   adt,
                   visit,
                   a1ind,						 
                   "Y" as anl02fl
						 
   from advs
   where r in (1,2) and a1ind in('High','Within range') and ~missing(trtsdt)
   group by usubjid, paramcd  
   having adt eq min(adt);	
quit;

 This is the spec for reference.

 

Set to "Y" for the worst case A1IND occurring post-baseline (including unscheduled and early termination visits) for each parameter for each subject. If multiple visits exist with the worst case, flag only the earliest occurring visit.

 

My question is how can I order the worst case variable A1IND so that if "High" does exist that gets higher priority over within range as at the moment I have subjects that have both values but because the visit date that has the value of  "Within range" occurs before "High" that is getting incorrectly assigned the flag when it should be High as that is the worst case scenario.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @smackerz1988,

 

In your PROC SQL step you could define a (numeric or character) sort key, say x, whose sort order corresponds to your selection priorities. Then select the observation(s) with the minimum of x in each USUBJID-PARAMCD group (and finally drop x).

 

Example:

proc sql;
 create table flg(drop=x) as
   select distinct usubjid,
                   paramcd,
                   adt,
                   visit,
                   a1ind,
                   "Y" as anl02fl,
                   1e7*(a1ind eqt 'W')+1e6*missing(adt)+coalesce(adt,0) as x  /* temporary sort key */

   from advs
   where r in (1,2) and a1ind in('High','Within range') and ~missing(trtsdt)
   group by usubjid, paramcd
   having x=min(x);
quit;

The term 1e7*(a1ind eqt 'W') penalizes observations with A1IND values starting with "W" in such a way that a single observation with A1IND="High" would "beat" them (by getting a smaller x value). Next, missing visit dates (assuming SAS date values in variable ADT!) are penalized by 1e6*missing(adt) so that non-missing ADT values, if any, are preferred. Finally, among non-missing visit dates the smallest ADT value (=earliest visit date) will be selected. Or do you need a different handling of missing ADT values? Are there any at all? If not, the definition of x could be simplified to 1e7*(a1ind eqt 'W')+adt.

 

 

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

Show us (a portion of) the actual data that illustrates the problem. Provide the data as working SAS data step code, which you can type in yourself, or follow these instructions. Do not provide the data via other methods.

 

Also please show the desired output from this data.

--
Paige Miller
PeterClemmensen
Tourmaline | Level 20

I don't follow this. Is it simply a question of ordering the output data so that the records with a1ind = 'High' comes before records with a1ind = 'Within range'?

smackerz1988
Pyrite | Level 9

Yes and no. So some subjects might only have "Within Range" for their analysis values so that would be their worst case scenario so to speak. But if a subject has an analysis value of both "High" and "Within Range" then the code I'm using will flag the earliest date (adt) regardless if the analysis value is "High" or "Within Range". I need to prioritise the value of "High" if it is present for a subject so that the code select the earliest date that contains that value and then if it isn't present I can just use the earliest occurring  "Within range" value

smackerz1988
Pyrite | Level 9
That Data to Data Step Macro looks great btw!
Reeza
Super User

SQL seems a bad choice for this type of question, is SQL absolutely required?

smackerz1988
Pyrite | Level 9
No, if there is a better way I'm all ears it's just because I currently am using PROC SQL code
FreelanceReinh
Jade | Level 19

Hello @smackerz1988,

 

In your PROC SQL step you could define a (numeric or character) sort key, say x, whose sort order corresponds to your selection priorities. Then select the observation(s) with the minimum of x in each USUBJID-PARAMCD group (and finally drop x).

 

Example:

proc sql;
 create table flg(drop=x) as
   select distinct usubjid,
                   paramcd,
                   adt,
                   visit,
                   a1ind,
                   "Y" as anl02fl,
                   1e7*(a1ind eqt 'W')+1e6*missing(adt)+coalesce(adt,0) as x  /* temporary sort key */

   from advs
   where r in (1,2) and a1ind in('High','Within range') and ~missing(trtsdt)
   group by usubjid, paramcd
   having x=min(x);
quit;

The term 1e7*(a1ind eqt 'W') penalizes observations with A1IND values starting with "W" in such a way that a single observation with A1IND="High" would "beat" them (by getting a smaller x value). Next, missing visit dates (assuming SAS date values in variable ADT!) are penalized by 1e6*missing(adt) so that non-missing ADT values, if any, are preferred. Finally, among non-missing visit dates the smallest ADT value (=earliest visit date) will be selected. Or do you need a different handling of missing ADT values? Are there any at all? If not, the definition of x could be simplified to 1e7*(a1ind eqt 'W')+adt.

 

 

smackerz1988
Pyrite | Level 9

Wow really interesting and cool solution! Thank you!. Would you have any additional links so I could learn more about that?

FreelanceReinh
Jade | Level 19

@smackerz1988 wrote:

Wow really interesting and cool solution! Thank you!. Would you have any additional links so I could learn more about that?


You're welcome. Other than the usual documentation links for the EQT and related comparison operators, the MISSING function and the COALESCE SQL function (not identical with the COALESCE function) I don't know of any references specifically on this technique. But I assume that similar sort keys have been used in other posts in this forum, in SAS conference papers or possibly books.

 

I had been thinking about this problem -- selecting records in PROC SQL based on a hierarchy of criteria -- earlier this year (without a concrete application), had the idea of constructing such sort keys and added this idea to my SAS notes (started in 1999). Your question seemed to be an opportunity to apply this idea to a real-world problem. So thank you for asking. :-)

smackerz1988
Pyrite | Level 9
You're very welcome!
smackerz1988
Pyrite | Level 9

@FreelanceReinh Can I ask if this can be expanded further?. I have a similar analysis flag specification with three values ("High", "Low" ,"Within range") but the wrinkle is , as highlighted in red, if a subject has both "High" and "Low" I have to flag the earliest occurrence for each?. Is this approach even viable here?

 

smackerz1988_1-1663080204253.png

 

FreelanceReinh
Jade | Level 19

I don't think the earlier solution can be expanded easily to the new task because checking (in PROC SQL) whether a USUBJID-PARAMCD group has only observations with ANRIND="Within range" would typically require a GROUP BY USUBJID, PARAMCD clause. But for the earliest-date flags you would rather need a GROUP BY USUBJID, PARAMCD, ANRIND clause. So I'd switch to a different technique such as BY-group processing in a DATA step where those nested BY groups are readily available: You could sort your data (including only "High", "Low" and "Within range" cases) into a new dataset HAVE by usubjid paramcd anrind adt. Then a DATA step as shown below would create a flag for the first observation in each USUBJID-PARAMCD-ANRIND group except for those where ANRIND="Within range" occurs together with ANRIND="High" or ANRIND="Low" in the same USUBJID-PARAMCD group.

data want;
set have;
by usubjid paramcd anrind adt;
if first.anrind & (anrind ne: 'W' | first.paramcd) then flag='Y';
run;

(Note that this step uses the alphabetical order of "High", "Low" coming before "Within range".) Unlike your PROC SQL step, this results in the complete dataset with the flag variable added.

 

Note that missing or tied minimum ADT values are handled differently in the DATA step than in PROC SQL:

Without further changes

  • a missing ADT would qualify as the "earliest" visit date because a numeric missing value is less than any non-missing value
  • only the first of several tied minimum ADT values would be flagged (if any).
smackerz1988
Pyrite | Level 9

@FreelanceReinh That is brilliant thank you so much for going into so much depth it's very much appreciated. My only drawback and I do understand the benefit of having simulated data more and more to illustrate the crux of the coding issue but this flag is for post baseline records only and where ANL02FL ='Y' and it currently is flagging for all screening records. I apologies it was more cumbersome that what it appeared to be 

 

USUBJID PARAMCD ANRIND ABLFL ANL02FL FLAG
1001 ALT Within range     Y
1001 ALT Within range Y Y  
1001 ALT Within range   Y  
1001 ALT Within range   Y  
1001 ALT Within range   Y  
1001 ALT Within range   Y  
1001 AST Within range     Y
1001 AST Within range Y Y  
1001 AST Within range   Y  
1001 AST Within range   Y  
1001 AST Within range   Y  
1001 AST Within range   Y  
1001 BASO Within range     Y
1001 BASO Within range Y    
1001 BASO Within range      
1001 BASO Within range      
1001 BASO Within range      
1001 BASO Within range      
1001 BASOLE Within range     Y
1001 BASOLE Within range Y    
1001 BASOLE Within range      
1001 BASOLE Within range      
1001 BASOLE Within range      
1001 BASOLE Within range      
1001 BILIC Within range     Y
1001 BILIC Within range Y Y  
1001 BILIC Within range   Y  
1001 BILIC Within range   Y  
1001 BILIC Within range   Y  
1001 BILIC Within range   Y  
1001 BILIU Within range     Y
1001 BILIU Within range Y Y  
1001 BILIU Within range   Y  
1001 BILIU Within range   Y  
1001 BILIU Within range   Y  
1001 BILIU Within range   Y  
1001 CREAT Within range     Y
1001 CREAT Within range Y Y  
1001 CREAT Within range   Y  
1001 CREAT Within range   Y  
1001 CREAT Within range   Y  
1001 CREAT Within range   Y  
1001 CRP Within range     Y
1001 CRP Within range Y Y  
1001 CRP Within range   Y  
1001 CRP Within range   Y  
1001 CRP Within range   Y  
1002 MCHC Low   Y Y
1002 MCHC Within range      
1002 MCHC Within range Y Y  
1002 MCHC High    Y  
1002 MCHC Low   Y  
1002 MCHC Within range   Y  
1002 MCV Within range     Y
1002 MCV Within range Y Y  
1002 MCV Within range   Y  
1002 MCV Within range   Y  
1002 MCV Within range   Y  
1002 MCV Within range   Y  
FreelanceReinh
Jade | Level 19

A simple solution would be to select the relevant observations to be flagged and then merge back with the original dataset. If there's no suitable merge key, you could create one.

Example:

data _tmp / view=_tmp;
set have;
_seqno+1;
run;

data _flg(keep=_seqno);
set _tmp;
where /* insert applicable conditions incl. ANL02FL='Y' */;
by usubjid paramcd anrind adt;
if first.anrind & (anrind ne: 'W' | first.paramcd);
run;

data want(drop=_seqno);
merge _tmp _flg(in=_f);
by _seqno;
if _f then flag='Y';
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 3864 views
  • 10 likes
  • 5 in conversation