BookmarkSubscribeRSS Feed
AmitChop7391
Obsidian | Level 7

Hi,

 

I have a data set which contains the following columns: PTID, NDC, DRUG_CLASS1_FLAG, DRUG_CLASS2_FLAG, DRUG_CLASS3_FLAG, DRUG_CLASS4FLAG.

 

I want to find out if these DRUG_CLASS FLAGS have same value in first and last observation of a data set. 

For example, in first observation, DRUG_CLASS1_FLAG =1 and in last observation, DRUG_CLASS1_FLAG=1, then I want to find out these type of records

 

Any help would be appreciated.

 

Thanks in advance.

 

 

17 REPLIES 17
PaigeMiller
Diamond | Level 26
data new;
    set have end=eof;
    if _n_=1 or eof then output;
run;

proc summary data=new;
     var whatevervariablesyouwant;
     output out=want range= /autoname;
run;

If the output of PROC SUMMARY shows a zero for the RANGEs, then you have the same value in the first and last observation.

--
Paige Miller
AmitChop7391
Obsidian | Level 7

To be more clear. I have the following data set

 

PT_IDDRUG_CODEDRUG_1DRUG_2
1A1 
1A1 
1B 1
1B 1
1A1 
1A1 

 

Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation. 

For example, if for PTID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.

PaigeMiller
Diamond | Level 26

So we have to perform this analysis for each specific PT_ID, and not over the entire data set, as was my impression from the original post?

 

And your variables are character variables and not numeric as was my impression from the original post?

 

In your example, under DRUG_2, the first and last record shows a missing character value, does that count as having the same value?

--
Paige Miller
AmitChop7391
Obsidian | Level 7

Yes, we have to perform the analysis for each PT_ID. Some variables are character and some are numeric.

 

DRUG_1 and DRUG_2 are numeric variables and I want only those patients where DRUG_1 or DRUG_2 = 1 for first and last observation. We won't consider missing values.

 

Thanks

ballardw
Super User

@AmitChop7391 wrote:

To be more clear. I have the following data set

 

PT_ID DRUG_CODE DRUG_1 DRUG_2
1 A 1  
1 A 1  
1 B   1
1 B   1
1 A 1  
1 A 1  

 

Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation. 

For example, if for PTID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.


Is there any chance you have an actual DATE or some sort of sequence indicator? I have a sneaking suspicion this problem may be much easier to define first and last if you have one of those variables in your data.

AmitChop7391
Obsidian | Level 7

Yes, I also have a date field which is in ascending order. First record of each patient has min date and last record has max date.

ballardw
Super User

@AmitChop7391 wrote:

Yes, I also have a date field which is in ascending order. First record of each patient has min date and last record has max date.


 

Something like this will select the first an last records for each patient.

data temp;
   set have;
   by patient date;
   if first.patient or last.patient;
run;

Note that if there is only one record for the patient there will only be one so "first" and "last" are the same for all variables...

 

For numeric variables a modification of @PaigeMiller's solution

proc summary data=temp nway;
    class patient;
     var whatevervariablesyouwant;
     output out=want range= /autoname;
run;

would have 0 for any of the numeric variables with the same value.

 

At some point you should provide an example of what you want the final output to look like as well.

AmitChop7391
Obsidian | Level 7
PT_IDDRUG_NAMEDRUG_1DRUG_2Date
1A1 1/1/2017
1A1 5/2/2017
1B 18/2/2017
1B 19/2/2017
1A1 12/2/2017
1A1 12/31/2017
2A1 5/3/2017
2B 16/3/2017
2A1 9/3/2017
3A1 5/6/2017
3B 17/9/2017
3B 19/9/2017

 

So, I only want PT_ID 1 and 2 to be reported in my final data set as their DRUG_1 flags have same values in first and last observation.

 

Thanks

Kurt_Bremser
Super User
data test;
set have (obs=1);
output;
if nobs > 1;
set have nobs=nobs point=nobs;
output;
run;

If dataset have has more than one observation, you'll get the first and last observation. Otherwise, it will be one or no observation.

AmitChop7391
Obsidian | Level 7

To be more clear. I have the following data set

 

PT_IDDRUG_CODEDRUG_1DRUG_2
1A1 
1A1 
1B 1
1B 1
1A1 
1A1 

 

Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation. 

For example, if for PT_ID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.

Reeza
Super User

Could you please provide a more detailed sample data set. Please include at least two or three different ID's, the date variable and what you would expect as the output. Do you want the list of ID's where the first matches the last? Or all records? Or something else entirely?

 


@AmitChop7391 wrote:

To be more clear. I have the following data set

 

PT_ID DRUG_CODE DRUG_1 DRUG_2
1 A 1  
1 A 1  
1 B   1
1 B   1
1 A 1  
1 A 1  

 

Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation. 

For example, if for PT_ID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.


 

AmitChop7391
Obsidian | Level 7
PT_IDDRUG_NAMEDRUG_1DRUG_2Date
1A1 1/1/2017
1A1 5/2/2017
1B 18/2/2017
1B 19/2/2017
1A1 12/2/2017
1A1 12/31/2017
2A1 5/3/2017
2B 16/3/2017
2A1 9/3/2017
3A1 5/6/2017
3B 17/9/2017
3B 19/9/2017

 

So, I only want PT_ID 1 and 2 to be reported in my final data set as their DRUG_1 flags have same values in first and last observation.

ballardw
Super User

To use the RANGE function to check equality we can't have missing values. So here is a modified data set

and code:

data have;
   input PT_ID DRUG_NAME $ DRUG_1 DRUG_2 Date :mmddyy10.; 
   format date mmddyy10.;
   drug_1 = (drug_1=1);
   drug_2 = (drug_2=1);
datalines;
1 A 1 . 1/1/2017 
1 A 1 . 5/2/2017 
1 B . 1 8/2/2017 
1 B . 1 9/2/2017 
1 A 1 . 12/2/2017 
1 A 1 . 12/31/2017 
2 A 1 . 5/3/2017 
2 B . 1 6/3/2017 
2 A 1 . 9/3/2017 
3 A 1 . 5/6/2017 
3 B . 1 7/9/2017 
3 B . 1 9/9/2017 
;

data temp;
   set have;
   by  PT_ID;
   if first.pt_id or last.Pt_id;
run;
proc summary data=temp nway;
   class pt_id;
   var drug_1;
   output out=work.drug_1 (drop=_type_ _freq_ where=(drug_1=0))
       range = ;
run;

The range=0 ones have the same values.

 

I strongly suggest that you consider using flags of 1/0 coding instead of 1/.  . You get some very useful statistics such as percentages by taking the mean of a 1/0 variable. Sum gives you a count of 1's either way, MODE would report whether you have more 1 or 0 values but would not work with missing values,

Depending on specific needs the 0 often comes in handy.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 4814 views
  • 4 likes
  • 5 in conversation