Hi,
I have a data set which contains the following columns: PTID, NDC, DRUG_CLASS1_FLAG, DRUG_CLASS2_FLAG, DRUG_CLASS3_FLAG, DRUG_CLASS4FLAG.
I want to find out if these DRUG_CLASS FLAGS have same value in first and last observation of a data set.
For example, in first observation, DRUG_CLASS1_FLAG =1 and in last observation, DRUG_CLASS1_FLAG=1, then I want to find out these type of records
Any help would be appreciated.
Thanks in advance.
data new;
set have end=eof;
if _n_=1 or eof then output;
run;
proc summary data=new;
var whatevervariablesyouwant;
output out=want range= /autoname;
run;
If the output of PROC SUMMARY shows a zero for the RANGEs, then you have the same value in the first and last observation.
To be more clear. I have the following data set
PT_ID | DRUG_CODE | DRUG_1 | DRUG_2 |
1 | A | 1 | |
1 | A | 1 | |
1 | B | 1 | |
1 | B | 1 | |
1 | A | 1 | |
1 | A | 1 |
Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation.
For example, if for PTID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.
So we have to perform this analysis for each specific PT_ID, and not over the entire data set, as was my impression from the original post?
And your variables are character variables and not numeric as was my impression from the original post?
In your example, under DRUG_2, the first and last record shows a missing character value, does that count as having the same value?
Yes, we have to perform the analysis for each PT_ID. Some variables are character and some are numeric.
DRUG_1 and DRUG_2 are numeric variables and I want only those patients where DRUG_1 or DRUG_2 = 1 for first and last observation. We won't consider missing values.
Thanks
@AmitChop7391 wrote:
To be more clear. I have the following data set
PT_ID DRUG_CODE DRUG_1 DRUG_2 1 A 1 1 A 1 1 B 1 1 B 1 1 A 1 1 A 1
Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation.
For example, if for PTID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.
Is there any chance you have an actual DATE or some sort of sequence indicator? I have a sneaking suspicion this problem may be much easier to define first and last if you have one of those variables in your data.
Yes, I also have a date field which is in ascending order. First record of each patient has min date and last record has max date.
@AmitChop7391 wrote:
Yes, I also have a date field which is in ascending order. First record of each patient has min date and last record has max date.
Something like this will select the first an last records for each patient.
data temp; set have; by patient date; if first.patient or last.patient; run;
Note that if there is only one record for the patient there will only be one so "first" and "last" are the same for all variables...
For numeric variables a modification of @PaigeMiller's solution
proc summary data=temp nway; class patient; var whatevervariablesyouwant; output out=want range= /autoname; run;
would have 0 for any of the numeric variables with the same value.
At some point you should provide an example of what you want the final output to look like as well.
PT_ID | DRUG_NAME | DRUG_1 | DRUG_2 | Date |
1 | A | 1 | 1/1/2017 | |
1 | A | 1 | 5/2/2017 | |
1 | B | 1 | 8/2/2017 | |
1 | B | 1 | 9/2/2017 | |
1 | A | 1 | 12/2/2017 | |
1 | A | 1 | 12/31/2017 | |
2 | A | 1 | 5/3/2017 | |
2 | B | 1 | 6/3/2017 | |
2 | A | 1 | 9/3/2017 | |
3 | A | 1 | 5/6/2017 | |
3 | B | 1 | 7/9/2017 | |
3 | B | 1 | 9/9/2017 |
So, I only want PT_ID 1 and 2 to be reported in my final data set as their DRUG_1 flags have same values in first and last observation.
Thanks
data test;
set have (obs=1);
output;
if nobs > 1;
set have nobs=nobs point=nobs;
output;
run;
If dataset have has more than one observation, you'll get the first and last observation. Otherwise, it will be one or no observation.
To be more clear. I have the following data set
PT_ID | DRUG_CODE | DRUG_1 | DRUG_2 |
1 | A | 1 | |
1 | A | 1 | |
1 | B | 1 | |
1 | B | 1 | |
1 | A | 1 | |
1 | A | 1 |
Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation.
For example, if for PT_ID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.
Could you please provide a more detailed sample data set. Please include at least two or three different ID's, the date variable and what you would expect as the output. Do you want the list of ID's where the first matches the last? Or all records? Or something else entirely?
@AmitChop7391 wrote:
To be more clear. I have the following data set
PT_ID DRUG_CODE DRUG_1 DRUG_2 1 A 1 1 A 1 1 B 1 1 B 1 1 A 1 1 A 1
Similarly, I have multiple patients data. I want to find out those patients which have same value of flag DRUG_1 or DRUG_2 for their first and last observation.
For example, if for PT_ID =1, first observations has DRUG_1=1 and last observation has DRUG_1 =1 then this patient should be reported.
PT_ID | DRUG_NAME | DRUG_1 | DRUG_2 | Date |
1 | A | 1 | 1/1/2017 | |
1 | A | 1 | 5/2/2017 | |
1 | B | 1 | 8/2/2017 | |
1 | B | 1 | 9/2/2017 | |
1 | A | 1 | 12/2/2017 | |
1 | A | 1 | 12/31/2017 | |
2 | A | 1 | 5/3/2017 | |
2 | B | 1 | 6/3/2017 | |
2 | A | 1 | 9/3/2017 | |
3 | A | 1 | 5/6/2017 | |
3 | B | 1 | 7/9/2017 | |
3 | B | 1 | 9/9/2017 |
So, I only want PT_ID 1 and 2 to be reported in my final data set as their DRUG_1 flags have same values in first and last observation.
To use the RANGE function to check equality we can't have missing values. So here is a modified data set
and code:
data have; input PT_ID DRUG_NAME $ DRUG_1 DRUG_2 Date :mmddyy10.; format date mmddyy10.; drug_1 = (drug_1=1); drug_2 = (drug_2=1); datalines; 1 A 1 . 1/1/2017 1 A 1 . 5/2/2017 1 B . 1 8/2/2017 1 B . 1 9/2/2017 1 A 1 . 12/2/2017 1 A 1 . 12/31/2017 2 A 1 . 5/3/2017 2 B . 1 6/3/2017 2 A 1 . 9/3/2017 3 A 1 . 5/6/2017 3 B . 1 7/9/2017 3 B . 1 9/9/2017 ; data temp; set have; by PT_ID; if first.pt_id or last.Pt_id; run; proc summary data=temp nway; class pt_id; var drug_1; output out=work.drug_1 (drop=_type_ _freq_ where=(drug_1=0)) range = ; run;
The range=0 ones have the same values.
I strongly suggest that you consider using flags of 1/0 coding instead of 1/. . You get some very useful statistics such as percentages by taking the mean of a 1/0 variable. Sum gives you a count of 1's either way, MODE would report whether you have more 1 or 0 values but would not work with missing values,
Depending on specific needs the 0 often comes in handy.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.