06-29-2016 02:09 PM - edited 06-29-2016 02:14 PM
I am trying to flag first and last observations based on baseline date (baseline_doc) for all same groupings of client_id.
For example, I want my final dataset to look like this:
Client_id Baseline_Date First.baseline_date Last.baseline_date 1 April 1 2016 0 0 1 April 20 2016 0 1 1 January 1 2016 1 0 2 January 1 2016 1 0 2 January 20 2016 0 1 3 February 20 2016 1 1
This is the code I am using:
data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by client_id baseline_doc; firstclient_id = first.client_id; lastclient_id = last.client_id; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
However, my code doesn't give me the output like I want. What is happening is that every client_id is getting tagged with a 1 for last.baseline_doc but there definitely are some that should be 0 b/c I have duplicate client_ids with different dates.
What am I doing wrong?
I copied this code directly from this resource:
and it has the output I want so I'm not sure what is going on.
I proc sorted by client_id and baseline_date before I ran the code above!
06-29-2016 02:25 PM
First, confusing when you are mixing variable names (doc vs date).
Second, nice to show the actual output.
Third, I think you are confusing the levels of BY variable where first. and last. operate.
First. will tag any observation that is first within it's value of the specified BY-group.
Since you have unique (my guess, I don't see the actual source data) values of baseline_doc/date, every row will tagged.
So my guess is that you only want tag set on first./last.client_id...? Perhaps that wasn't a coincidence that you omitted those variables in your desired output.
06-29-2016 02:30 PM
The underlying issue here is that your BASELINE_DATE variable is character. That's not how SAS is built to handle dates.
Try converting it to numeric:
baseline_sasdate = input(baseline_date, worddate18.);
Then use the numeric version in your programming.
06-29-2016 02:53 PM
I double checked hoping you were right, but my baseline_doc variable is definitely number data type
Any other ideas?
Here is my full code so far:
proc sort data=FA_CA_DADOS_CLEAN out=fa_ca_dados_clean2; by mrn baseline_doc; run; data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by mrn baseline_doc; firstMRN = first.mrn; lastMRN = last.mrn; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
MRN = client ID
Basedline_doc = baseline date
06-29-2016 03:00 PM
Well, if your baseline date is already numeric, then the output you are showing doesn't match the program you are running. There is no way for April 1 2016 to sort earlier than January 1 2016, when the variable is on SAS's date scale. That's the order you would get for a character variable, not numeric. I can't see what's in your data, but I know that there's a mismatch somewhere.
06-29-2016 04:10 PM
I couldn't figure it out so I just ended up doing it quickly in excel...
if anyone else knows the reason please post a solution. this will forever bug me
06-29-2016 11:40 PM
Did I miss something ?
Just SORT and use FIRST LAST keyword ?
proc sort data=have;
by Client_id Baseline_Date ;
by Client_id ;