Hi Everyone,
I am trying to flag first and last observations based on baseline date (baseline_doc) for all same groupings of client_id.
For example, I want my final dataset to look like this:
Client_id       Baseline_Date          First.baseline_date       Last.baseline_date
     1               April 1 2016             0                           0
     1              April 20 2016             0                           1
     1              January 1 2016            1                           0
     2              January 1 2016            1                           0
     2              January 20 2016           0                           1
     3              February 20 2016          1                           1
   This is the code I am using:
data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by client_id baseline_doc; firstclient_id = first.client_id; lastclient_id = last.client_id; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
However, my code doesn't give me the output like I want. What is happening is that every client_id is getting tagged with a 1 for last.baseline_doc but there definitely are some that should be 0 b/c I have duplicate client_ids with different dates.
What am I doing wrong?
I copied this code directly from this resource:
https://onlinecourses.science.psu.edu/stat481/node/9
and it has the output I want so I'm not sure what is going on.
I proc sorted by client_id and baseline_date before I ran the code above!
First, confusing when you are mixing variable names (doc vs date).
Second, nice to show the actual output.
Third, I think you are confusing the levels of BY variable where first. and last. operate.
First. will tag any observation that is first within it's value of the specified BY-group.
Since you have unique (my guess, I don't see the actual source data) values of baseline_doc/date, every row will tagged.
So my guess is that you only want tag set on first./last.client_id...? Perhaps that wasn't a coincidence that you omitted those variables in your desired output.
Yes that is right I want to tag the first and last client_ids based on baseline date
please see my example desired output above.
Anyone else have any ideas?
The underlying issue here is that your BASELINE_DATE variable is character. That's not how SAS is built to handle dates.
Try converting it to numeric:
baseline_sasdate = input(baseline_date, worddate18.);
Then use the numeric version in your programming.
Hi Astounding..
I double checked hoping you were right, but my baseline_doc variable is definitely number data type
Any other ideas?
Here is my full code so far:
proc sort data=FA_CA_DADOS_CLEAN out=fa_ca_dados_clean2; by mrn baseline_doc; run; data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by mrn baseline_doc; firstMRN = first.mrn; lastMRN = last.mrn; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
MRN = client ID
Basedline_doc = baseline date
Well, if your baseline date is already numeric, then the output you are showing doesn't match the program you are running. There is no way for April 1 2016 to sort earlier than January 1 2016, when the variable is on SAS's date scale. That's the order you would get for a character variable, not numeric. I can't see what's in your data, but I know that there's a mismatch somewhere.
Thanks everyone!
I couldn't figure it out so I just ended up doing it quickly in excel...
if anyone else knows the reason please post a solution. this will forever bug me:(
Did I miss something ?
Just SORT and use FIRST LAST keyword ?
proc sort data=have;
by Client_id Baseline_Date ;
run;
data want;
set have;
by Client_id ;
first=first.Client_id ;
last=last.Client_id ;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.
