Hi Everyone,
I am trying to flag first and last observations based on baseline date (baseline_doc) for all same groupings of client_id.
For example, I want my final dataset to look like this:
Client_id Baseline_Date First.baseline_date Last.baseline_date 1 April 1 2016 0 0 1 April 20 2016 0 1 1 January 1 2016 1 0 2 January 1 2016 1 0 2 January 20 2016 0 1 3 February 20 2016 1 1
This is the code I am using:
data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by client_id baseline_doc; firstclient_id = first.client_id; lastclient_id = last.client_id; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
However, my code doesn't give me the output like I want. What is happening is that every client_id is getting tagged with a 1 for last.baseline_doc but there definitely are some that should be 0 b/c I have duplicate client_ids with different dates.
What am I doing wrong?
I copied this code directly from this resource:
https://onlinecourses.science.psu.edu/stat481/node/9
and it has the output I want so I'm not sure what is going on.
I proc sorted by client_id and baseline_date before I ran the code above!
First, confusing when you are mixing variable names (doc vs date).
Second, nice to show the actual output.
Third, I think you are confusing the levels of BY variable where first. and last. operate.
First. will tag any observation that is first within it's value of the specified BY-group.
Since you have unique (my guess, I don't see the actual source data) values of baseline_doc/date, every row will tagged.
So my guess is that you only want tag set on first./last.client_id...? Perhaps that wasn't a coincidence that you omitted those variables in your desired output.
Yes that is right I want to tag the first and last client_ids based on baseline date
please see my example desired output above.
Anyone else have any ideas?
The underlying issue here is that your BASELINE_DATE variable is character. That's not how SAS is built to handle dates.
Try converting it to numeric:
baseline_sasdate = input(baseline_date, worddate18.);
Then use the numeric version in your programming.
Hi Astounding..
I double checked hoping you were right, but my baseline_doc variable is definitely number data type
Any other ideas?
Here is my full code so far:
proc sort data=FA_CA_DADOS_CLEAN out=fa_ca_dados_clean2; by mrn baseline_doc; run; data FA_CA_DADOS_CLEAN_DELETED; set fa_ca_dados_clean2; by mrn baseline_doc; firstMRN = first.mrn; lastMRN = last.mrn; firstBaseline_doc = first.baseline_doc; lastBaseline_doc = last.baseline_doc; run;
MRN = client ID
Basedline_doc = baseline date
Well, if your baseline date is already numeric, then the output you are showing doesn't match the program you are running. There is no way for April 1 2016 to sort earlier than January 1 2016, when the variable is on SAS's date scale. That's the order you would get for a character variable, not numeric. I can't see what's in your data, but I know that there's a mismatch somewhere.
Thanks everyone!
I couldn't figure it out so I just ended up doing it quickly in excel...
if anyone else knows the reason please post a solution. this will forever bug me:(
Did I miss something ?
Just SORT and use FIRST LAST keyword ?
proc sort data=have;
by Client_id Baseline_Date ;
run;
data want;
set have;
by Client_id ;
first=first.Client_id ;
last=last.Client_id ;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.