SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Finding first and last within subgroups

Reply
Frequent Contributor
Posts: 88

Finding first and last within subgroups

[ Edited ]

Hi Everyone, 

 

I am trying to flag first and last observations based on baseline date (baseline_doc) for all same groupings of client_id.

 

For example, I want my final dataset to look like this:

 

Client_id       Baseline_Date          First.baseline_date       Last.baseline_date
     1               April 1 2016             0                           0
     1              April 20 2016             0                           1
     1              January 1 2016            1                           0
     2              January 1 2016            1                           0
     2              January 20 2016           0                           1
     3              February 20 2016          1                           1
   

This is the code I am using:

 

data FA_CA_DADOS_CLEAN_DELETED;
	set fa_ca_dados_clean2;
	by client_id baseline_doc;
	firstclient_id = first.client_id;
	lastclient_id = last.client_id;
	firstBaseline_doc = first.baseline_doc;
	lastBaseline_doc = last.baseline_doc;
run;

However, my code doesn't give me the output like I want. What is happening is that every client_id is getting tagged with a 1 for last.baseline_doc but there definitely are some that should be 0 b/c I have duplicate client_ids with different dates.

 

What am I doing wrong?

 

I copied this code directly from this resource:

https://onlinecourses.science.psu.edu/stat481/node/9

 

and it has the output I want so I'm not sure what is going on.

 

I proc sorted by client_id and baseline_date before I ran the code above!

Super User
Posts: 5,254

Re: Finding first and last within subgroups

First, confusing when you are mixing variable names (doc vs date).

Second, nice to show the actual output.

Third, I think you are confusing the levels of BY variable where first. and last. operate.

First. will tag any observation that is first within it's value of the specified  BY-group.

Since you have unique (my guess, I don't see the actual source data) values of baseline_doc/date, every row will tagged.

So my guess is that you only want tag set on first./last.client_id...? Perhaps that wasn't a coincidence that you omitted those variables in your desired output.

Data never sleeps
Frequent Contributor
Posts: 88

Re: Finding first and last within subgroups

Yes that is right I want to tag the first and last client_ids based on baseline date

 

please see my example desired output above. 

 

Anyone else have any ideas?

Super User
Posts: 5,069

Re: Finding first and last within subgroups

The underlying issue here is that your BASELINE_DATE variable is character.  That's not how SAS is built to handle dates.

 

Try converting it to numeric:

 

baseline_sasdate = input(baseline_date, worddate18.);

 

Then use the numeric version in your programming.

Frequent Contributor
Posts: 88

Re: Finding first and last within subgroups

Hi Astounding..

 

I double checked hoping you were right, but my baseline_doc variable is definitely number data type

 

Any other ideas?

 

Here is my full code so far:

 

proc sort data=FA_CA_DADOS_CLEAN out=fa_ca_dados_clean2;
by mrn baseline_doc;
run;


data FA_CA_DADOS_CLEAN_DELETED;
	set fa_ca_dados_clean2;
	by mrn baseline_doc;
	firstMRN = first.mrn;
	lastMRN = last.mrn;
	firstBaseline_doc = first.baseline_doc;
	lastBaseline_doc = last.baseline_doc;
run;

 MRN = client ID

Basedline_doc = baseline date

 

 

Super User
Posts: 5,069

Re: Finding first and last within subgroups

Well, if your baseline date is already numeric, then the output you are showing doesn't match the program you are running.  There is no way for April 1 2016 to sort earlier than January 1 2016, when the variable is on SAS's date scale.  That's the order you would get for a character variable, not numeric.  I can't see what's in your data, but I know that there's a mismatch somewhere.

 

 

 

Frequent Contributor
Posts: 88

Re: Finding first and last within subgroups

Thanks everyone!

 

I couldn't figure it out so I just ended up doing it quickly in excel...

 

if anyone else knows the reason please post a solution. this will forever bug meSmiley Sad

Super User
Posts: 9,662

Re: Finding first and last within subgroups

Did I miss something ?

Just SORT and use FIRST LAST keyword ?

 

 

proc sort data=have;

by Client_id       Baseline_Date ;

run;

 

data want;

 set have;

 by Client_id ;

 first=first.Client_id      ;

 last=last.Client_id      ;

run;

Ask a Question
Discussion stats
  • 7 replies
  • 481 views
  • 2 likes
  • 4 in conversation