02-05-2016 10:02 AM
Using SAS 7.
I am quite new to SAS.
Please see attached file. I have a similar dateset.
For each unique name I want to extract all the records from the latest Amount Spent=X onwards, inclusive of the observation with the latest Amount Spent=X
How do I do so?
02-05-2016 10:06 AM
Its a good idea to put test data as a datastep rather than attaching files. So this is just off the top of my head:
if first.name then o=0; /* o is a flag 0=not output, 1 to output */
if value="X" then o=1;
if o then output;
02-05-2016 10:16 AM
Thank you for the reply.
But I don't think you looked at the file.
The output that I want
The code you provided will only extract the observations with X, and after X.
02-05-2016 10:21 AM
Well, minor change to the code then (and no, I dont look at attachments - post test data in the form of a datastep):
data want; set have; retain o; by name; if first.name then o=1; if value="X" then o=0; /* If X is encountered, stop outputting */ if o then output; run;
02-05-2016 11:04 AM - edited 02-05-2016 11:42 AM
Here is another approach:
/* Create test data, incl. a name without Amount_Spent=.X */ data have; missing x; input Name $ Date :yymmdd. Amount_Spent; format date yymmddn8.; cards; John 20160521 6 John 20160101 50 John 20150526 X John 20150329 23 John 20150215 X John 20141212 10 Peter 20151203 150 Peter 20150321 X Peter 20141123 20 Albert 20150408 X Jack 20141024 50 ; /* Add sequential number in case that sort order is to be preserved */ data temp; set have; seqno=_n_; run; proc sort data=temp; by name descending date; run; /* Create target dataset up to sort order */ data want0; do until(last.name); set temp; by name; if amount_spent=.X then flagX=1; end; do until(last.name); set temp; by name descending date; if first.name then flag=1; if flagX & flag then output; if amount_spent=.X then flag=0; end; drop flag:; run; proc sort data=want0 out=want(drop=seqno); by seqno; run; proc print data=want; run;
Edit: Please note that I interpreted your "X" values as special missing values. If I were you, I would strongly prefer to define Amount_Spent as a numeric variable, because it contains numbers, not "codes" consisting of digits. Character variables are not suitable for doing calculations.
Is there a possibility that one person has spent two amounts on the same date, maybe one .X and one non-missing? What would be the selection criterion in this case?
Edit 2: Simplified definition of variable FLAG (if first.name ...).
02-05-2016 12:16 PM - edited 02-05-2016 12:17 PM
infile cards dsd;
informat name $6. date yymmdd8. amount_spent 8.;
format name $6. date yymmddn8. amount_spent 8.;
input Name Date Amount_Spent;
create table want as
from have where date >= (select max(date)