I have a data set like this:
id | varX | varY | year |
1 | 1.5 | 2 | 2018 |
1 | 2 | 1 | 2017 |
2 | 3.5 | 2 | 2017 |
2 | 8 | 1 | 2016 |
3 | 10 | 2 | 2016 |
3 | 12 | 1 | 2014 |
4 | 13 | 2 | 2018 |
4 | 14 | 2 | 2016 |
4 | 16 | 3 | 2015 |
I only want to keep one observation per ID, and I want to keep only if it is 2018 or 2017. So if 2018 exists I want it, otherwise I want 2017, otherwise I don't want it.
This is what I want to have:
id | varX | varY | year |
1 | 1.5 | 2 | 2018 |
2 | 3.5 | 2 | 2017 |
4 | 13 | 2 | 2018 |
proc sort data=have;
by id year;
run;
data want;
set have;
by id;
if last.id and year in (2017,2018);
run;
I start with 2180 observations where 139 are from 2018. With the code you provided, I end up with 5 observations. Don't know why.
@Satori wrote:
I start with 2180 observations where 139 are from 2018. With the code you provided, I end up with 5 observations. Don't know why.
The data you showed does not indicate any problems when I run my code against it. Please provide a larger and more representative sample of your data. Please provide this sample of your data as working SAS data step code, and not as screen capture or file attachment.
Can you post the code you have tried (and post the data as a DATA step with cards data)? Are you familiar with BY-group processing?
You could sort by ID and year, then use by group processing (if last.ID) to check the value of the year for that record, and only output it if it's 2017 or 2018.
Or if you're a SQL person you could group by ID, and select the record having having year=max(year) and year IN (2017,2018).
Could you provide the code for that?
From where do you determine which years to select? Should that be taken from the dataset itself?
I determined the years: I want it only if it's for 2018, and if I there isn't for 2018, I want 2017, but not both.
the dataset has more a date variable which I converted to just year variable, and for each ID there are many years.
Assuming the data is already sorted by ID and YEAR then just use BY group process and WHERE filtering.
data want;
set have;
by id year;
where year in (2017 2018);
if last.id;
run;
the data is already sorted by ID and descending year. Running this code got me an error.
ERROR: WHERE clause operator requires compatible variables.
@Satori wrote:
the data is already sorted by ID and descending year. Running this code got me an error.
ERROR: WHERE clause operator requires compatible variables.
Why would you have created the YEAR variable as a CHARACTER string?
If the data is in DESCENDING order of YEAR then you want the FIRST one and not the LAST one.
data want;
set have;
by id descending year;
where year in ('2017' '2018');
if first.id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.