BookmarkSubscribeRSS Feed
Satori
Quartz | Level 8

I have a data set like this:

 

idvarXvarYyear
11.522018
1212017
23.522017
2812016
31022016
31212014
41322018
41422016
41632015

 

I only want to keep one observation per ID, and I want to keep only if it is 2018 or 2017. So if 2018 exists I want it, otherwise I want 2017, otherwise I don't want it.

 

This is what I want to have:

 

idvarXvarYyear
11.522018
23.522017
41322018
10 REPLIES 10
PaigeMiller
Diamond | Level 26
proc sort data=have;
    by id year;
run;
data want;
    set have;
    by id;
    if last.id and year in (2017,2018);
run;
--
Paige Miller
Satori
Quartz | Level 8

I start with 2180 observations where 139 are from 2018. With the code you provided, I end up with 5 observations. Don't know why.

PaigeMiller
Diamond | Level 26

@Satori wrote:

I start with 2180 observations where 139 are from 2018. With the code you provided, I end up with 5 observations. Don't know why.


The data you showed does not indicate any problems when I run my code against it. Please provide a larger and more representative sample of your data. Please provide this sample of your data as working SAS data step code, and not as screen capture or file attachment.

--
Paige Miller
Quentin
Super User

Can you post the code you have tried (and post the data as a DATA step with cards data)?  Are you familiar with BY-group processing? 

 

You could sort by ID and year, then use by group processing (if last.ID) to check the value of the year for that record, and only output it if it's 2017 or 2018.

 

Or if you're a SQL person you could group by ID, and select the record having having year=max(year) and year IN (2017,2018).

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Satori
Quartz | Level 8

Could you provide the code for that?

Satori
Quartz | Level 8

I determined the years: I want it only if it's for 2018, and if I there isn't for 2018, I want 2017, but not both.

the dataset has more a date variable which I converted to just year variable, and for each ID there are many years.

Tom
Super User Tom
Super User

Assuming the data is already sorted by ID and YEAR then just use BY group process and WHERE filtering.

data want;
  set have;
  by id year;
  where year in (2017 2018);
  if last.id;
run;
Satori
Quartz | Level 8

the data is already sorted by ID and descending year. Running this code got me an error.

ERROR: WHERE clause operator requires compatible variables.

Tom
Super User Tom
Super User

@Satori wrote:

the data is already sorted by ID and descending year. Running this code got me an error.

ERROR: WHERE clause operator requires compatible variables.


Why would you have created the YEAR variable as a CHARACTER string?

If the data is in DESCENDING order of YEAR then you want the FIRST one and not the LAST one.

data want;
  set have;
  by id descending year;
  where year in ('2017' '2018');
  if first.id;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 512 views
  • 1 like
  • 5 in conversation