Hi, I am newbie to SAS and i need help desperately.
getting straight to the point, I am trying to process the sas table into another table as described below:
Given:
data work;
input ID JOB_CD YEAR;
cards;
1 4 2006
1 4 2007
1 3 2008
1 5 2009
1 4 2010
2 5 2005
2 5 2006
7 1 2007
;
run;
WANT:
ID JOB_CD YEAR FIRST_YEAR LAST_YEAR
1 4 2006 2006 2007
1 4 2007 2006 2007
1 3 2008 2008 2008
1 5 2009 2009 2009
1 4 2010 2010 2010
2 5 2005 2005 2006
2 5 2006 2005 2006
7 1 2007 2007 2007
the actual data is much larger but,
basically I am trying to group the data by ID and JOB_CD, and get first_year and last_year for each job within each ID.
It seemed to work with using "BY ID JOB_CD", FIRST.ID, LAST.ID, RETAIN statements but the problem is, given the case above,
ID JOB_CD YEAR FIRST_YEAR LAST_YEAR
1 4 2006 2006 2010
but as you can see this is not what i want. This guy with ID 1 had had a job 4 then changed a job for a few years before changing back to the original job.
It does not matter if I use SQL for it.
Simpler the better, but I really want this to work.
Thanks for reading.
Be cautious. Just grouping by id and job_cd will not solve the fact that id 1 has two separate periods with job_cd = 4.
I propose some trickery:
data
int1 (keep=id job_cd period year)
int2 (keep=id period first_year last_year)
;
set work;
by id job_cd notsorted;
retain period_count first_year;
if first.id then period = 0;
if first.job_cd
then do;
period + 1;
first_year = year;
end;
if last.job_cd
then do;
last_year = year;
output int2;
end;
output int1;
run;
data want;
merge
int1
int2
;
by id period;
drop period;
run;
While there are a couple of ways to do this, here are some tools that I expect would be "next in line" for you to learn. You can skip the PROC SORT if the data set is already in the proper order.
proc sort data=have;
by id job_cd;
run;
proc summary data=have;
var year;
by id job_cd;
output out=years (drop=_type_ _freq_) min(year) = first_year max(year) = last_year;
run;
data want;
merge have years;
by id job_cd;
run;
Be cautious. Just grouping by id and job_cd will not solve the fact that id 1 has two separate periods with job_cd = 4.
I propose some trickery:
data
int1 (keep=id job_cd period year)
int2 (keep=id period first_year last_year)
;
set work;
by id job_cd notsorted;
retain period_count first_year;
if first.id then period = 0;
if first.job_cd
then do;
period + 1;
first_year = year;
end;
if last.job_cd
then do;
last_year = year;
output int2;
end;
output int1;
run;
data want;
merge
int1
int2
;
by id period;
drop period;
run;
You can use @Astounding's method by first creating a period variable like I did and then use that in the proc summary and the merge instead of job_cd.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.