Hi, I am newbie to SAS and i need help desperately.
getting straight to the point, I am trying to process the sas table into another table as described below:
Given:
data work;
input ID JOB_CD YEAR;
cards;
1 4 2006
1 4 2007
1 3 2008
1 5 2009
1 4 2010
2 5 2005
2 5 2006
7 1 2007
;
run;
WANT:
ID JOB_CD YEAR FIRST_YEAR LAST_YEAR
1 4 2006 2006 2007
1 4 2007 2006 2007
1 3 2008 2008 2008
1 5 2009 2009 2009
1 4 2010 2010 2010
2 5 2005 2005 2006
2 5 2006 2005 2006
7 1 2007 2007 2007
the actual data is much larger but,
basically I am trying to group the data by ID and JOB_CD, and get first_year and last_year for each job within each ID.
It seemed to work with using "BY ID JOB_CD", FIRST.ID, LAST.ID, RETAIN statements but the problem is, given the case above,
ID JOB_CD YEAR FIRST_YEAR LAST_YEAR
1 4 2006 2006 2010
but as you can see this is not what i want. This guy with ID 1 had had a job 4 then changed a job for a few years before changing back to the original job.
It does not matter if I use SQL for it.
Simpler the better, but I really want this to work.
Thanks for reading.
Be cautious. Just grouping by id and job_cd will not solve the fact that id 1 has two separate periods with job_cd = 4.
I propose some trickery:
data
int1 (keep=id job_cd period year)
int2 (keep=id period first_year last_year)
;
set work;
by id job_cd notsorted;
retain period_count first_year;
if first.id then period = 0;
if first.job_cd
then do;
period + 1;
first_year = year;
end;
if last.job_cd
then do;
last_year = year;
output int2;
end;
output int1;
run;
data want;
merge
int1
int2
;
by id period;
drop period;
run;
While there are a couple of ways to do this, here are some tools that I expect would be "next in line" for you to learn. You can skip the PROC SORT if the data set is already in the proper order.
proc sort data=have;
by id job_cd;
run;
proc summary data=have;
var year;
by id job_cd;
output out=years (drop=_type_ _freq_) min(year) = first_year max(year) = last_year;
run;
data want;
merge have years;
by id job_cd;
run;
Be cautious. Just grouping by id and job_cd will not solve the fact that id 1 has two separate periods with job_cd = 4.
I propose some trickery:
data
int1 (keep=id job_cd period year)
int2 (keep=id period first_year last_year)
;
set work;
by id job_cd notsorted;
retain period_count first_year;
if first.id then period = 0;
if first.job_cd
then do;
period + 1;
first_year = year;
end;
if last.job_cd
then do;
last_year = year;
output int2;
end;
output int1;
run;
data want;
merge
int1
int2
;
by id period;
drop period;
run;
You can use @Astounding's method by first creating a period variable like I did and then use that in the proc summary and the merge instead of job_cd.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.