I have three datasets representing people by year.They should be unique. If a person in in year1, I don't want them in year2 or year3. If in year2, not in 1 or 3, if in 3 not in 1 or 2.
Unfortunately in the real world, people manage to cross neat dataset boundaries and show up wherever.
I created a field in each record to mark the year of the record and did this:
proc sort data=fy07_data;
by ssn;
run;
proc sort data=fy08_data;
by ssn;
run;
proc sort data=fy09_data;
by ssn;
run;
data data_07_09;
merge fy07_data (in=my07)
fy08_data (in=my08)
fy09_data (in=my09);
by ssn;
run;
* I was going to do something with the in's and didn't;
data data_07_09_final;
set data_07_09;
if myyear07flag = 1 and myyear08flag = 1 then delete;
if myyear07flag = 1 and myyear09flag = 1 then delete;
if myyear08flag = 1 and myyear09flag = 1 then delete;
myawardyear = '00';
if myyear07flag = 1 then myawardyear = '07';
if myyear08flag = 1 then myawardyear = '08';
if myyear09flag = 1 then myawardyear = '09';
run;
after which I drop the myyearXXflag fields befroe outputting the dataset. This seems somehow inelegant. It just feels like the process could be terser and cleaner.
Is there a better way to accomplish this or is this as good as anything?
Thanks.
I'd probably stack them together, sorting by year, removing duplicates using PROC SORT NODUPKEY and then transposing if desired. Not sure there's a reason to at this point. You don't have any other variables in the dataset? If they had the same variables they'd be overwritten.
data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;
proc sort data=combined;
by id source_name;
run;
proc sort data=combined nodupkey;
by id;
run;
*transpose if desired to 'wide' format, but assuming the data is the same what happens to the other variables?;
The indsname option is new to me and super cool, but this doesn't do what I want. I want to delete records common between the tables.
So with
data fy07_data;
input ssn;
datalines;
11
12
13
14
15
16
17
18
21
22
31
32
;
data fy08_data;
input ssn;
datalines;
11
12
21
22
23
24
25
26
27
28
33
34
;
data fy09_data;
input ssn;
datalines;
13
14
21
22
31
32
33
34
35
36
37
38
;
data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;
proc sort data=combined;
by source_name ssn;
run;
proc sort data=combined nodupkey;
by ssn;
run;
proc print data=combined noobs;
var ssn source_name;
run;
I get a data set of
ssn source_name
11 WORK.FY07_DATA
12 WORK.FY07_DATA
13 WORK.FY07_DATA
14 WORK.FY07_DATA
15 WORK.FY07_DATA
16 WORK.FY07_DATA
17 WORK.FY07_DATA
18 WORK.FY07_DATA
21 WORK.FY07_DATA
22 WORK.FY07_DATA
23 WORK.FY08_DATA
24 WORK.FY08_DATA
25 WORK.FY08_DATA
26 WORK.FY08_DATA
27 WORK.FY08_DATA
28 WORK.FY08_DATA
31 WORK.FY07_DATA
32 WORK.FY07_DATA
33 WORK.FY08_DATA
34 WORK.FY08_DATA
35 WORK.FY09_DATA
36 WORK.FY09_DATA
37 WORK.FY09_DATA
38 WORK.FY09_DATA
I want a data set of
ssn source_name
15 WORK.FY07_DATA
16 WORK.FY07_DATA
17 WORK.FY07_DATA
18 WORK.FY07_DATA
23 WORK.FY08_DATA
24 WORK.FY08_DATA
25 WORK.FY08_DATA
26 WORK.FY08_DATA
27 WORK.FY08_DATA
28 WORK.FY08_DATA
35 WORK.FY09_DATA
36 WORK.FY09_DATA
37 WORK.FY09_DATA
38 WORK.FY09_DATA
data fy07_data;
input ssn;
datalines;
11
12
13
14
15
16
17
18
21
22
31
32
;
data fy08_data;
input ssn;
datalines;
11
12
21
22
23
24
25
26
27
28
33
34
;
data fy09_data;
input ssn;
datalines;
13
14
21
22
31
32
33
34
35
36
37
38
;
data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;
proc sql;
select distinct source_name,ssn
from combined
group by ssn
having count(distinct source_name)=1;
quit;
proc sql;
(
select * from fy07_data
except
(select * from fy08_data union select * from fy09_data )
)
union
(
select * from fy08_data
except
(select * from fy07_data union select * from fy09_data )
)
union
(
select * from fy09_data
except
(select * from fy08_data union select * from fy07_data )
)
;
quit;
Okay, with that much unioning and excepting going on, I feel better about my
if myyear07flag = 1 and myyear08flag = 1 then delete;
if myyear07flag = 1 and myyear09flag = 1 then delete;
stuff. Lol.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.