BookmarkSubscribeRSS Feed
HB
Barite | Level 11 HB
Barite | Level 11

I have three datasets representing people by year.They should be unique.  If a person in in year1, I don't want them in year2 or year3.  If in year2, not in 1 or 3, if in 3 not in 1 or 2. 

 

Unfortunately in the real world, people manage to cross neat dataset boundaries and show up wherever.

 

I created a field in each record to mark the year of the record and did this:

 

proc sort data=fy07_data;
	by ssn;
run;

proc sort data=fy08_data;
	by ssn;
run;

proc sort data=fy09_data;
	by ssn;
run;

data data_07_09;
	merge 	        fy07_data (in=my07) 
			fy08_data (in=my08)
			fy09_data (in=my09);
	by ssn;
run;
* I was going to do something with the in's and didn't; 
data data_07_09_final; set data_07_09; if myyear07flag = 1 and myyear08flag = 1 then delete; if myyear07flag = 1 and myyear09flag = 1 then delete; if myyear08flag = 1 and myyear09flag = 1 then delete; myawardyear = '00'; if myyear07flag = 1 then myawardyear = '07'; if myyear08flag = 1 then myawardyear = '08'; if myyear09flag = 1 then myawardyear = '09'; run;

after which I drop the myyearXXflag fields befroe outputting the dataset.  This seems somehow inelegant.  It just feels like the process could be terser and cleaner.

 

Is there a better way to accomplish this or is this as good as anything?

 

Thanks.

5 REPLIES 5
Reeza
Super User

I'd probably stack them together, sorting by year, removing duplicates using PROC SORT NODUPKEY and then transposing if desired. Not sure there's a reason to at this point. You don't have any other variables in the dataset? If they had the same variables they'd be overwritten. 

 

data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;

proc sort data=combined; 
by id source_name;
run;

proc sort data=combined nodupkey;
by id;
run;

*transpose if desired to 'wide' format, but assuming the data is the same what happens to the other variables?;
HB
Barite | Level 11 HB
Barite | Level 11

The indsname option is new to me and super cool, but this doesn't do what I want. I want to delete records common between the tables.

 

So with

data fy07_data;
   input ssn;
   datalines;
11
12
13
14
15
16
17
18
21
22
31
32
;

data fy08_data;
   input ssn;
   datalines;
11
12
21
22
23
24
25
26
27
28
33
34
;

data fy09_data;
   input ssn;
   datalines;
13
14
21
22
31
32
33
34
35
36
37
38
;


data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;

proc sort data=combined; 
by source_name ssn;
run;

proc sort data=combined nodupkey;
by ssn;
run;


proc print data=combined noobs; 
 var ssn source_name;
run;

I get a data set of

                                     ssn     source_name

                                      11    WORK.FY07_DATA
                                      12    WORK.FY07_DATA
                                      13    WORK.FY07_DATA
                                      14    WORK.FY07_DATA
                                      15    WORK.FY07_DATA
                                      16    WORK.FY07_DATA
                                      17    WORK.FY07_DATA
                                      18    WORK.FY07_DATA
                                      21    WORK.FY07_DATA
                                      22    WORK.FY07_DATA
                                      23    WORK.FY08_DATA
                                      24    WORK.FY08_DATA
                                      25    WORK.FY08_DATA
                                      26    WORK.FY08_DATA
                                      27    WORK.FY08_DATA
                                      28    WORK.FY08_DATA
                                      31    WORK.FY07_DATA
                                      32    WORK.FY07_DATA
                                      33    WORK.FY08_DATA
                                      34    WORK.FY08_DATA
                                      35    WORK.FY09_DATA
                                      36    WORK.FY09_DATA
                                      37    WORK.FY09_DATA
                                      38    WORK.FY09_DATA

I want a data set of

                                     ssn     source_name

                                      15    WORK.FY07_DATA
                                      16    WORK.FY07_DATA
                                      17    WORK.FY07_DATA
                                      18    WORK.FY07_DATA
                                      23    WORK.FY08_DATA
                                      24    WORK.FY08_DATA
                                      25    WORK.FY08_DATA
                                      26    WORK.FY08_DATA
                                      27    WORK.FY08_DATA
                                      28    WORK.FY08_DATA
                                      35    WORK.FY09_DATA
                                      36    WORK.FY09_DATA
                                      37    WORK.FY09_DATA
                                      38    WORK.FY09_DATA
Ksharp
Super User
data fy07_data;
   input ssn;
   datalines;
11
12
13
14
15
16
17
18
21
22
31
32
;

data fy08_data;
   input ssn;
   datalines;
11
12
21
22
23
24
25
26
27
28
33
34
;

data fy09_data;
   input ssn;
   datalines;
13
14
21
22
31
32
33
34
35
36
37
38
;


data combined;
set fy07_data fy08_data fy09_data indsname=source;
source_name = source;
run;
proc sql;
select distinct source_name,ssn
 from combined
  group by ssn 
   having count(distinct source_name)=1;
quit;
Ksharp
Super User

proc sql;
(
select * from fy07_data 
except 
(select * from fy08_data union select * from fy09_data )
)

union

(
select * from  fy08_data
except 
(select * from fy07_data union select * from fy09_data )
)

union

(
select * from  fy09_data
except 
(select * from fy08_data union select * from fy07_data )
)
;
quit;
HB
Barite | Level 11 HB
Barite | Level 11

Okay, with that much unioning and excepting going on, I feel better about my

        if myyear07flag = 1 and myyear08flag = 1 then delete;
	if myyear07flag = 1 and myyear09flag = 1 then delete; 

stuff.  Lol.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 604 views
  • 1 like
  • 3 in conversation