Solved: Append Dataset, Create Flag, and Remove Observations Based on Conditio...

mcanzater · Posted 09-10-2019 03:44 PM

How do I replicate the commands below in SAS?

In the 1st line, the code appends a file and simultaneously generates a flag (validated) where validated = 1 and not validated = 0. In the 2nd line, duplicate observations are identified and simultaneously generates a flag (dup) where both duplicates = 1 and non-duplicates = 0. In the 3rd line, all duplicate observations that were contributed by the master dataset are removed (i.e. duplicate observations contributed by the append dataset are kept along with all non-duplicate observations contributed by the master dataset). In the 4th line, the duplicate flag (dup) is removed from the dataset.

append using "Not_Validated_Data", generate(validated)

duplicates tag id date, generate(dup)

drop if dup ==1 & validated ==0

drop dup

maguiremq · Posted 09-10-2019 08:56 PM

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

View solution in original post

maguiremq · Posted 09-10-2019 08:56 PM

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

mcanzater · Posted 09-20-2019 11:50 AM

The solution provided me about 80% of what I was looking for; definitely put me on the right path.

Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Re: Append Dataset, Create Flag, and Remove Observations Based on Condition

Ready to join fellow brilliant minds for the SAS Hackathon?