BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mcanzater
Calcite | Level 5

How do I replicate the commands below in SAS? 

 

In the 1st line, the code appends a file and simultaneously generates a flag (validated) where validated = 1 and not validated = 0. In the 2nd line, duplicate observations are identified and simultaneously generates a flag (dup) where both duplicates = 1 and non-duplicates = 0.  In the 3rd line, all duplicate observations that were contributed by the master dataset are removed (i.e. duplicate observations contributed by the append dataset are kept along with all non-duplicate observations contributed by the master dataset).  In the 4th line, the duplicate flag (dup) is removed from the dataset.

 

append using "Not_Validated_Data", generate(validated)

duplicates tag id date, generate(dup)

drop if dup ==1 & validated ==0

drop dup

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

 

 

View solution in original post

2 REPLIES 2
maguiremq
SAS Super FREQ

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

 

 

mcanzater
Calcite | Level 5

The solution provided me about 80% of what I was looking for; definitely put me on the right path.