BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mcanzater
Calcite | Level 5

How do I replicate the commands below in SAS? 

 

In the 1st line, the code appends a file and simultaneously generates a flag (validated) where validated = 1 and not validated = 0. In the 2nd line, duplicate observations are identified and simultaneously generates a flag (dup) where both duplicates = 1 and non-duplicates = 0.  In the 3rd line, all duplicate observations that were contributed by the master dataset are removed (i.e. duplicate observations contributed by the append dataset are kept along with all non-duplicate observations contributed by the master dataset).  In the 4th line, the duplicate flag (dup) is removed from the dataset.

 

append using "Not_Validated_Data", generate(validated)

duplicates tag id date, generate(dup)

drop if dup ==1 & validated ==0

drop dup

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

 

 

View solution in original post

2 REPLIES 2
maguiremq
SAS Super FREQ

It's been a while since I've used Stata, but I think this does it. A little verbose, but it works.

data validated;
	input id date $20.;
datalines;
1 1/1/2019
1 1/2/2019
1 3/4/2019
2 3/5/2019
2 4/5/2019
2 5/5/2019
2 6/7/2019
3 5/4/2019
;
run;

data not_validated;
	input id date $20.;
datalines;
1 3/5/2019
1 1/2/2019
1 3/4/2019
2 5/4/2019
2 4/5/2019
3 5/5/2019
3 5/4/2019
;
run;

data have;
	set validated (in=a)
		not_validated (in=b);
	if a and not b then validated = 1;
	else validated = 0;
run;

proc sort data=have;
	by id date;
run;

data want (drop=dup);
	set have2;
	by id date;
	if first.date then dup = 1;
	else dup = 0;
	if dup = 1 and validated = 0 then delete;
run;

 

 

mcanzater
Calcite | Level 5

The solution provided me about 80% of what I was looking for; definitely put me on the right path.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1219 views
  • 1 like
  • 2 in conversation