SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Need to eliminate repetitive data from the file

Contributor
Posts: 21

Need to eliminate repetitive data from the file

Dear All,

 

I am using SAS Data Management Studio 9.4. It is a front-end interface where the only options are drag and drop and writing a few expressions.

 

I am in a situation where data is repeated and I need to eliminate it from the file. I have tried using clusters by grouping on the primary key, but I get stuck at the next condition. Please suggest an appropriate solution; the data is provided below.

 

Required: if Area = 123A, then remove that Roll No; i.e. 0012 and 0064 have area 123A, so all 0012 and 0064 rows should be removed from the data. The output should consist only of Roll No 12345.

 

Input
Roll No Name Area
0012 KKKKK 123A
0012 KKKKK 3333
0012 KKKKK 7869
0012 KKKKK 7777
0012 KKKKK 913B
12345 LLLLL 7869
12345 LLLLL 123A
12345 LLLLL 3333
0064 MMMM 7869
0064 MMMM 7869
0064 MMMM 3333
0064 MMMM 123A
0064 MMMM 7869
0064 MMMM 6666
0064 MMMM 913B

 

Output
Roll No Name Area
12345 LLLLL 7869
12345 LLLLL 123A
12345 LLLLL 3333

 

Regards,
Shaheen 


Accepted Solutions
Solution
06-05-2017 11:12 PM
Trusted Advisor
Posts: 1,128

Re: Need to eliminate repetitive data from the file

data have;
infile cards missover;
input Roll_No $ Name $ Area $; /* read Roll_No as character to keep leading zeros */
cards;
0012 	KKKKK 	123A
0012 	KKKKK 	3333
0012 	KKKKK 	7869
0012 	KKKKK 	7777
0012 	KKKKK 	913B
12345 	LLLLL 	7869
12345 	LLLLL 	123A
12345 	LLLLL 	3333
0064 	MMMM 	7869
0064 	MMMM 	7869
0064 	MMMM 	3333
0064 	MMMM 	123A
0064 	MMMM 	7869
0064 	MMMM 	6666
0064 	MMMM 	913B
;

proc sql;
create table test as
  select a.*, b.sum
  from have as a
  left join
    (select sum(count) as sum, area
     from (select count(distinct area) as count, area, roll_no
           from have
           group by area, roll_no)
     group by area) as b
  on a.area = b.area
  where b.sum >= 3
  order by a.area, a.roll_no;
quit;

data want;
set test;
by area roll_no;
if last.area;
run;
Thanks,
Jag
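For readers following the logic: the subquery keeps only Areas that occur under at least three distinct Roll Nos, and the DATA step then keeps the last (highest) Roll No within each Area. A minimal single-step sketch of the same idea, assuming the same HAVE table and relying on PROC SQL's summary-function remerging:

```sas
/* Sketch only: keep Areas shared by at least 3 distinct Roll Nos,
   then keep the row with the highest Roll_No within each such Area.
   Relies on PROC SQL remerging (a note in the log is expected). */
proc sql;
  create table want2 as
  select *
  from have
  where area in
        (select area
         from have
         group by area
         having count(distinct roll_no) >= 3)
  group by area
  having roll_no = max(roll_no)
  order by area;
quit;
```

One caveat of this sketch: if the highest Roll_No had duplicate rows for an Area, those duplicates would all survive; in the sample data each qualifying (Area, 12345) pair is unique, so the result matches the expected output.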



All Replies
Contributor
Posts: 21

Re: Need to eliminate repetitive data from the file

Dear Jag,

 

I am using the nodes supplied by DataFlux; it is a front-end tool where a Data Source node imports the input files, and the Data Validation and Expression nodes are used to define the conditions. So I cannot write the code you have provided. Please suggest a short condition that will help me reach the output. Also note that my data is dynamic: the Areas keep changing, and the Roll Nos will increase based on the requirement. The data provided was just a sample of my input.

 

Regards,
Shaheen

Respected Advisor
Posts: 3,889

Re: Need to eliminate repetitive data from the file

@Shah

Please don't post the same question twice.

Contributor
Posts: 21

Re: Need to eliminate repetitive data from the file

Dear Patrick,

 

Sorry for posting the query twice; I was unsure which forum to choose, SAS Studio or Data Management. Please ignore.

 

Thank you.

 

Regards,

Shaheen

☑ This topic is SOLVED.
