
Need code to pull the repetitive data and eliminate it from the file

Contributor
Posts: 21

Dear All,

 

I am using SAS Data Management Studio 9.4. It is a front-end interface that only offers drag-and-drop nodes and the option to write a few expressions.

I am in a situation where the data is repeated and I need to eliminate the repeated records from the file. I have tried clustering by grouping on the primary key, but I am stuck on the next condition. Please suggest an appropriate solution. The data is provided below.

Required: if any record for a roll no has Area = 123A, remove that roll no entirely. For example, 0012 and 0064 have Area 123A, so all records for 0012 and 0064 should be removed from the data. The output should consist only of roll no 12345.

 

Input
Roll No Name Area
0012 KKKKK 123A
0012 KKKKK 3333
0012 KKKKK 7869
0012 KKKKK 7777
0012 KKKKK 913B
12345 LLLLL 7869
12345 LLLLL 123A
12345 LLLLL 3333
0064 MMMM 7869
0064 MMMM 7869
0064 MMMM 3333
0064 MMMM 123A
0064 MMMM 7869
0064 MMMM 6666
0064 MMMM 913B

 

Output
Roll No Name Area
12345 LLLLL 7869
12345 LLLLL 123A
12345 LLLLL 3333

 

Regards,
Shaheen 

 


Accepted Solutions
Solution
‎06-05-2017 11:03 PM
Contributor
Posts: 21

Re: Need code to pull the repetitive data and eliminate it from the file

Dear All,

 

I found solution for this, used sql lookup to get the output. Thank you all.

 

Regards,

Shaheen
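
The post does not show the lookup itself. As a rough illustration only, here is one way a lookup-style filter could be written in PROC SQL. This is a sketch, not the poster's actual job: it assumes the input has been read into a table named have with character columns roll_no, name and area, and the table names flagged and want_lookup are hypothetical.

```sas
proc sql;
  /* Hypothetical lookup table: one row per roll_no that has an area 123A record */
  create table flagged as
    select distinct roll_no
      from have
      where area = '123A';

  /* Look each input row up in FLAGGED and keep only the rows with no match */
  create table want_lookup as
    select h.*
      from have as h
        left join flagged as f
          on h.roll_no = f.roll_no
      where f.roll_no is missing;
quit;
```

In Data Management Studio the same idea would presumably be built from nodes rather than typed as code: one step that collects the flagged roll numbers, and one that looks each record up against that list and drops the matches.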



All Replies
Super User
Posts: 5,424

Re: Need code to pull the repetitive data and eliminate it from the file

I'm not a specialist in DM Studio, but I think you need to clarify the requirement.

First, what makes you want to keep the records for roll no 12345? What is so specific about them?

Second, once you have filtered out what you want, how do you intend to use it? I assume you are building some kind of match code that can be applied to production data streams?

Data never sleeps
Respected Advisor
Posts: 4,173

Re: Need code to pull the repetitive data and eliminate it from the file

@Shah

I'm not a DM Studio expert, but the documentation tells me that you have the full SQL syntax at your command (i.e. the SQL Query node).

This should allow you to define logic similar to the following, written in "normal" SAS.

data have;
  input Roll_No $ Name $ Area $;
  datalines;
0012 KKKKK 123A
0012 KKKKK 3333
0012 KKKKK 7869
0012 KKKKK 7777
0012 KKKKK 913B
12345 LLLLL 7869
12345 LLLLL 123B
12345 LLLLL 3333
0064 MMMM 7869
0064 MMMM 7869
0064 MMMM 3333
0064 MMMM 123A
0064 MMMM 7869
0064 MMMM 6666
0064 MMMM 913B
;
run;

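proc sql;
  /* keep every row whose roll_no has no row with area='123A' */
  create table want1 as
    select o.*
      from
        have as o
      where not exists
        /* correlated subquery: looks for a 123A row with the same roll_no */
        (select * from have i where i.area='123A' and i.roll_no=o.roll_no)
  ;
quit;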

 

N.B.: I struggled a bit with the logic and the expected output you posted, so I had to work on assumptions.

The code above returns all records for roll_no values whose area is never 123A.

In the sample data you posted, every roll_no has a record with area 123A, so no roll_no would pass that condition. I have therefore also modified your sample data for roll_no=12345 (123A changed to 123B).
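
For comparison, the same filter can also be written with NOT IN instead of NOT EXISTS. This is only a sketch under the same assumptions as the query above: it reuses the have table from the data step, and the output name want2 is made up for illustration.

```sas
proc sql;
  /* Same result as WANT1: drop every roll_no that has at least one area 123A row */
  create table want2 as
    select *
      from have
      where roll_no not in
        /* uncorrelated subquery: the list of roll_no values to exclude */
        (select roll_no from have where area = '123A')
  ;
quit;
```

With the modified sample data, both versions should leave only the three rows for roll_no 12345. Which form is easier to express in an SQL Query node is something to check against the DM Studio documentation.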

 

 

☑ This topic is solved.
