DI Studio: Capture Observations or Locate First Duplicate

Frequent Contributor
Posts: 89

Hi Community,

So, using DI Studio I can group columns and create a count(*) column that captures the number of records in each group, but is there a way to locate the first record in that group?

I want to find the first time one of the duplicates appears, because I want to keep that one and close out the rest on a fact table.

I was thinking that if I had a column that numbered each observation within the group, I could simply look for the '1' in this column. Or if I could find another way to flag one record within the group, that'd be great.

Seems like a simple thing to do.

Thanks!

Super User
Posts: 5,441

Re: DI Studio: Capture Observations or Locate First Duplicate

I'm not sure how you would apply this logic into a job/flow, and how would the Table Loader step work...?

Nevertheless, I don't think that a standard transformation could do this in one step.

If your data is sorted, a User Written transformation with BY-group processing would do the trick: if first.your_group then counter = 1; else counter + 1;

Data never sleeps
Frequent Contributor
Posts: 89

Re: DI Studio: Capture Observations or Locate First Duplicate

Fortunately, I don't need this in a Table Loader step; for the time being it's just for some analysis. I was thinking I could do it in an Extract, but maybe I would need to use a User Written transformation and sort the data first.

I'm a SAS beginner, so could you flesh out your suggestion just a bit more?

Thanks, Linus!

Super User
Posts: 5,441

Re: DI Studio: Capture Observations or Locate First Duplicate

If it's just for analysis, DI Studio may not be the ideal environment; Enterprise Guide or SAS Studio would be a better fit.
Either way, the data step would be something like:

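/* have must already be sorted by myid */
data want;
  set have;
  by myid;
  if first.myid then counter = 1;  /* restart numbering at each new group */
  else counter + 1;                /* sum statement: counter is retained across rows */
run;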
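
For a beginner, it may help to see the sort step spelled out as well. The following is only a sketch under assumptions: the table name have, the grouping column myid, and especially load_dt (a hypothetical column that decides which duplicate counts as "first") are placeholders, not names from your actual fact table.

```sas
/* Sort so that rows of the same group are adjacent and in the desired order. */
/* load_dt is a hypothetical column that defines which duplicate comes first. */
proc sort data=have out=have_sorted;
  by myid load_dt;
run;

/* Number the rows within each group; counter = 1 marks the first one. */
data want;
  set have_sorted;
  by myid;
  if first.myid then counter = 1;
  else counter + 1;
  first_flag = (counter = 1);  /* 1 for the first row in the group, 0 otherwise */
run;
```

Filtering want on counter = 1 (or first_flag = 1) should then give the record to keep, and everything with counter > 1 would be the duplicates to close out.
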
Data never sleeps
PROC Star
Posts: 1,167

Re: DI Studio: Capture Observations or Locate First Duplicate

SQL typically isn't a good option for things like the "first" record in a group, UNLESS you can identify it by some combination of min or max of variables, in which case it's usually quite easy. Provide a few more details about what you need. Personally, I agree with @LinusH that you should use EG or Studio if it's exploratory.

Tom
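
To illustrate the min/max case, here is a rough sketch that assumes the "first" duplicate is the one with the earliest value of a date column. The names have, myid and load_dt are hypothetical placeholders, since the actual table layout hasn't been posted.

```sas
/* Keep, for each myid, the row(s) whose load_dt equals the group minimum. */
/* have, myid and load_dt are assumed names, not taken from the real data. */
proc sql;
  create table first_rows as
  select *
  from have
  group by myid
  having load_dt = min(load_dt);
quit;
```

PROC SQL handles this by remerging the summary statistic back onto the detail rows and writes a NOTE to the log saying so. If two rows in a group share the same minimum load_dt, both are returned, so this only pins down a single record when the min (or max) is unique within the group.
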