BY-Group processing to select specific observations, then output all group observatios to new table

Accepted Solution Solved
Reply
Contributor SM1
Contributor
Posts: 30
Accepted Solution

BY-Group processing to select specific observations, then output all group observatios to new table

I am working with personnel records structured as 1 event per person per row. I use SSN for all BY-Group work.

 

I'm doing some QA checks. I currently use this process:

 

1. Output all rows into a new table that meet the specific criteria.

 

data eventslt4;

set events;

if ((events eq 1) and (num_events lt 4)) then output eventslt4;

run;

 

2. De-duplicate the new table by SSN and keep only SSN.

 

Proc sort data = eventslt4 (keep = ssn)

Out=eventslt4_dedup nodupkey;

By ssn;

Run;

 

3. Use an inner join to generate a new table with all records associated with the SSNs.

 

proc sql;

create table eventslt4all as

select A.*, B.*

from eventslt4_dedup as A inner join

 events as B

on A.SSN = B.SSN;

quit;

 

Is there a more efficient way to do this task?

 

Thanks!

 

 


Accepted Solutions
Solution
‎03-16-2016 07:16 PM
Respected Advisor
Posts: 4,973

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

My SQL isn't the greatest, but something along these lines should work:

 

proc sql;

create table eventslt4all as select * from events where ssn in

(select ssn from events where events=1 and num_events <= 4);

quit;

View solution in original post


All Replies
Esteemed Advisor
Posts: 5,198

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

I don't understand the relation between events and num_events variables, and your statement "1 event per person per row".

But in similar situations, I think, I'm able to use a single SQL that uses the having clause, that can filter on aggregates, such as no of records within a by group.

Data never sleeps
Contributor SM1
Contributor
Posts: 30

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

Sorry for the confusion. The variable 'Events' is a counter variable and "Num_Events" is a variable that records the number of events per SSN - but there's a value for this variable only in the 1st row of each SSN group (the value for this variable is missing in all other rows of each SSN group). So the syntax says look in the 1st row of each SSN group and select only those if Num_Events lt 4.

 

 

Esteemed Advisor
Posts: 5,198

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

Pre-calculating num_events and sore it on the first row is unnecessary (and unpractical - if you need it stored, store in a separate data set). The num_events will be calculated on the fly in SQL).

Also, the events counter is not needed, at least for doing this filtering.

Data never sleeps
Solution
‎03-16-2016 07:16 PM
Respected Advisor
Posts: 4,973

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

My SQL isn't the greatest, but something along these lines should work:

 

proc sql;

create table eventslt4all as select * from events where ssn in

(select ssn from events where events=1 and num_events <= 4);

quit;

Contributor SM1
Contributor
Posts: 30

Re: BY-Group processing to select specific observations, then output all group observatios to new ta

Your SQL is good enough!

 

Thanks for the suggested syntax. Much appreciated.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 404 views
  • 0 likes
  • 3 in conversation