BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ccaudillo100
Obsidian | Level 7

Hello, currently I run:

 

Proc SQL;
CREATE TABLE Name AS (
Select Effective_Date
,ID
,FullName
FROM xx.xx Where NameID = 1);
Quit;

Proc sort data= Name; by ID Effective_Date; Run;
data Name1 (Keep= FullName ID); do until(last.ID); set Name; by ID Effective_Date; end; run;

 

and this gives me my desired result. I am looking to see if I can cut this by making it all into one step under Proc SQL. Any advice or guidance is much appreciated! 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @ccaudillo100,

 

If there are no duplicate "effective dates" within an ID, you can use a HAVING clause in PROC SQL to select the most recent FullName per ID in one step:

proc sql;
create table Name1 as
select ID, FullName
from xx.xx where NameID = 1
group by ID
having Effective_Date=max(Effective_Date);
quit;

 

However, if there are duplicate "effective dates" within an ID, two different cases are possible:

  1. The names (FullName) in a group of records with the same ID and latest Effective_Date are always identical. In this case you could insert the DISTINCT keyword into the SELECT statement:
    select distinct ID, FullName
    to avoid duplicate observations in dataset Name1.
  2. The names are not identical, i.e., they are inconsistent. In this case your existing DATA step code would select only one of the different names, you wouldn't even notice the inconsistencies and it would depend on the order of observations in dataset Name which of the names would be selected. Given that dataset Name is created by PROC SQL, the selection might be hardly predictable. In contrast, the PROC SQL step suggested above (with DISTINCT inserted) would select all of the inconsistent names, if any, so you could use, e.g., a (post-processing) DATA step like this
    data incons;
    set Name1;
    by ID;
    if ~(first.ID & last.ID);
    run;
    to examine the inconsistencies or to confirm that there are none.

View solution in original post

9 REPLIES 9
ballardw
Super User

What is the "DO Loop Import" references in the subject? I don't see anything resembling an "import".

 

 

It might help a lot to include some example data and what you expect for a result. There are some things that may be possible depending on the types of some of those variables with aggregate functions coupled with Group by and Having clauses.

 

As a minimum though you can skip  the separate sort by using an Order by clause in the Proc SQL

Proc SQL;
   CREATE TABLE Name AS 
   Select Effective_Date
   ,ID
   ,FullName
   FROM xx.xx 
   Where NameID = 1
   order by Id, Effective_date
;
Quit;

I am a little concerned that your use or ( ) after "Create table name as (" may indicate that you intend to use the result as a subquery. Which may mean that some solutions may not be quite appropriate in that role.

ccaudillo100
Obsidian | Level 7

From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.

PaigeMiller
Diamond | Level 26

Loop implies you want to do this more than once, with some condition changing on each iteration. You have not specified the condition that changes on each iteration. Could you please clear this up? Thanks!

--
Paige Miller
ccaudillo100
Obsidian | Level 7
From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.
PaigeMiller
Diamond | Level 26

@ccaudillo100 wrote:
From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.

I don't see how this answers my question. In the first iteration of the loop, compared to the second iteration of the loop, what is changed?

--
Paige Miller
ccaudillo100
Obsidian | Level 7

 I am confused by what you are asking. Sorry. I am just looking to see if there is a way I could do this all in 1 step vs doing 2 steps.

PaigeMiller
Diamond | Level 26

@ccaudillo100 wrote:

 I am confused by what you are asking. Sorry. I am just looking to see if there is a way I could do this all in 1 step vs doing 2 steps.


Well, I am confused too. Combining the two steps you show in your code into 1 is not a DO loop.

--
Paige Miller
Kurt_Bremser
Super User

Instead of loading everything into SQL, omit the SQL altogether:

proc sort
  data=xx.xx (
    keep=id effective_date fullname nameid
    where=(nameid = 1)
  )
  out=name (drop=nameid)
;
by ID Effective_Date;
run;

data Name1 (keep=FullName ID);
set Name;
by ID Effective_Date;
if last.id;
run;
FreelanceReinh
Jade | Level 19

Hello @ccaudillo100,

 

If there are no duplicate "effective dates" within an ID, you can use a HAVING clause in PROC SQL to select the most recent FullName per ID in one step:

proc sql;
create table Name1 as
select ID, FullName
from xx.xx where NameID = 1
group by ID
having Effective_Date=max(Effective_Date);
quit;

 

However, if there are duplicate "effective dates" within an ID, two different cases are possible:

  1. The names (FullName) in a group of records with the same ID and latest Effective_Date are always identical. In this case you could insert the DISTINCT keyword into the SELECT statement:
    select distinct ID, FullName
    to avoid duplicate observations in dataset Name1.
  2. The names are not identical, i.e., they are inconsistent. In this case your existing DATA step code would select only one of the different names, you wouldn't even notice the inconsistencies and it would depend on the order of observations in dataset Name which of the names would be selected. Given that dataset Name is created by PROC SQL, the selection might be hardly predictable. In contrast, the PROC SQL step suggested above (with DISTINCT inserted) would select all of the inconsistent names, if any, so you could use, e.g., a (post-processing) DATA step like this
    data incons;
    set Name1;
    by ID;
    if ~(first.ID & last.ID);
    run;
    to examine the inconsistencies or to confirm that there are none.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 732 views
  • 1 like
  • 5 in conversation