Solved: Proc SQL Do loop Import

ccaudillo100 · Posted 11-02-2022 11:28 AM

Hello, currently I run:

Proc SQL;
CREATE TABLE Name AS (
Select Effective_Date
,ID
,FullName
FROM xx.xx Where NameID = 1);
Quit;

Proc sort data= Name; by ID Effective_Date; Run;

data Name1 (Keep= FullName ID);
do until(last.ID);
set Name;
by ID Effective_Date;
end;
run;

and this gives me my desired result. I am looking to see if I can cut this by making it all into one step under Proc SQL. Any advice or guidance is much appreciated!

FreelanceReinh · Posted 11-02-2022 03:52 PM

Hello @ccaudillo100,

If there are no duplicate "effective dates" within an ID, you can use a HAVING clause in PROC SQL to select the most recent FullName per ID in one step:

proc sql;
create table Name1 as
select ID, FullName
from xx.xx where NameID = 1
group by ID
having Effective_Date=max(Effective_Date);
quit;

However, if there are duplicate "effective dates" within an ID, two different cases are possible:

The names (FullName) in a group of records with the same ID and latest Effective_Date are always identical. In this case you could insert the DISTINCT keyword into the SELECT statement:
```
select distinct ID, FullName
```
to avoid duplicate observations in dataset Name1.
The names are not identical, i.e., they are inconsistent. In this case your existing DATA step code would select only one of the different names, you wouldn't even notice the inconsistencies and it would depend on the order of observations in dataset Name which of the names would be selected. Given that dataset Name is created by PROC SQL, the selection might be hardly predictable. In contrast, the PROC SQL step suggested above (with DISTINCT inserted) would select all of the inconsistent names, if any, so you could use, e.g., a (post-processing) DATA step like this
```
data incons;
set Name1;
by ID;
if ~(first.ID & last.ID);
run;
```
to examine the inconsistencies or to confirm that there are none.

View solution in original post

ballardw · Posted 11-02-2022 12:27 PM

What is the "DO Loop Import" references in the subject? I don't see anything resembling an "import".

It might help a lot to include some example data and what you expect for a result. There are some things that may be possible depending on the types of some of those variables with aggregate functions coupled with Group by and Having clauses.

As a minimum though you can skip the separate sort by using an Order by clause in the Proc SQL

Proc SQL;
   CREATE TABLE Name AS 
   Select Effective_Date
   ,ID
   ,FullName
   FROM xx.xx 
   Where NameID = 1
   order by Id, Effective_date
;
Quit;

I am a little concerned that your use or ( ) after "Create table name as (" may indicate that you intend to use the result as a subquery. Which may mean that some solutions may not be quite appropriate in that role.

ccaudillo100 · Posted 11-02-2022 12:54 PM

From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.

PaigeMiller · Posted 11-02-2022 12:46 PM

Loop implies you want to do this more than once, with some condition changing on each iteration. You have not specified the condition that changes on each iteration. Could you please clear this up? Thanks!

--
Paige Miller

ccaudillo100 · Posted 11-02-2022 12:54 PM

From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.

PaigeMiller · Posted 11-02-2022 01:54 PM

@ccaudillo100 wrote:
From xx.xx is an online data base attached to SAS through the company I work for and I am pulling data from that data base in real time. There are name changes to the IDs that occur and I am only wanting to keep the latest one that happened for each ID. sometimes there are only 2 name changes, sometimes there are 50. So far Do loop is the only thing I have found that works. The import time take a long time since there are so many name changes logged. there are about 4000+ IDs but when the pull happens, it is bringing in over 200,000 names.

I don't see how this answers my question. In the first iteration of the loop, compared to the second iteration of the loop, what is changed?

--
Paige Miller

ccaudillo100 · Posted 11-02-2022 02:15 PM

I am confused by what you are asking. Sorry. I am just looking to see if there is a way I could do this all in 1 step vs doing 2 steps.

PaigeMiller · Posted 11-02-2022 02:32 PM

@ccaudillo100 wrote:

I am confused by what you are asking. Sorry. I am just looking to see if there is a way I could do this all in 1 step vs doing 2 steps.

Well, I am confused too. Combining the two steps you show in your code into 1 is not a DO loop.

--
Paige Miller

Kurt_Bremser · Posted 11-02-2022 02:33 PM

Instead of loading everything into SQL, omit the SQL altogether:

proc sort
  data=xx.xx (
    keep=id effective_date fullname nameid
    where=(nameid = 1)
  )
  out=name (drop=nameid)
;
by ID Effective_Date;
run;

data Name1 (keep=FullName ID);
set Name;
by ID Effective_Date;
if last.id;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 11-02-2022 03:52 PM

Hello @ccaudillo100,

If there are no duplicate "effective dates" within an ID, you can use a HAVING clause in PROC SQL to select the most recent FullName per ID in one step:

proc sql;
create table Name1 as
select ID, FullName
from xx.xx where NameID = 1
group by ID
having Effective_Date=max(Effective_Date);
quit;

However, if there are duplicate "effective dates" within an ID, two different cases are possible:

The names (FullName) in a group of records with the same ID and latest Effective_Date are always identical. In this case you could insert the DISTINCT keyword into the SELECT statement:
```
select distinct ID, FullName
```
to avoid duplicate observations in dataset Name1.
The names are not identical, i.e., they are inconsistent. In this case your existing DATA step code would select only one of the different names, you wouldn't even notice the inconsistencies and it would depend on the order of observations in dataset Name which of the names would be selected. Given that dataset Name is created by PROC SQL, the selection might be hardly predictable. In contrast, the PROC SQL step suggested above (with DISTINCT inserted) would select all of the inconsistent names, if any, so you could use, e.g., a (post-processing) DATA step like this
```
data incons;
set Name1;
by ID;
if ~(first.ID & last.ID);
run;
```
to examine the inconsistencies or to confirm that there are none.

Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

Re: Proc SQL Do loop Import

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away