BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
novinosrin
Tourmaline | Level 20

"My SAS is still running "proc sort nodupkey; by id codes date;"

 

Hmm that's worrying. Not knowing your data and your infrastructure is making it hard. Anyways, for the hash code to work, both the temp and small should have the same IDs. Please make sure you are mindful about that. Have a good day. Off for coffee!

Cruise
Ammonite | Level 13
Enjoy your coffee. Thank for using the word "worrying". It's interesting my worry goes down when someone shows an understanding.
Cruise
Ammonite | Level 13

@novinosrin 

 

I'm using proc sql-inner join to start with. H folder has 1.88TB free space while Z folder has only 465GB. I'm using H to output a resulting table from SQL. Hope this helps. But @Astounding pointed out that I needed more space in H. Let's see what happens.

 

proc sql;
create table h.want as
select a.id,codes,date
from z.small a inner join z.large b
on a.id=b.id
order by id,codes ,date;
quit;
novinosrin
Tourmaline | Level 20

Good. The people who have offered advice are highly experienced to the tune of my 5+ years*N number of their years. So I truly trust them. My fun here is pretty much to do with solving with coding akin to  a video game that i treat this like Xbox or PlayStation. So from the point of  infrastructure, administration, capacity planning, contingency planning  and beyond, you are right that your best bet is to seek and receive advice from wise and senior people like @Astounding , @Kurt_Bremser , @Ksharp , @Patrick  etc. 

I shamelessly admit I am useless in that regard. 

 

 

Kurt_Bremser
Super User

SQL joins like that are VERY intensive on the WORK library, so you might run out of space there. If WORK shares its physical location with library H, a crash is pre-programmed.

That's why I suggested a simple data step merge.

And if you have non-uniqueness in both input datasets, the SQL will create a cartesian join and cause a guaranteed crash.

Cruise
Ammonite | Level 13

@Kurt_Bremser I thought too. but not that bad. SQL outputted N-Rows=108,956,917 and N_uniq_id=181,978. However, I have only selected three variables because I didn't believe this could output a table. But it did. Now I'm running the same SQL again with select *. There are green warning messages below. Is it safe to ignore these warnings?

WARNING: Column named ID is duplicated in a select expression (or a view).
         Explicit references to it will be to the first one.

WARNING: Variable obligation already exists on file H.MERGED.

novinosrin
Tourmaline | Level 20

Show us your SQL code plz

Cruise
Ammonite | Level 13

@novinosrin , @Kurt_Bremser 

 

small has "id" and other 20 variables which will be used in the analysis

large has only "id", "codes" and "date"

 

proc sql;
create table H.merged as
select *
from r.small a inner join R.large b
on a.id=b.id
order by id,codes,date;
quit;
novinosrin
Tourmaline | Level 20

Believing your small as unique ID as discussed before .that is each obs has distinct ID

 

proc sql;
create table H.merged as
select a.*,codes,date
from r.small a inner join R.large b
on a.id=b.id
order by a.id,codes,date;
quit;

This assumed codes and date are not in small  

Cruise
Ammonite | Level 13
GREAT. No warning message now. I understand the logic. But what is the benefit of order by a.id? instead just id? is it to save time? also to avoid duplicate queries?
novinosrin
Tourmaline | Level 20

Well, the order of Proc SQL output is never guaranteed and so that pretty much warrants an order by clause to have your output ordered based on a group unless otherwise desired or if order doesn't matter. 

 

I generally apply the order by clause whenever I use Proc SQL in the objective that this output may serve a purpose that would go as input for further processing where we might be using a datastep DOW/By Group processing etc. And In general, SAS works best when have you ordered long data as opposed to unordered wide data. This is thumb rule!

 

@Cruise   But what is the benefit of order by a.id? instead just id? 

 

That is to avoid ambiguous reference as Sql processor doesn't know which ID column  to select as it exists in both tables

Kurt_Bremser
Super User

@Cruise wrote:

@Kurt_Bremser I thought too. but not that bad. SQL outputted N-Rows=108,956,917 and N_uniq_id=181,978. However, I have only selected three variables because I didn't believe this could output a table. But it did. Now I'm running the same SQL again with select *. There are green warning messages below. Is it safe to ignore these warnings?

WARNING: Column named ID is duplicated in a select expression (or a view).
         Explicit references to it will be to the first one.

WARNING: Variable obligation already exists on file H.MERGED.


That's what happens when you use the asterisk. You need to be specific as to what columns from where should be included in the target dataset.

Cruise
Ammonite | Level 13

Thanks everybody: Mission accomplished.

 

NOTE: Table H.MERGED created, with 108956917 rows and 108 columns.

31   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           36:31.60
      cpu time            4:17.32
Cruise
Ammonite | Level 13

@Kurt_Bremser @novinosrin 

 

for future reference. final success code is below which was written by novinosirin.

 

proc sql;
create table H.merged as
select a.*,codes,date
from r.small a inner join R.large b
on a.id=b.id
order by a.id,codes,date;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 60 replies
  • 3407 views
  • 35 likes
  • 7 in conversation