BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bajtan
Calcite | Level 5

Hello,

I am trying to create a table using proc sql and I have a problem. Some but not all rows are getting doubled in the process. The code looks as follows:

 

PROC SQL;
CREATE TABLE WORK.STEP AS
SELECT
t1.group,
t1.time,

t1.profit
FROM WORK.INPUT_FILE t1
LEFT JOIN WORK.GROUPS t2 on (t2.g0="*" OR (find(t1.group, strip(t2.g0) ) AND (Missing(t2.g1) OR find(t1.group, strip(t2.g1)))
AND (Missing(t2.g2) OR find(t1.group, strip(t2.g2)))));
QUIT;

 

Main thing here is that t1.group variable has a lot of different values and I wanna use the summary procedure later to aggregate some of the profit numbers. So, the groups I want to use are stored in three variables in table two (g0, g1, g2). But, for some groups it does this weird thing, that this sql procedure duplicates rows for specific groups. Not all of them, just a few, that have nothing particular in common. Does anyone know, why this might be happening? 

 

Thanks in advance,

Bajtan

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

An SQL join will create all combinations of matching records.   So if one dataset contributes N observations and the other contributes M observations that match those N observations the result is N x M observations.  So assuming that there are no duplicates in the input "T1' dataset then more than one observation from "T2" must meet your ON condition for some of the observations from "T1".

 

For that query since you are only selecting variables from the T1 alias why not just add the DISTINCT keyword?

CREATE TABLE WORK.STEP AS
SELECT  distinct
 t1.group
,t1.time
...

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

An SQL join will create all combinations of matching records.   So if one dataset contributes N observations and the other contributes M observations that match those N observations the result is N x M observations.  So assuming that there are no duplicates in the input "T1' dataset then more than one observation from "T2" must meet your ON condition for some of the observations from "T1".

 

For that query since you are only selecting variables from the T1 alias why not just add the DISTINCT keyword?

CREATE TABLE WORK.STEP AS
SELECT  distinct
 t1.group
,t1.time
...
bajtan
Calcite | Level 5

I tried to use the distinct keyword and it looked like it helped. But once I got to a table that had more than one-part-group (t2.g1 wasn't missing), the problem was back. There is probably a problem with the condition behind on keyword, but I can't figure it out. Any ideas?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 494 views
  • 1 like
  • 2 in conversation