Solved: Re: how to do left join in sas dataset

johnhuang12 · Posted 01-17-2014 07:59 AM

Hi,

I have a quick question I have the following code

proc sql;

create table want as

select label, value

from dataSet1 left join dataSet2

on start<=value<=end

group by label

order start;

run;

The thing is this code is not working if I want to replicate this sql statement using a data step how would I do this

it should be a left join

data want

set dataSet1(in=a) dataSet2(in=b);

//not sure what do here

run;

vivek_shr · Posted 01-18-2014 06:23 AM

i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :

Proc Sql;

Create table NEW as

Select T1.*,T2.* from

Table1 as T1 left join Table2 as T2

on T1.id=T2.id;

Quit;

This will give matching rows with the non-matching rows from left table in the new dataset NEW.

Similarly in data step:

Data NEW;

Merge Table1(in=T1) Table2(in=T2);

If T1;

by ID;

run;

I suppose this works.Apologies if it does not help out.

View solution in original post

LinusH · Posted 01-17-2014 09:01 AM

1. Why do you want to use a data step?

2. Search these forums and support.sas.com - this is one of the most frequent discussions in the history of SAS...

Data never sleeps

Vince28_Statcan · Posted 01-17-2014 09:43 AM

You would first need to check your syntax and error log if a piece of code does not work. If it doesn't produce expected results, then it might be a functionality does not oblige to your requirements.

proc sql;

create table want as

select label, value

from dataSet1 left join dataSet2

on start<=value<=end

group by label

order by start;

run;

quit;

There is also typically a good practice when using SQL to use the dot notation to prefix variable names as per their source i.e.

proc sql;

create table want as

select a.label, a.value

from dataSet1 as a left join dataSet2 as b

on b.start<=a.value<=b.end

group by a.label

order by b.start;

quit;

Additionally, I don't think (but I might be wrong here) that SQL standards use the group by statement in a similar fashion to SAS data step with a merge statement. That is, to achieve the equivalent by statement, you would normally want to have something like

proc sql;

create table want as

select a.label, a.value

from dataSet1 as a left join dataSet2 as b

on a.label=b.label and

b.start<=a.value<=b.end

group by a.label

order by b.start;

quit;

Furthermore there is an ambiguity with your order by statement (At least based on my best guess of what you were trying to achieve) since the LEFT join retains records from dataset1 which did not match and thus have no start and end values tied to the record.

The SQL group by statement is used to define what summary functions "sumarize by". The general idea is that all variables in the group by statement are found in the select statement and all other variables in the select statement use sumary functions so as to generate a single record per group by unique key. Otherwise, you generate a cartesian product.

Beyond that, I must agree with LinusH. Data step is not appropriate for all merges and it is why proc sql exists and is built in sas. It might be viable depending on your data but likely not as it is even less fuzz match friendly than proc sql (the fuzz occur with start<=value<=end condition). Other data step alternatives involving hash object are more flexible in terms of what can be achieved (when compared with a merge statement) but it is best to be familiar with proc sql prior to getting into the hash object as the object has memory limitations.

Hope this helps

Vincent

vivek_shr · Posted 01-18-2014 06:23 AM

i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :

Proc Sql;

Create table NEW as

Select T1.*,T2.* from

Table1 as T1 left join Table2 as T2

on T1.id=T2.id;

Quit;

This will give matching rows with the non-matching rows from left table in the new dataset NEW.

Similarly in data step:

Data NEW;

Merge Table1(in=T1) Table2(in=T2);

If T1;

by ID;

run;

I suppose this works.Apologies if it does not help out.

Tom · Posted 01-18-2014 01:56 PM

What do you want it to do? Your choice of variable names makes it look like you want to create FORMAT and apply it to your data.

data dataset1;

input value @@;

cards;

1 23 3 5 15

run;

data dataset2 ;

input label $ start end ;

cards;

LOW 0 10

HIGH 10 100

run;

data fmt ;

retain fmtname 'DECODE' fmttype 'N';

set dataset2 ;

run;

proc format cntlin=fmt;

run;

data want;

set dataset1;

label = put(value,decode.);

run;

proc print;

run;

Obs value label

1 1 LOW

2 23 HIGH

3 3 LOW

4 5 LOW

5 15 HIGH

how to do left join in sas dataset

Re: how to do left join in sas dataset

Re: how to do left join in sas dataset

Re: how to do left join in sas dataset

Re: how to do left join in sas dataset

Re: how to do left join in sas dataset

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!