BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
johnhuang12
Calcite | Level 5

Hi,

I have a quick question I have the following code

proc sql;

     create table want as

     select label, value

     from dataSet1 left join dataSet2

    on start<=value<=end

    group by label

    order start;

run;

The thing is this code is not working if I want to replicate this sql statement using a data step how would I do this

it should be a left join

data want

set dataSet1(in=a) dataSet2(in=b);

//not sure what do here

run;

1 ACCEPTED SOLUTION

Accepted Solutions
vivek_shr
Calcite | Level 5

i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :

Proc Sql;

Create table NEW as

Select T1.*,T2.* from

Table1 as T1 left join Table2 as T2

on T1.id=T2.id;

Quit;

This will give matching rows with the non-matching rows from left table in the new dataset NEW.

Similarly in data step:

Data NEW;

Merge Table1(in=T1) Table2(in=T2);

If T1;

by ID;

run;

I suppose this works.Apologies if it does not help out.

View solution in original post

4 REPLIES 4
LinusH
Tourmaline | Level 20

1. Why do you want to use a data step?

2. Search these forums and support.sas.com - this is one of the most frequent discussions in the history of SAS...

Data never sleeps
Vince28_Statcan
Quartz | Level 8

You would first need to check your syntax and error log if a piece of code does not work. If it doesn't produce expected results, then it might be a functionality does not oblige to your requirements.

proc sql;

     create table want as

     select label, value

     from dataSet1 left join dataSet2

    on start<=value<=end

    group by label

    order by start;

run;

quit;

There is also typically a good practice when using SQL to use the dot notation to prefix variable names as per their source i.e.

proc sql;

     create table want as

     select a.label, a.value

     from dataSet1 as a left join dataSet2 as b

    on b.start<=a.value<=b.end

    group by a.label

    order by b.start;

quit;

Additionally, I don't think (but I might be wrong here) that SQL standards use the group by statement in a similar fashion to SAS data step with a merge statement. That is, to achieve the equivalent by statement, you would normally want to have something like

proc sql;

     create table want as

     select a.label, a.value

     from dataSet1 as a left join dataSet2 as b

    on  a.label=b.label and

          b.start<=a.value<=b.end

    group by a.label

    order by b.start;

quit;

Furthermore there is an ambiguity with your order by statement (At least based on my best guess of what you were trying to achieve) since the LEFT join retains records from dataset1 which did not match and thus have no start and end values tied to the record.

The SQL group by statement is used to define what summary functions "sumarize by". The general idea is that all variables in the group by statement are found in the select statement and all other variables in the select statement use sumary functions so as to generate a single record per group by unique key. Otherwise, you generate a cartesian product.

Beyond that, I must agree with LinusH. Data step is not appropriate for all merges and it is why proc sql exists and is built in sas. It might be viable depending on your data but likely not as it is even less fuzz match friendly than proc sql (the fuzz occur with start<=value<=end condition). Other data step alternatives involving hash object are more flexible in terms of what can be achieved (when compared with a merge statement) but it is best to be familiar with proc sql prior to getting into the hash object as the object has memory limitations.

Hope this helps

Vincent

vivek_shr
Calcite | Level 5

i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :

Proc Sql;

Create table NEW as

Select T1.*,T2.* from

Table1 as T1 left join Table2 as T2

on T1.id=T2.id;

Quit;

This will give matching rows with the non-matching rows from left table in the new dataset NEW.

Similarly in data step:

Data NEW;

Merge Table1(in=T1) Table2(in=T2);

If T1;

by ID;

run;

I suppose this works.Apologies if it does not help out.

Tom
Super User Tom
Super User

What do you want it to do?  Your choice of variable names makes it look like you want to create FORMAT and apply it to your data.

data dataset1;

  input value @@;

cards;

1 23 3 5 15

run;

data dataset2 ;

  input label $ start end ;

cards;

LOW 0 10

HIGH 10 100

run;

data fmt ;

  retain fmtname 'DECODE' fmttype 'N';

  set dataset2 ;

run;

proc format cntlin=fmt;

run;

data want;

set dataset1;

label = put(value,decode.);

run;

proc print;

run;

Obs    value    label

1        1     LOW

2       23     HIGH

3        3     LOW

4        5     LOW

5       15     HIGH

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 55665 views
  • 0 likes
  • 5 in conversation