Hi,
I have a quick question I have the following code
proc sql;
create table want as
select label, value
from dataSet1 left join dataSet2
on start<=value<=end
group by label
order start;
run;
The thing is this code is not working if I want to replicate this sql statement using a data step how would I do this
it should be a left join
data want
set dataSet1(in=a) dataSet2(in=b);
//not sure what do here
run;
i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :
Proc Sql;
Create table NEW as
Select T1.*,T2.* from
Table1 as T1 left join Table2 as T2
on T1.id=T2.id;
Quit;
This will give matching rows with the non-matching rows from left table in the new dataset NEW.
Similarly in data step:
Data NEW;
Merge Table1(in=T1) Table2(in=T2);
If T1;
by ID;
run;
I suppose this works.Apologies if it does not help out.
1. Why do you want to use a data step?
2. Search these forums and support.sas.com - this is one of the most frequent discussions in the history of SAS...
You would first need to check your syntax and error log if a piece of code does not work. If it doesn't produce expected results, then it might be a functionality does not oblige to your requirements.
proc sql;
create table want as
select label, value
from dataSet1 left join dataSet2
on start<=value<=end
group by label
order by start;
run;
quit;
There is also typically a good practice when using SQL to use the dot notation to prefix variable names as per their source i.e.
proc sql;
create table want as
select a.label, a.value
from dataSet1 as a left join dataSet2 as b
on b.start<=a.value<=b.end
group by a.label
order by b.start;
quit;
Additionally, I don't think (but I might be wrong here) that SQL standards use the group by statement in a similar fashion to SAS data step with a merge statement. That is, to achieve the equivalent by statement, you would normally want to have something like
proc sql;
create table want as
select a.label, a.value
from dataSet1 as a left join dataSet2 as b
on a.label=b.label and
b.start<=a.value<=b.end
group by a.label
order by b.start;
quit;
Furthermore there is an ambiguity with your order by statement (At least based on my best guess of what you were trying to achieve) since the LEFT join retains records from dataset1 which did not match and thus have no start and end values tied to the record.
The SQL group by statement is used to define what summary functions "sumarize by". The general idea is that all variables in the group by statement are found in the select statement and all other variables in the select statement use sumary functions so as to generate a single record per group by unique key. Otherwise, you generate a cartesian product.
Beyond that, I must agree with LinusH. Data step is not appropriate for all merges and it is why proc sql exists and is built in sas. It might be viable depending on your data but likely not as it is even less fuzz match friendly than proc sql (the fuzz occur with start<=value<=end condition). Other data step alternatives involving hash object are more flexible in terms of what can be achieved (when compared with a merge statement) but it is best to be familiar with proc sql prior to getting into the hash object as the object has memory limitations.
Hope this helps
Vincent
i think you can do left join in data step using Merge.Lets say I have two datasets Table1 and Table2 which have has common variable ID.We first need to sort the two datasets by ID.After Sorting the twosets we can code as :
Proc Sql;
Create table NEW as
Select T1.*,T2.* from
Table1 as T1 left join Table2 as T2
on T1.id=T2.id;
Quit;
This will give matching rows with the non-matching rows from left table in the new dataset NEW.
Similarly in data step:
Data NEW;
Merge Table1(in=T1) Table2(in=T2);
If T1;
by ID;
run;
I suppose this works.Apologies if it does not help out.
What do you want it to do? Your choice of variable names makes it look like you want to create FORMAT and apply it to your data.
data dataset1;
input value @@;
cards;
1 23 3 5 15
run;
data dataset2 ;
input label $ start end ;
cards;
LOW 0 10
HIGH 10 100
run;
data fmt ;
retain fmtname 'DECODE' fmttype 'N';
set dataset2 ;
run;
proc format cntlin=fmt;
run;
data want;
set dataset1;
label = put(value,decode.);
run;
proc print;
run;
Obs value label
1 1 LOW
2 23 HIGH
3 3 LOW
4 5 LOW
5 15 HIGH
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.