Dear experts,
I should merge two data sets on two variables but taking the observation on the left part of the join only once. The output below generated should have for the second and fourth observation a value=""
DATA data1;
INPUT code $2. location $3. location2 $4.;
DATALINES;
61 UK ABC
45 CH DEF
03 CH DEF
11 JP GHI
14 JP GHI
;
DATA data2;
INPUT value $3. location $3. location2 $4.;
DATALINES;
100 UK ABC
200 CH DEF
300 JP GHI
;
proc sql; create table merge as select t0.*, t1.* from data1 t0 left outer join data2 t1
on (t1.location=t0.location and t1.location2=t0.location2) ;quit;
Thanks a lo in advance. BRs, SH
Try the data step method:
proc sort data=data1;
by location location2;
run;
proc sort data=data2;
by location location2;
run;
data merge2;
merge
data1 (in=a)
data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;
You might want to consider additional sort criteria for data1 to put the desired code first.
That'snot how SQL works. You requirement is row (observation) oriented, while SQL is column oriented.
SQL first performs a Cartesian join (in theory) and the filters using the on (or where criteria when used).
You need to use a data step to solve this.
Perhaps you need a better data model so you can describe which is the first observation in the "join group" (in case of resorting for instance), and why.
I just showed that I was not able to do it (using sql) and I asked how to get the output I want from the input I have. Where did I write to do it necessarily in sql? The only implicit constrain is using SAS. Pheraphs what I was doing is not related to any data model? Please try to answer accordingly to the question, thanks anyway for your effort.
Thats not a logical join though. There is no way of knowing, from the data, that any row is "first". This is information you need to put in before joining, and example given below. Note alo that the use of * causes warnings as variables appear in multiple tables - its not a good idea to use this syntax.
data data1; input code $2. location $3. location2 $4. first; datalines; 61 UK ABC 1 45 CH DEF 1 03 CH DEF 0 11 JP GHI 1 14 JP GHI 0 ; run; data data2; input value $3. location $3. location2 $4. first; datalines; 100 UK ABC 1 200 CH DEF 1 300 JP GHI 1 ; run; proc sql; create table MERGE as select T0.*, T1.* from DATA1 T0 left join DATA2 T1 on T1.LOCATION=T0.LOCATION and T1.LOCATION2=T0.LOCATION2 and T1.FIRST=T0.FIRST; quit;
Try the data step method:
proc sort data=data1;
by location location2;
run;
proc sort data=data2;
by location location2;
run;
data merge2;
merge
data1 (in=a)
data2 (in=b)
;
by location location2;
if a;
if first.location2;
run;
You might want to consider additional sort criteria for data1 to put the desired code first.
The DATA step is a workable approach here. SQL does not guarantee any "order" to the incoming records.
Here are a couple of variations to consider on the DATA step approach.
Instead of sorting/merging by LOCATION LOCATION2, you could sort/merge by LOCATION2 LOCATION. In the sample data you have provided, that would preserve the current order of the observations (and could conceivably eliminate the need for sorting).
For the final statement, you could consider this variation:
if first.location2=0 then value=' ';
That would keep all records, but reset VALUE to a blank value.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.