Hi everyone! I'm a new user of EG and i'm fascinated with the capabilities of the tool. Rigth now i'm working on two datasets, the first one (A) is 250k+ rows and 25 columns, the second one (B) is around 25k rows and only 2 columns.
I'm trying to do sort of a VLOOKUP, getting info from B that matches A and put it in C (new table) but there's two things: i haven't found a way of joining them the easy way (using queries); the other one I tried doing it programming but i'm not an expert in this language.
This is what i've tried, in separate, but doesn't seem to be working:
PROC SQL; CREATE TABLE C AS SELECT DISTINCT t1.UNIDAD, t2.unidad as unidad1, t2.marca FROM QUERY_FOR_PAGOSPROVEEDORES t1 LEFT JOIN MARCAS_Y_UNIDADES1 t2 ON (t1.UNIDAD LIKE t2.UNIDAD); QUIT; proc sql; create table test as select t1.UNIDAD, t2.UNIDAD from work.query_for_pagosproveedores t1, work.marcas_y_unidades1 t2 where upcase(marcas_y_unidades1.UNIDAD) contains upcase(query_for_pagosproveedores.UNIDAD); quit;
Also, a couple of images from both A and B datasets:
How can i do this? Thanks in advance, any help will be much appreciated!
NOTE: data in A repeats. I'm using SAS EG 5.1 (64-bit).
From your example data I do not think you ever have much luck trying to compare those UNIDAD fields directly as they look way too different.
My approach would be to pull add a variable to your table A with the Marca information. Then add model information such as "TSURU" or "TACOMA" or "H100" to both data sets. Then you wold have a chance on matching on those two variables.
In one form of looking at data your "UNIDAD" field should probably be at least 3 variables (Make Model (not your current Modelo variable which a model year which is another variable that would ideally be in both data sets) and Trim level) with some serious attention paid to the data entry. Since your table A looks like it is likely to be user entered then you should spend some time "cleaning" which means getting things consistent such as only one of: Mercedes, Mercedes-Benz, MBenz, Mercedes Benz (no -).
Thanks ballardw, maybe i should've been more precise: the data you see in A is part of what i have in B. In some cases yes, it is user entered however the major part is "trimmed" but not to the same length (it varies, not sure why).
Eg. from A you would get " TR KENWORTH T 600 B 42" " in the first row; in B it would look like " TR KENWORTH T 600 B 42" STD., 02 OCUP. ", If I do a manual search using a query and filtering UNIDAD contains TR KENWORTH T 600 B 42", it works perfectly. Of course I can't do this process 250k+ times manually
not sure if i explained myself... sure there's a way i can do this automatically
Post sample data please. Not as images, but as text.
Sure, sample data as text:
First table:
UNIDAD |
TR KENWORTH T 600 B 42" |
NISSAN TSURU GSII PLUS |
CS NISSAN CHASIS CABINA LARGO |
PY TOYOTA TACOMA TRD DOBLE CAB |
GMOTORS ACADIA V/P 4X4 |
MBENZ E-300 V6 SEDAN ADVANTGAR |
MBENZ GLK 300 CGI OFF ROAD SPO |
MBENZ SPRINTER 415 WAGON C/A T |
HYUNDAI / HYUNDAI GRAND I10 GL |
VW "EL NUEVO" JETTA A4 EUROPA |
CHEVROLET EXPRESS VAN SERVPUB |
RENAULT DUSTER EXPRESSION 4X2 |
CS FORD F-350 CHASIS CABINA KT |
FORD EXPLORER XLT 3.5L 290HP V |
DODGE NITRO SLT 4X2 |
Second table:
UNIDAD | MARCA |
CS FORD F-350 KTP XL CH 2P V8 6.2L 2WD AC 3.5TON AUT., 03 OCUP. | FORD |
CS FORD F-350 KTP XL PLUS CH 2P V8 6.2L AC 3.5TON STD., 03 OCUP. | FORD |
CS FORD F-350 KTP XL PLUS CH 2P V8 6.2L EFI AC R17 AUT., 02 OCUP. | FORD |
CS FORD F-350 KTP XLT CH 2P V8 6.2L 2WD 3.5TON STD., 03 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA KTP 6.7L V8 TDI AUT., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L AUT., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L AUT., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA S.D XL A/AC DIESEL STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA SUPER DUTY XL A/AC STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA SUPER DUTY XL BA STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA SUPER DUTY XL DIESEL STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA SUPER DUTY XL GAS LP STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA SUPER DUTY XL STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA XL V10 A/AC BA STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA XL V10 C/A AC STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA XL V8 C/A AC DIESEL STD., 02 OCUP. | FORD |
CS FORD F-450 CHASIS CABINA XL V8 DIESEL AUT., 02 OCUP. | FORD |
Thanks!
If I understand correctly, based on your sample data there would be no matches? Perhaps some sample data where there would be matches and what the output should be...
Yes, there are no matches. The result output should look like this:
UNIDAD (FROM TABLE A) | UNIDAD (FROM TABLE B) | MARCA (FROM TABLE B) |
CS FORD F-350 CHASIS CABINA KT | CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L AUT., 02 OCUP. | FORD |
CS FORD F-350 CHASIS CABINA KT | CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L STD., 02 OCUP. | FORD |
CS FORD F-350 CHASIS CABINA KT | CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L AUT., 02 OCUP. | FORD |
CS FORD F-350 CHASIS CABINA KT | CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L STD., 02 OCUP. | FORD |
I assume if I use DISTINCT there would be only 1 row, however it'd look something like that.
If there are no matches how are you getting information from Table B to Table A?
I get no matches because certainly I'm doing something wrong in my coding, that's why i'm asking for help. My previous reply has what the output should look like.
I get why your code isn't working. What I'm asking for is a worked example. Based on your sample data, table A and Table B your output doesn't align with what your asking for, there are no matches from sample table A to sample table B.
But I'm guessing it doesn't and you didn't provide data that would actually work for testing...if we don't have data that will match in a test ita kinda hard to test. I would basically have to develop my own test data for your situation.
You can use functions in your join condition, so using the substr to create strings the same length that will then match.
Here's an example with mock data from SASHELP.CLASS mocked up. It should give you the idea.
You'll want to play around with the join type to get the one you want (left/right/inner/outer)
Hope this helps.
data class1;
length name $40.;
set sashelp.class;
name=catx(" ", name, "Black");
run;
data class2;
set sashelp.class;
ID = _n_;
if ID in (12, 13, 15, 19) then delete;
if name='Alfred' then name='Jonathon';
if name='Carol' then name='Tanisha';
run;
proc sql;
create table want as
select a.name as name1,
b.name as name2,
b.ID
from class1 as a
full outer join class2 as b
on b.name=substr(a.name, 1, length(b.name))
order by b.ID;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.