Desktop productivity for business analysts and programmers

How do I compare two datasets using string variables?

Reply
Occasional Contributor
Posts: 8

How do I compare two datasets using string variables?

[ Edited ]

Hi everyone! I'm a new user of EG and i'm fascinated with the capabilities of the tool. Rigth now i'm working on two datasets, the first one (A) is 250k+ rows and 25 columns, the second one (B) is around 25k rows and only 2 columns.

 

I'm trying to do sort of a VLOOKUP, getting info from B that matches A and put it in C (new table) but there's two things: i haven't found a way of joining them the easy way (using queries); the other one I tried doing it programming but i'm not an expert in this language.

 

This is what i've tried, in separate, but doesn't seem to be working:

 

PROC SQL;
	CREATE TABLE C AS
	SELECT DISTINCT t1.UNIDAD,
		t2.unidad as unidad1,
		t2.marca
	FROM QUERY_FOR_PAGOSPROVEEDORES t1 
		LEFT JOIN MARCAS_Y_UNIDADES1 t2 ON (t1.UNIDAD LIKE t2.UNIDAD);
QUIT;

proc sql;
	create table test as
	select t1.UNIDAD, t2.UNIDAD
	from work.query_for_pagosproveedores t1, work.marcas_y_unidades1 t2
	where upcase(marcas_y_unidades1.UNIDAD) contains upcase(query_for_pagosproveedores.UNIDAD);
quit;

 

Also, a couple of images from both A and B datasets:

 

t1.jpg

t2.jpg

 

How can i do this? Thanks in advance, any help will be much appreciated!

 

NOTE: data in A repeats. I'm using SAS EG 5.1 (64-bit).

Grand Advisor
Posts: 10,192

Re: How do I compare two datasets using string variables?

From your example data I do not think you ever have much luck trying to compare those UNIDAD fields directly as they look way too different.

My approach would be to pull add a variable to your table A with the Marca information. Then add model information such as "TSURU" or "TACOMA" or "H100" to both data sets. Then you wold have a chance on matching on those two variables.

 

In one form of looking at data your "UNIDAD" field should probably be at least 3 variables (Make Model (not your current Modelo variable which a model year which is another variable that would ideally be in both data sets) and Trim level) with some serious attention paid to the data entry. Since your table A looks like it is likely to be user entered then you should spend some time "cleaning" which means getting things consistent such as only one of: Mercedes, Mercedes-Benz, MBenz, Mercedes Benz (no -).

 

 

Occasional Contributor
Posts: 8

Re: How do I compare two datasets using string variables?

[ Edited ]

Thanks ballardw, maybe i should've been more precise: the data you see in A is part of what i have in B. In some cases yes, it is user entered however the major part is "trimmed" but not to the same length (it varies, not sure why).

 

Eg. from A you would get TR KENWORTH T 600 B 42" " in the first row; in B it would look like TR KENWORTH T 600 B 42" STD., 02 OCUP. ", If I do a manual search using a query and filtering UNIDAD contains TR KENWORTH T 600 B 42", it works perfectly. Of course I can't do this process 250k+ times manually Man Tongue

 

not sure if i explained myself... Man Frustrated sure there's a way i can do this automatically

Grand Advisor
Posts: 17,290

Re: How do I compare two datasets using string variables?

Post sample data please. Not as images, but as text. 

Occasional Contributor
Posts: 8

Re: How do I compare two datasets using string variables?

Sure, sample data as text:

 

First table:

 

UNIDAD
TR KENWORTH T 600 B 42"
NISSAN TSURU GSII PLUS
CS NISSAN CHASIS CABINA LARGO
PY TOYOTA TACOMA TRD DOBLE CAB
GMOTORS ACADIA V/P 4X4
MBENZ E-300 V6 SEDAN ADVANTGAR
MBENZ GLK 300 CGI OFF ROAD SPO
MBENZ SPRINTER 415 WAGON C/A T
HYUNDAI / HYUNDAI GRAND I10 GL
VW "EL NUEVO" JETTA A4 EUROPA
CHEVROLET EXPRESS VAN SERVPUB
RENAULT DUSTER EXPRESSION 4X2
CS FORD F-350 CHASIS CABINA KT
FORD EXPLORER XLT 3.5L 290HP V
DODGE NITRO SLT 4X2

 

Second table:

 

UNIDADMARCA
CS FORD F-350 KTP XL CH 2P V8 6.2L 2WD AC 3.5TON AUT., 03 OCUP.FORD
CS FORD F-350 KTP XL PLUS CH 2P V8 6.2L AC 3.5TON STD., 03 OCUP.FORD
CS FORD F-350 KTP XL PLUS CH 2P V8 6.2L EFI AC R17 AUT., 02 OCUP.FORD
CS FORD F-350 KTP XLT CH 2P V8 6.2L 2WD 3.5TON STD., 03 OCUP.FORD
CS FORD F-450 CHASIS CABINA KTP 6.7L V8 TDI AUT., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L AUT., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L AUT., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA S.D XL A/AC DIESEL STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA SUPER DUTY XL A/AC STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA SUPER DUTY XL BA STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA SUPER DUTY XL DIESEL STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA SUPER DUTY XL GAS LP STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA SUPER DUTY XL STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA XL V10 A/AC BA STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA XL V10 C/A AC STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA XL V8 C/A AC DIESEL STD., 02 OCUP.FORD
CS FORD F-450 CHASIS CABINA XL V8 DIESEL AUT., 02 OCUP.FORD

 

 

Thanks!

Grand Advisor
Posts: 17,290

Re: How do I compare two datasets using string variables?

If I understand correctly, based on your sample data there would be no matches? Perhaps some sample data where there would be matches and what the output should be...

Occasional Contributor
Posts: 8

Re: How do I compare two datasets using string variables?

Yes, there are no matches. The result output should look like this:

 

UNIDAD (FROM TABLE A)UNIDAD (FROM TABLE B)MARCA (FROM TABLE B)
CS FORD F-350 CHASIS CABINA KTCS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L AUT., 02 OCUP.FORD
CS FORD F-350 CHASIS CABINA KTCS FORD F-450 CHASIS CABINA KTP XL PLUS TDI 6.4L STD., 02 OCUP.FORD
CS FORD F-350 CHASIS CABINA KTCS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L AUT., 02 OCUP.FORD
CS FORD F-350 CHASIS CABINA KTCS FORD F-450 CHASIS CABINA KTP XL PLUS V10 6.8L STD., 02 OCUP.FORD

 

I assume if I use DISTINCT there would be only 1 row, however it'd look something like that.

Grand Advisor
Posts: 17,290

Re: How do I compare two datasets using string variables?

If there are no matches how are you getting information from Table B to Table A?

Occasional Contributor
Posts: 8

Re: How do I compare two datasets using string variables?

I get no matches because certainly I'm doing something wrong in my coding, that's why i'm asking for help. My previous reply has what the output should look like.

Grand Advisor
Posts: 17,290

Re: How do I compare two datasets using string variables?

I get why your code isn't working. What I'm asking for is a worked example. Based on your sample data, table A and Table B your output doesn't align with what your asking for, there are no matches from sample table A to sample table B.

 

But I'm guessing it doesn't and you didn't provide data that would actually work for testing...if we don't have data that will match in a test ita kinda hard to test. I would basically have to develop my own test data for your situation. 

 

 

Grand Advisor
Posts: 17,290

Re: How do I compare two datasets using string variables?

You can use functions in your join condition, so using the substr to create strings the same length that will then match.

 

Here's an example with mock data from SASHELP.CLASS mocked up. It should give you the idea. 

You'll want to play around with the join type to get the one you want (left/right/inner/outer)

 

Hope this helps. 

 

data class1;
length name $40.;
set sashelp.class;
name=catx(" ", name, "Black");
run;

data class2;
set sashelp.class;
ID = _n_;
if ID in (12, 13, 15, 19) then delete;
if name='Alfred' then name='Jonathon';
if name='Carol' then name='Tanisha';
run;



proc sql;
create table want as
select a.name as name1,
       b.name as name2,
       b.ID
from class1 as a
full outer join class2 as b
on b.name=substr(a.name, 1, length(b.name))
order by b.ID;
quit;
Ask a Question
Discussion stats
  • 10 replies
  • 399 views
  • 0 likes
  • 3 in conversation