DATA Step, Macro, Functions and more

match two variables in two different data sets

Reply
Occasional Contributor sos
Occasional Contributor
Posts: 12

match two variables in two different data sets

Hi, guys,

   I have a challenge in my work: it is about match in two variables in two different data sets;

Variable 1 in data 1is like this:

Series ID
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01462200000000001

... which are ID

For the Data 2 is like this:

Area Code           Area Title                                       
01000               Alabama -- Statewide
01001               Autauga County, Alabama
01003               Baldwin County, Alabama
01005               Barbour County, Alabama
01007               Bibb County, Alabama
01009               Blount County, Alabama
01011               Bullock County, Alabama
01013               Butler County, Alabama
01015               Calhoun County, Alabama
01017               Chambers County, Alabama
01019               Cherokee County, Alabama
01021               Chilton County, Alabama
....

 

If the numbers in the  positions of 4-8 in ID of data 1 are the same as Area Code in data 2 , then  the ID is named after a corresponding Area Title in data 2.

Note that the length of these two data sets are not the same. Here I guess it would be ideal to create a loop to do it. By the way, I do not want to combine those two data set, just want to name the data set 1 instead.

Thanks for any response!

Wu Zhang

Respected Advisor
Posts: 4,919

Re: match two variables in two different data sets

You want a left join :

proc sql;

create table want as

select data1.*, data2.AreaTitle as name

from data1 left join data2 on substr(data1.ID,4,5)=data2.AreaCode;

quit;

PG


PG
Regular Contributor
Posts: 233

Re: match two variables in two different data sets

DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;

DATA HAVE2;
INPUT Area_Code $5. Area_Title $25. ;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;


DATA HAVE3;
SET HAVE1;
LENGTH Area_Code $5.;
Area_Code = SUBSTR(SERIES_ID,4,5);
DATA WANT;
IF 0 THEN SET HAVE2;
IF _N_=1 THEN DO;
DECLARE HASH HH(DATASET: 'HAVE2');
HH.DEFINEKEY ('Area_Code');
HH.DEFINEDATA('Area_Title');
HH.DEFINEDONE();
DO UNTIL(EOF);
SET HAVE3 END=EOF;  
IF HH.FIND()=0 THEN OUTPUT;
ELSE DO;
CALL MISSING(OF Area_Title);
OUTPUT;
END;
END;
END;
STOP;
DROP Area_Code;
RUN;

PROC PRINT; RUN;

Super User
Posts: 10,018

Re: match two variables in two different data sets

Since you don't give us a output. Just guest:

BTW. Are you also come from China ?

DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;

DATA HAVE2;
INPUT Area_Code $5. Area_Title $25. ;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;


DATA WANT;
IF _N_=1 THEN DO;
IF 0 THEN SET HAVE2;
DECLARE HASH HH(DATASET: 'HAVE2');
HH.DEFINEKEY ('Area_Code');
HH.DEFINEDATA('Area_Title');
HH.DEFINEDONE();
END;
SET HAVE1 ;  
Area_Code=SUBSTR(SERIES_ID,4,5); 
CALL MISSING( Area_Title);
RC=HH.FIND();
DROP Area_Code RC;
RUN;


Ksharp

Super Contributor
Posts: 349

Re: match two variables in two different data sets

Hi ,

Try this...

DATA HAVE1;

INPUT SERIES_ID $20.;

CARDS;

SMU01115000000000001

SMU01122200000000001

SMU01138200000000001

SMU01194600000000001

SMU01200200000000001

SMU01225200000000001

SMU01234600000000001

SMU01266200000000001

SMU01336600000000001

SMU01338600000000001

SMU01021000000000001

;

RUN;

data fmt (keep=FMTNAME START END LABEL TYPE);

length FMTNAME $30. START END $256.;

set WORK.have1;

      FMTNAME = 'test' ;

      START   = substr(SERIES_ID,4,5) ;

      END     = Start;

      LABEL   = 'test';

      TYPE    = 'C' ;

run;

proc format cntlin=fmt lib=work; run;

DATA HAVE2(drop=x);

INPUT Area_Code $5. Area_Title $25. ;

x=put(Area_code,$test.);

if x='test' then output;

CARDS;

01000 Alabama -- Statewide

01001 Autauga County, Alabama

01003 Baldwin County, Alabama

01005 Barbour County, Alabama

01007 Bibb County, Alabama

01009 Blount County, Alabama

01011 Bullock County, Alabama

01013 Butler County, Alabama

01015 Calhoun County, Alabama

01017 Chambers County, Alabama

01019 Cherokee County, Alabama

01021 Chilton County, Alabama

;

RUN;

Thanks,

Shiva

Super User
Posts: 11,336

Re: match two variables in two different data sets

Or make a format from Data2 and use that to add the label.

Data makeformat (keep=start label fmtname);

     set data2;

     rename (variable name for area code not mentioned)=start (variable name of area title)=label;

     fmtname = "$AreaLabel"; /* or what every you like*/

run;

proc format cntlin=makeformat library=(library you would like to keep it in);

/* make sure the library is in the format search path. If you use a permanent library then the code above only needs

to be run when you have a change in your area titles. Since those codes look like counties then they aren't likely to

change often*/

use as :

data want;

   set data1 (or what ever your first data set is named);

   AreaTitle = put(substr(id, 4,8),$AreaLabel.);

run;

BTW, I would be careful about calling FIPS codes area code, you will generate confusion.

Ask a Question
Discussion stats
  • 5 replies
  • 306 views
  • 0 likes
  • 6 in conversation