Hi, guys,
I have a challenge in my work: it is about match in two variables in two different data sets;
Variable 1 in data 1is like this:
Series ID |
SMU01115000000000001 |
SMU01122200000000001 |
SMU01138200000000001 |
SMU01194600000000001 |
SMU01200200000000001 |
SMU01225200000000001 |
SMU01234600000000001 |
SMU01266200000000001 |
SMU01336600000000001 |
SMU01338600000000001 |
SMU01462200000000001 |
... which are ID
For the Data 2 is like this:
Area Code Area Title
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
....
If the numbers in the positions of 4-8 in ID of data 1 are the same as Area Code in data 2 , then the ID is named after a corresponding Area Title in data 2.
Note that the length of these two data sets are not the same. Here I guess it would be ideal to create a loop to do it. By the way, I do not want to combine those two data set, just want to name the data set 1 instead.
Thanks for any response!
Wu Zhang
You want a left join :
proc sql;
create table want as
select data1.*, data2.AreaTitle as name
from data1 left join data2 on substr(data1.ID,4,5)=data2.AreaCode;
quit;
PG
DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;
DATA HAVE2;
INPUT Area_Code $5. Area_Title $25. ;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;
DATA HAVE3;
SET HAVE1;
LENGTH Area_Code $5.;
Area_Code = SUBSTR(SERIES_ID,4,5);
DATA WANT;
IF 0 THEN SET HAVE2;
IF _N_=1 THEN DO;
DECLARE HASH HH(DATASET: 'HAVE2');
HH.DEFINEKEY ('Area_Code');
HH.DEFINEDATA('Area_Title');
HH.DEFINEDONE();
DO UNTIL(EOF);
SET HAVE3 END=EOF;
IF HH.FIND()=0 THEN OUTPUT;
ELSE DO;
CALL MISSING(OF Area_Title);
OUTPUT;
END;
END;
END;
STOP;
DROP Area_Code;
RUN;
PROC PRINT; RUN;
Since you don't give us a output. Just guest:
BTW. Are you also come from China ?
DATA HAVE1; INPUT SERIES_ID $20.; CARDS; SMU01115000000000001 SMU01122200000000001 SMU01138200000000001 SMU01194600000000001 SMU01200200000000001 SMU01225200000000001 SMU01234600000000001 SMU01266200000000001 SMU01336600000000001 SMU01338600000000001 SMU01021000000000001 ; RUN; DATA HAVE2; INPUT Area_Code $5. Area_Title $25. ; CARDS; 01000 Alabama -- Statewide 01001 Autauga County, Alabama 01003 Baldwin County, Alabama 01005 Barbour County, Alabama 01007 Bibb County, Alabama 01009 Blount County, Alabama 01011 Bullock County, Alabama 01013 Butler County, Alabama 01015 Calhoun County, Alabama 01017 Chambers County, Alabama 01019 Cherokee County, Alabama 01021 Chilton County, Alabama ; RUN; DATA WANT; IF _N_=1 THEN DO; IF 0 THEN SET HAVE2; DECLARE HASH HH(DATASET: 'HAVE2'); HH.DEFINEKEY ('Area_Code'); HH.DEFINEDATA('Area_Title'); HH.DEFINEDONE(); END; SET HAVE1 ; Area_Code=SUBSTR(SERIES_ID,4,5); CALL MISSING( Area_Title); RC=HH.FIND(); DROP Area_Code RC; RUN;
Ksharp
Hi ,
Try this...
DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;
data fmt (keep=FMTNAME START END LABEL TYPE);
length FMTNAME $30. START END $256.;
set WORK.have1;
FMTNAME = 'test' ;
START = substr(SERIES_ID,4,5) ;
END = Start;
LABEL = 'test';
TYPE = 'C' ;
run;
proc format cntlin=fmt lib=work; run;
DATA HAVE2(drop=x);
INPUT Area_Code $5. Area_Title $25. ;
x=put(Area_code,$test.);
if x='test' then output;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;
Thanks,
Shiva
Or make a format from Data2 and use that to add the label.
Data makeformat (keep=start label fmtname);
set data2;
rename (variable name for area code not mentioned)=start (variable name of area title)=label;
fmtname = "$AreaLabel"; /* or what every you like*/
run;
proc format cntlin=makeformat library=(library you would like to keep it in);
/* make sure the library is in the format search path. If you use a permanent library then the code above only needs
to be run when you have a change in your area titles. Since those codes look like counties then they aren't likely to
change often*/
use as :
data want;
set data1 (or what ever your first data set is named);
AreaTitle = put(substr(id, 4,8),$AreaLabel.);
run;
BTW, I would be careful about calling FIPS codes area code, you will generate confusion.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.