BookmarkSubscribeRSS Feed
sos
Calcite | Level 5 sos
Calcite | Level 5

Hi, guys,

   I have a challenge in my work: it is about match in two variables in two different data sets;

Variable 1 in data 1is like this:

Series ID
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01462200000000001

... which are ID

For the Data 2 is like this:

Area Code           Area Title                                       
01000               Alabama -- Statewide
01001               Autauga County, Alabama
01003               Baldwin County, Alabama
01005               Barbour County, Alabama
01007               Bibb County, Alabama
01009               Blount County, Alabama
01011               Bullock County, Alabama
01013               Butler County, Alabama
01015               Calhoun County, Alabama
01017               Chambers County, Alabama
01019               Cherokee County, Alabama
01021               Chilton County, Alabama
....

 

If the numbers in the  positions of 4-8 in ID of data 1 are the same as Area Code in data 2 , then  the ID is named after a corresponding Area Title in data 2.

Note that the length of these two data sets are not the same. Here I guess it would be ideal to create a loop to do it. By the way, I do not want to combine those two data set, just want to name the data set 1 instead.

Thanks for any response!

Wu Zhang

5 REPLIES 5
PGStats
Opal | Level 21

You want a left join :

proc sql;

create table want as

select data1.*, data2.AreaTitle as name

from data1 left join data2 on substr(data1.ID,4,5)=data2.AreaCode;

quit;

PG


PG
Hima
Obsidian | Level 7

DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;

DATA HAVE2;
INPUT Area_Code $5. Area_Title $25. ;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;


DATA HAVE3;
SET HAVE1;
LENGTH Area_Code $5.;
Area_Code = SUBSTR(SERIES_ID,4,5);
DATA WANT;
IF 0 THEN SET HAVE2;
IF _N_=1 THEN DO;
DECLARE HASH HH(DATASET: 'HAVE2');
HH.DEFINEKEY ('Area_Code');
HH.DEFINEDATA('Area_Title');
HH.DEFINEDONE();
DO UNTIL(EOF);
SET HAVE3 END=EOF;  
IF HH.FIND()=0 THEN OUTPUT;
ELSE DO;
CALL MISSING(OF Area_Title);
OUTPUT;
END;
END;
END;
STOP;
DROP Area_Code;
RUN;

PROC PRINT; RUN;

Ksharp
Super User

Since you don't give us a output. Just guest:

BTW. Are you also come from China ?

DATA HAVE1;
INPUT SERIES_ID $20.;
CARDS;
SMU01115000000000001
SMU01122200000000001
SMU01138200000000001
SMU01194600000000001
SMU01200200000000001
SMU01225200000000001
SMU01234600000000001
SMU01266200000000001
SMU01336600000000001
SMU01338600000000001
SMU01021000000000001
;
RUN;

DATA HAVE2;
INPUT Area_Code $5. Area_Title $25. ;
CARDS;
01000 Alabama -- Statewide
01001 Autauga County, Alabama
01003 Baldwin County, Alabama
01005 Barbour County, Alabama
01007 Bibb County, Alabama
01009 Blount County, Alabama
01011 Bullock County, Alabama
01013 Butler County, Alabama
01015 Calhoun County, Alabama
01017 Chambers County, Alabama
01019 Cherokee County, Alabama
01021 Chilton County, Alabama
;
RUN;


DATA WANT;
IF _N_=1 THEN DO;
IF 0 THEN SET HAVE2;
DECLARE HASH HH(DATASET: 'HAVE2');
HH.DEFINEKEY ('Area_Code');
HH.DEFINEDATA('Area_Title');
HH.DEFINEDONE();
END;
SET HAVE1 ;  
Area_Code=SUBSTR(SERIES_ID,4,5); 
CALL MISSING( Area_Title);
RC=HH.FIND();
DROP Area_Code RC;
RUN;


Ksharp

shivas
Pyrite | Level 9

Hi ,

Try this...

DATA HAVE1;

INPUT SERIES_ID $20.;

CARDS;

SMU01115000000000001

SMU01122200000000001

SMU01138200000000001

SMU01194600000000001

SMU01200200000000001

SMU01225200000000001

SMU01234600000000001

SMU01266200000000001

SMU01336600000000001

SMU01338600000000001

SMU01021000000000001

;

RUN;

data fmt (keep=FMTNAME START END LABEL TYPE);

length FMTNAME $30. START END $256.;

set WORK.have1;

      FMTNAME = 'test' ;

      START   = substr(SERIES_ID,4,5) ;

      END     = Start;

      LABEL   = 'test';

      TYPE    = 'C' ;

run;

proc format cntlin=fmt lib=work; run;

DATA HAVE2(drop=x);

INPUT Area_Code $5. Area_Title $25. ;

x=put(Area_code,$test.);

if x='test' then output;

CARDS;

01000 Alabama -- Statewide

01001 Autauga County, Alabama

01003 Baldwin County, Alabama

01005 Barbour County, Alabama

01007 Bibb County, Alabama

01009 Blount County, Alabama

01011 Bullock County, Alabama

01013 Butler County, Alabama

01015 Calhoun County, Alabama

01017 Chambers County, Alabama

01019 Cherokee County, Alabama

01021 Chilton County, Alabama

;

RUN;

Thanks,

Shiva

ballardw
Super User

Or make a format from Data2 and use that to add the label.

Data makeformat (keep=start label fmtname);

     set data2;

     rename (variable name for area code not mentioned)=start (variable name of area title)=label;

     fmtname = "$AreaLabel"; /* or what every you like*/

run;

proc format cntlin=makeformat library=(library you would like to keep it in);

/* make sure the library is in the format search path. If you use a permanent library then the code above only needs

to be run when you have a change in your area titles. Since those codes look like counties then they aren't likely to

change often*/

use as :

data want;

   set data1 (or what ever your first data set is named);

   AreaTitle = put(substr(id, 4,8),$AreaLabel.);

run;

BTW, I would be careful about calling FIPS codes area code, you will generate confusion.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1159 views
  • 0 likes
  • 6 in conversation