BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
marianhabesland
Calcite | Level 5

 

 

I am trying to combine three data sets: Src_iowaresidents, Src_ms_citizens and Src_utah_records into one data set called Work.Contacts. Each data set by itself is working fine, but when I combine them together something goes wrong. Each data set looks like this:

 

PS. Sorry parts of it are in all caps. My professor prefers it that way.

 

OPTIONS FMTSEARCH = (HypTabs.HypFormats WORK LIBRARY);

DATA WORK.Contact_IA;
RETAIN SSN Inits City StateCd ZipCd;
SET HypTabs.Src_IowaResidents (RENAME =(ZipCd = ZipCdSrc));

Inits = SUBSTR(Initials, 3, 2)||SUBSTR(Initials, 1, 1);
City = PROPCASE(City);
StateCd = "IA";

ZipCd = PUT(ZipCdSrc, 5.);

KEEP SSN Inits City StateCd ZipCd;

LABEL SSN = "Social Security Number"
Inits = "Subject Initials"
City = "City"
StateCd = "State Code"
ZipCd = "Zip Code";

FORMAT StateCd $StateCd.;
RUN;

PROC SORT;
BY SSN;
RUN;


DATA WORK.Contact_MS;
RETAIN SSN Inits City StateCd ZipCd;
SET HypTabs.Src_MS_Citizens (RENAME =(SocSecNum = SSN));
Inits = COMPRESS(CATS(FirstInit, MiddleInit, LastInit), '.');
City = SCAN(CityState, 1, ',');
StateCd = "MS";

KEEP SSN Inits City StateCd ZipCd;

LABEL SSN = "Social Security Number"
Inits = "Subject Initials"
City = "City"
StateCd = "State Code"
ZipCd = "Zip Code";
FORMAT StateCd $StateCd.;
RUN;

PROC SORT;
BY SSN;
RUN;


DATA WORK.Contact_UT;
RETAIN SSN Inits City StateCd ZipCd;
SET HypTabs.Src_UT_Records ( RENAME = ( Inits = InitsTemp
ZipCode = ZipCd )
);

SSN1 = INPUT(ID, 12.);
SSN = PUT(SSN1, SSN11.);

Inits = COMPRESS(InitsTemp, '.');

City = SCAN(CitySt, 1, ',');

StateCd = "UT";

KEEP SSN Inits City StateCd ZipCd;

LABEL SSN = "Social Security Number"
Inits = "Subject Initials"
City = "City"
StateCd = "State Code"
ZipCd = "Zip Code";
FORMAT StateCd $StateCd.;
RUN;

PROC SORT;
BY SSN;
RUN;

 

DATA WORK.Contact (LABEL = "Contact Information");
SET HypTabs.Src_IowaResidents
HypTabs.Src_MS_Citizens
HypTabs.Src_UT_Records;
KEEP SSN Inits City StateCd ZipCd;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

You created 3 datasets named with prefix: work.contact_

but you use the original source datasets:

 

DATA WORK.Contact (LABEL = "Contact Information");
SET HypTabs.Src_IowaResidents
        HypTabs.Src_MS_Citizens
        HypTabs.Src_UT_Records;
KEEP SSN Inits City StateCd ZipCd;
RUN;

 

Replace the 3 input datasets to those you created .

 

 

View solution in original post

12 REPLIES 12
Reeza
Super User

@marianhabesland wrote:

 

 

I am trying to combine three data sets: Src_iowaresidents, Src_ms_citizens and Src_utah_records into one data set called Work.Contacts. Each data set by itself is working fine, but when I combine them together something goes wrong. 


 

It helps if you show what you tried and what went wrong. 

 

Nothing shown in the code here combines data and you don’t say what’s 'wrong'. 

marianhabesland
Calcite | Level 5

Sorry about that. Just realized part of the code got cut off. It is now marked in red. As for what goes wrong the data set WORK.Contacts does not even show up in the output. The only error message I get is 

 

ERROR: Variable ZipCd has been defined as both character and numeric.

 

This does not really make sense to me because when I look at the data sets ZipCd is always a character variable.

Shmuel
Garnet | Level 18

Run proc contents with each of the 3 datasets and check type of ZIPCD.

You will find which are numeric and which are not. 

 

The given code itself is not enough to guess the type.

marianhabesland
Calcite | Level 5

Thanks for your reply. I have checked and it is a character variable in all three data sets. I have even put in a statement to ensure this. This is the full message I get if that gives any more clues to the problem:

 

NOTE: Data file HYPTABS.SRC_IOWARESIDENTS.DATA is in a format that is native to another host, or the file encoding does not match
the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might
reduce performance.
NOTE: Data file HYPTABS.SRC_MS_CITIZENS.DATA is in a format that is native to another host, or the file encoding does not match the
session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
ERROR: Variable ZipCd has been defined as both character and numeric.
NOTE: Data file HYPTABS.SRC_UT_RECORDS.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
Reeza
Super User

ERROR: Variable ZipCd has been defined as both character and numeric.

 

Post the proc contents output from the three runs please.

marianhabesland
Calcite | Level 5

The CONTENTS Procedure

HYPTABS.SRC_IOWARESIDENTS196
DATA9
V90
11:17:29 15.09.201780
11:17:29 15.09.20170
 NO
 NO
  
WINDOWS_64 
wlatin1 Western (Windows) 
 Engine/Host Dependent InformationData Set Page SizeNumber of Data Set PagesFirst Data PageMax Obs per PageObs in First Data PageNumber of Data Set RepairsExtendObsCounterFilenameRelease CreatedHost CreatedInode NumberAccess PermissionOwner NameFile SizeFile Size (bytes)
65536
1
1
817
196
0
YES
/folders/myfolders/BIOS6680/Hypertension Study/1 - Data/2 - Tabulations/src_iowaresidents.sas7bdat
9.0401M2
X64_8PRO
77
rwxrwx---
root
128KB
131072
 Alphabetic List of Variables and Attributes# Variable Type Len937281645
BirthDtNum8
CityChar20
EthnicityChar12
InitialsChar4
RaceChar5
SSNChar11
SexChar6
StateChar4
ZipCdNum8

The CONTENTS Procedure

 Data Set NameObservationsMember TypeVariablesEngineIndexesCreatedObservation LengthLast ModifiedDeleted ObservationsProtectionCompressedData Set TypeSortedLabel Data Representation Encoding 
HYPTABS.SRC_MS_CITIZENS207
DATA10
V90
12:46:45 15.09.2017110
12:46:45 15.09.20170
 NO
 NO
  
WINDOWS_64 
wlatin1 Western (Windows) 
 Engine/Host Dependent InformationData Set Page SizeNumber of Data Set PagesFirst Data PageMax Obs per PageObs in First Data PageNumber of Data Set RepairsExtendObsCounterFilenameRelease CreatedHost CreatedInode NumberAccess PermissionOwner NameFile SizeFile Size (bytes)
65536
1
1
594
207
0
YES
/folders/myfolders/BIOS6680/Hypertension Study/1 - Data/2 - Tabulations/src_ms_citizens.sas7bdat
9.0401M2
X64_8PRO
82
rwxrwx---
root
128KB
131072
 Alphabetic List of Variables and Attributes# Variable Type Len51082743916
CityStateChar35
DOBChar9
EthChar22
FirstInitChar2
GenderChar6
LastInitChar2
MiddleInitChar2
RacialChar16
SocSecNumChar11
ZipCdChar5

The CONTENTS Procedure

 Data Set NameObservationsMember TypeVariablesEngineIndexesCreatedObservation LengthLast ModifiedDeleted ObservationsProtectionCompressedData Set TypeSortedLabel Data Representation Encoding 
HYPTABS.SRC_UT_RECORDS195
DATA10
V90
09:08:11 15.09.201772
09:08:11 15.09.20170
 NO
 NO
  
WINDOWS_64 
wlatin1 Western (Windows) 
 Engine/Host Dependent InformationData Set Page SizeNumber of Data Set PagesFirst Data PageMax Obs per PageObs in First Data PageNumber of Data Set RepairsExtendObsCounterFilenameRelease CreatedHost CreatedInode NumberAccess PermissionOwner NameFile SizeFile Size (bytes)
65536
1
1
908
195
0
YES
/folders/myfolders/BIOS6680/Hypertension Study/1 - Data/2 - Tabulations/src_ut_records.sas7bdat
9.0401M2
X64_8PRO
83
rwxrwx---
root
128KB
131072
 Alphabetic List of Variables and Attributes# Variable Type Len10983651274
BirthDayNum8
BirthMonthNum8
BirthYearNum8
CityStChar24
EthnicityCodeChar1
GenderCodeChar1
IDNum8
InitsChar6
RaceCodeChar1
ZipCodeChar5
 
So I can tell from the PROC CONTENTS that ZipCd is a numerical variable in the Src_iowaresidents data set. In my output it is however a character variable as I define that in my code. Is there someway I can fix this so that it becomes a character variable so that I can combine the datasets?
 
 
 
Shmuel
Garnet | Level 18

Add to your code, starting with:

DATA WORK.Contact_IA;
RETAIN SSN Inits City StateCd ZipCd;
SET HypTabs.Src_IowaResidents (RENAME =(ZipCd = ZipCdSrc));

a line:

DATA WORK.Contact_IA;
LENGTH zipcd $5;       /*** line to add ***/
RETAIN SSN Inits City StateCd ZipCd;
SET HypTabs.Src_IowaResidents (RENAME =(ZipCd = ZipCdSrc));

I hope it will sove your problem.

ballardw
Super User

PLEASE edit your post to remove any displayed SSN. I may be paranoid about such things but better safe than sorry.

 

Initials + residence+ zip code and SSN could compromise an individual's SSN for exploitation.

Reeza
Super User

@ballardw I sort of assumed with the word professor there it would be fake data...but they are valid SSNs and do map to people. And it's also weird that US even has online systems that show me that....

marianhabesland
Calcite | Level 5
I totally understand your concern. This is however 100% fictional data that was made up for academic use. I’ll make sure to edit it though so no one else gets worried.
Shmuel
Garnet | Level 18

You created 3 datasets named with prefix: work.contact_

but you use the original source datasets:

 

DATA WORK.Contact (LABEL = "Contact Information");
SET HypTabs.Src_IowaResidents
        HypTabs.Src_MS_Citizens
        HypTabs.Src_UT_Records;
KEEP SSN Inits City StateCd ZipCd;
RUN;

 

Replace the 3 input datasets to those you created .

 

 

marianhabesland
Calcite | Level 5

Yes, that makes sense. Silly mistake. Thank you!!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2860 views
  • 0 likes
  • 4 in conversation