SAS Programming

BW18 · Posted 05-30-2024 04:47 PM

Hi all,

SAS amateur here! I am trying to figure out how to create 2 new variables and assign values to each observation based on the data contained in a reference file. I have created a mock example below. The first dataset is the main one I need to keep. It contains data on individual stores and several stores will have the same corporate ID. The second dataset is a reference that contains approximately 500 Corporate IDs and addresses, including the State and City. I would like to create 2 new variables for CorpState and CorpCity in Dataset 1 based on the CorpID contained in both files. Is this even possible to do and where do I start? Any help would be greatly appreciated.

Dataset 1 (Have): Main dataset with individual store information

Obs#	CorpID	StoreID	StoreState
1	4002	40021234	AL
2	4002	40021237	MS
3	4003	40035678	TN
4	4100	41007890	WV
5	4110	41103456	NC
6	4110	41103478	NC
7	4597	45972345	WA
8	4975	49750023	WY

Dataset 2 (Have): Reference dataset with corporate information

Obs#	CorpID	CorpState	CorpCity
1	4002	AL	Birmingham
2	4003	GA	Atlanta
3	4004	TN	Nashville
4	4100	SC	Charleston
5	4110	NC	Charlotte
6	4111	FL	Tampa
7	4597	CA	San Francisco
8	4975	WY	Jackson

Want: Dataset 1 file containing store and corporate information

Obs#	CorpID	CorpState	CorpCity	StoreID	StoreState
1	4002	AL	Birmingham	40021234	AL
2	4002	AL	Birmingham	40021237	MS
3	4003	GA	Atlanta	40035678	TN
4	4100	SC	Charleston	41007890	WV
5	4110	NC	Charlotte	41103456	NC
6	4110	NC	Charlotte	41103478	NC
7	4597	CA	San Francisco	45972345	WA
8	4975	WY	Jackson	49750023	WY

ballardw · Posted 05-30-2024 04:54 PM

Proc sort data=data1;
   by corpid;
run;

Proc sort data=data2;
  by corpid;
run;

data want;
   merge data1 data2;
   by corpid;
run;

Basic match merge. To merge sets on the value of one or more variables generally sort by them and then merge as shown.

Proc Sql can also do a join but can require a bit more typing.

View solution in original post

PaigeMiller · Posted 05-30-2024 04:51 PM

NOTE: UNTESTED CODE as the data is not in a usable form. This is a MERGE in a DATA step:

data want;
    merge have1(in=in1) have2;
    by corpid;
    if in1;
run;

This assumes both data sets are sorted by CORPID

--
Paige Miller

ballardw · Posted 05-30-2024 04:54 PM

Proc sort data=data1;
   by corpid;
run;

Proc sort data=data2;
  by corpid;
run;

data want;
   merge data1 data2;
   by corpid;
run;

Basic match merge. To merge sets on the value of one or more variables generally sort by them and then merge as shown.

Proc Sql can also do a join but can require a bit more typing.

BW18 · Posted 05-30-2024 05:20 PM

So basically I was overthinking by trying to create a new variable and assign values when all I needed to do was merge and keep the variables I want or drop those I don't. Thank you!

Unfortunately, I am still having an issue with the merge. It keeps saying that the CorpID is being treated as both character and numeric. I used the following code to convert it to character to match the main dataset, sort the data, and try to merge. The proc contents results show it as character following the first code, but when I tried to merge, I am still getting the same error.

/* Change the CorpID variable in CorpAddresses from Numeric to character and end up with the original variable name */
data N.CorpAdresses;
set N.CorpAddresses;
CorpID_char = put(CorpID, 4.);
drop CorpID;
rename CorpID_char = CorpID;
run;
proc contents data=N.CorpAdresses;
run;

Proc sort data=N.StoreInformation;
by CorpID;
run;

Proc sort data=N.CorpAddresses;
by CorpID;
run;

data StoreInfoWithCorp;
merge N.StoreInformation N.CorpAddresses;
by CorpID;
run;

ERROR: Variable CorpID has been defined as both character and numeric.

PaigeMiller · Posted 05-30-2024 06:03 PM

@BW18 wrote:

It keeps saying that the CorpID is being treated as both character and numeric.

This is not allowed, as you have discovered. We can't really debug your code as we don't have the data, but still one of the two data sets has numeric CorpID and one has character CorpID.

We cannot use data provided as you did in your initial post. Data must be provided as working SAS data step code, following these instructions and examples. Data provided via any other method (like in your first post) is unreliable, and may fail to represent your actual SAS data sets. So I am going to have to insist on you providing data using the instructions at the link.

--
Paige Miller

Tom · Posted 05-30-2024 06:40 PM

Why did you make it numeric in one dataset and character in the other?

I would suggest you fix that first.

In general ID values should be character as you will not be doing arithmetic with the values.

But in converting from numeric to character you need take care that you generate the digit strings in the same way so they match. Values like '123', ' 123', and '0123' might look like the same number to a human but will be three different values as a character string.

BW18 · Posted 05-31-2024 09:12 AM

I did not purposely make it numeric in one dataset and character in the other. As I said previously, I tried to convert the one that SAS was treating as numeric to character to match the main dataset. This data was imported from a text file, so I am not sure if that is why it was reading it as numeric instead of character.

Tom · Posted 05-31-2024 09:29 AM

@BW18 wrote:
I did not purposely make it numeric in one dataset and character in the other. As I said previously, I tried to convert the one that SAS was treating as numeric to character to match the main dataset. This data was imported from a text file, so I am not sure if that is why it was reading it as numeric instead of character.

If you are reading in a TEXT file then you have control over how the variables are created.

If you use PROC IMPORT to read the file then it will have to GUESS how to read the file and the only information is has is what it sees in that particular file.

If you are not sure how to read in a text file you could use the code that PROC IMPORT generated as a starting point. (But note that PROC IMPORT generates poorly designed code that attaches INFORMATs and FORMATs to many variables that do not need them.)

In general to read a delimited file (like a CSV file) the code just needs to look like:

data want;
  infile 'myfile.txt' dsd truncover firstobs=2;
  length varA $20 varB 8 varC $5 ;
  input varA -- varC ;
run;

The DATA statement names the dataset to create.

The INFILE statement names the file to read and sets some processing options. If the delimiter is something other than a comma add the DLM= option to specify the delimiter. If there is no header (or the header is mote than one line) then remove the FIRSTOBS= option or change the number. If the lines are longer than 32K bytes then add the LRECL= option to something that is longer than the longest possible line.

The LENGTH statement defines the variables type and storage length. If you define the variable is the order they appear then the INPUT statement can just use the simple positional variable list as shown.

If any of the variables require special instructions for converting the text in the file into the value to be stored then add an INFORMAT statement and attach the needed informat. Most variables do not need this. The main exceptions are DATE, TIME and DATETIME values. You might also see some text files that have thousands separators or $ added for numbers which can be read using the COMMA informat.

Similarly for FORMAT statement. Most variables will not need to have a format attached. The main exception is DATE, TIME and DATETIME values.

You could also add LABEL statement to attach a descriptive label to one or more of the variables.

Tom · Posted 05-31-2024 09:54 AM

So for example if you have this CSV file. Let's call it main.csv.

CorpID,StoreID,StoreState
4002,40021234,AL
4002,40021237,MS
4003,40035678,TN
4100,41007890,WV
4110,41103456,NC
4110,41103478,NC
4597,45972345,WA
4975,49750023,WY

Then the code to make your MAIN dataset would simply be:

data main;
  infile 'main.csv' dsd truncover firstobs=2;
  length CorpID $20 StoreID $20 StoreState $2 ;
  input CorpID -- StoreState;
run;

Which is almost less code that it would take to call PROC IMPORT to make guesses about what data the file has.

Patrick · Posted 05-30-2024 10:04 PM

@BW18

Please share the output running below code:

proc contents data=N.CorpAdresses(keep=CorpID);
run;
proc contents data=N.StoreInformation(keep=CorpID);
run;

Do your CorpId values in the data set where the variable is of type character have any leading zeros?

BW18 · Posted 05-31-2024 11:52 AM

There are no leading zeros in the CorpIDs. They range from 4001-4975. However, I closed out SAS and reimported the text file containing CorpAddresses. I performed the transformation to a character variable for CorpIDs and it appeared to work this time. Following this, I was able to perform the merge successfully. Thank you!

Patrick · Posted 05-31-2024 11:21 PM

@BW18 wrote:

There are no leading zeros in the CorpIDs. They range from 4001-4975. However, I closed out SAS and reimported the text file containing CorpAddresses. I performed the transformation to a character variable for CorpIDs and it appeared to work this time. Following this, I was able to perform the merge successfully. Thank you!

Good that you've got something working.

Just as a comment: Ideally you read the source string already into a character variable as part of the import step.

SAS Programming

Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Re: Creating new variables and assigning values with a reference dataset

Most efficient way to use long list of alphanumeric values to create n...

Conditional replacement of a variable values

Assign values to a parameter without controls

Assign the same value by group of variables

Creating sequence variable with partitioning

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...