Solved: Most efficient way to use long list of alphanumeric values to create n...

sasgorilla · Posted 01-12-2025 12:06 AM

I have a date set with oldvar comprised of a few thousand alphanumeric values. Big chunks of these values (60-100) represent respective organizations. I want to create newvar with a value based on the value of oldvar.

In a similar situation with only a few values for a variable I have done the below.

data want;
     set have;
     If oldvar in ("123abc" "abc234" "789ghi") then newvar=1;
     else if oldvar in ("222xyz" "444qrs" "432opm") then newvar=2;
run;

In this case with a few thousand values I am trying to paste from an excel file, what is the most efficient way to do this without having to put quotation marks around every single value? Thank you.

Tom · Posted 01-12-2025 11:25 AM

@sasgorilla wrote:

Thank you, very helpful. I needed one more step and had one additional question.

Step: Sort by oldvar

*I received an error message because I hadn't sorted by the "by" variable in each data set. This was an easy fix by sorting each of the data sets by oldvar.

Question:

When I used your code with merge, the number of observations increased. Does this mean that there must be some "oldvar" in the "company_groups" that were not in the "have" dataset? Is there an easy way to test for this if many observations in the "have" dataset have a blank or missing value for oldvar?

Yes. To use MERGE the datasets must be sorted (or have an index that can be used to retrieve the values) in the order specified in the BY statement. You could use an SQL join instead, in which case SQL will sort the data for you.

If the number of observations increases there are two possible causes. Like you said it could be that some extra observations were added. But the subsetting IF in my code eliminates those.

data want;
  merge have(in=in1) company_groups;
  by oldvar;
  if in1;
run;

The other reason would be that you have multiple observations with the same value of OLDVAR. Either just plain duplicate observations. Or OLDVAR values that are mapped to more than one NEWVAR value.

You could fix that by first eliminating the duplicates using PROC SORT.

proc sort data=company_groups nodupkey;
  by oldvar;
run;

View solution in original post

Tom · Posted 01-12-2025 12:13 AM

Leave the list in DATA, not CODE.

So make a dataset that has OLDVAR and NEWVAR

data company_groups;
  input oldvar $ newvar ;
cards;
123abc 1
abc234 1
789ghi 1
222xyz 2
444qrs 2
432opm 2
;

then merge it with HAVE.

data want;
  merge have(in=in1) company_groups;
  by oldvar;
  if in1;
run;

sasgorilla · Posted 01-12-2025 12:43 AM

Thank you, very helpful. I needed one more step and had one additional question.

Step: Sort by oldvar

*I received an error message because I hadn't sorted by the "by" variable in each data set. This was an easy fix by sorting each of the data sets by oldvar.

Question:

When I used your code with merge, the number of observations increased. Does this mean that there must be some "oldvar" in the "company_groups" that were not in the "have" dataset? Is there an easy way to test for this if many observations in the "have" dataset have a blank or missing value for oldvar?

Tom · Posted 01-12-2025 11:25 AM

@sasgorilla wrote:

Thank you, very helpful. I needed one more step and had one additional question.

Step: Sort by oldvar

*I received an error message because I hadn't sorted by the "by" variable in each data set. This was an easy fix by sorting each of the data sets by oldvar.

Question:

When I used your code with merge, the number of observations increased. Does this mean that there must be some "oldvar" in the "company_groups" that were not in the "have" dataset? Is there an easy way to test for this if many observations in the "have" dataset have a blank or missing value for oldvar?

Yes. To use MERGE the datasets must be sorted (or have an index that can be used to retrieve the values) in the order specified in the BY statement. You could use an SQL join instead, in which case SQL will sort the data for you.

If the number of observations increases there are two possible causes. Like you said it could be that some extra observations were added. But the subsetting IF in my code eliminates those.

data want;
  merge have(in=in1) company_groups;
  by oldvar;
  if in1;
run;

The other reason would be that you have multiple observations with the same value of OLDVAR. Either just plain duplicate observations. Or OLDVAR values that are mapped to more than one NEWVAR value.

You could fix that by first eliminating the duplicates using PROC SORT.

proc sort data=company_groups nodupkey;
  by oldvar;
run;

Ksharp · Posted 01-12-2025 02:23 AM

You could make a informat by proc format or Hash Table.

data company_groups;
  input start $ label ;
  retain fmtname 'fmt' type 'i' ;
cards;
123abc 1
abc234 1
789ghi 1
222xyz 2
444qrs 2
432opm 2
;
proc format cntlin=company_groups;
run;




data have;
  input oldvar $ ;
cards;
123abc 
abc234 
789ghi 
222xyz 
444qrs 
432opm 
;
data want;
 set have;
 newvar=input(oldvar,fmt.);
run;

PaigeMiller · Posted 01-12-2025 06:11 AM

@sasgorilla wrote:

In this case with a few thousand values I am trying to paste from an excel file, what is the most efficient way to do this without having to put quotation marks around every single value? Thank you.

See this explanation of using the %QLIST macro for avoiding adding quotation marks. https://communities.sas.com/t5/SAS-Programming/Assign-the-values-to-a-macro-variable/m-p/955804#M373...

Or as @Tom said, put these values in a data set rather than code.

--
Paige Miller

Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Re: Most efficient way to use long list of alphanumeric values to create new variable

Registration is open

SAS Training: Just a Click Away