Solved: How to create a variable/category when it falls in tons of code number...

CynthiaWei · Posted 08-25-2020 01:53 AM

Hi SAS Pros,

I am having a dataset representing vehicle model numbers like follows:

ID	Date	Model_Code
1	1/3/2020	170516458
1	2/5/2020	89076153
1	4/19/2020	3157604
1	6/3/2020	489601560
1	8/20/2020	6780660
2	3/27/2020	3157604
2	7/1/2020	3157604
2	8/23/2020	41509613
3	1/10/2020	489601560
3	3/5/2020	489601560
3	6/19/2020	170516458
3	7/20/2020	89076153
3	8/22/2020	3157604

I want to create two new variables to represent Model 1 (when Model_code=170516458, 89076153, or 3157604) and Model 2 (when Model_code=489601560, 6780660, or 41509613) as below:

ID	Date	Model_Code	Model_1	Model_2
1	1/3/2020	170516458	1	0
1	2/5/2020	89076153	1	0
1	4/19/2020	3157604	1	0
1	6/3/2020	489601560	0	1
1	8/20/2020	6780660	0	1
2	3/27/2020	3157604	1	0
2	7/1/2020	3157604	1	0
2	8/23/2020	41509613	0	1
3	1/10/2020	489601560	0	1
3	3/5/2020	489601560	0	1
3	6/19/2020	170516458	1	0
3	7/20/2020	89076153	1	0
3	8/22/2020	3157604	1	0

I also upload an Excel file that includes the code for Model 1 and Model 2. I know it would be easy if there are only 3 different codes for each model, however, my actual dataset has about 20,000 different codes that belong to Model 1 and about 10,000 different codes that fall into Model 2, so it is impossible to type every code for each model. So, how can I create the two new variable: Model_1 and Model_2 by assigning the codes/values that are listed in the Excel file(s)?

I really appreciate any help to solve this problem!

Best regards,

C

Kurt_Bremser · Posted 08-25-2020 02:31 AM

That's what the data step hash object is built for:

data want;
set have;
if _n_ = 1
then do;
  declare hash m1 (dataset:"model_code1");
  m1.definekey('model_code');
  m1.definedone();
  declare hash m2 (dataset:"model_code2");
  m2.definekey('model_code');
  m2.definedone();
end;
model_1 = not m1.check();
model_2 = not m2.check();
run;

The NOT is necessary because the hash object methods return a zero (logically false) when successful. Although the code may not be intuitive at first glance, the two assignment statements are the quickest way to achieve your objective.

Code is not tested, for lack of readily usable example data.

Edit: fixed some typos in the code. Code is now tested with this example data:

data have;
input ID $ Date :mmddyy10. Model_Code :$10.;
format date yymmdd10.;
datalines;
1 1/3/2020  170516458
1 2/5/2020  89076153
1 4/19/2020 3157604
1 6/3/2020  489601560
1 8/20/2020 6780660
2 3/27/2020 3157604
2 7/1/2020  3157604
2 8/23/2020 41509613
3 1/10/2020 489601560
3 3/5/2020  489601560
3 6/19/2020 170516458
3 7/20/2020 89076153
3 8/22/2020 3157604
;

data model_code1;
input model_code $10.;
datalines;
170516458
89076153
3157604
;

data model_code2;
input model_code $10.;
datalines;
489601560
6780660
41509613
;

Note that data steps with datalines (or other means to create the dataset in one step) are the preferred way to show example data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 08-25-2020 02:31 AM

That's what the data step hash object is built for:

data want;
set have;
if _n_ = 1
then do;
  declare hash m1 (dataset:"model_code1");
  m1.definekey('model_code');
  m1.definedone();
  declare hash m2 (dataset:"model_code2");
  m2.definekey('model_code');
  m2.definedone();
end;
model_1 = not m1.check();
model_2 = not m2.check();
run;

The NOT is necessary because the hash object methods return a zero (logically false) when successful. Although the code may not be intuitive at first glance, the two assignment statements are the quickest way to achieve your objective.

Code is not tested, for lack of readily usable example data.

Edit: fixed some typos in the code. Code is now tested with this example data:

data have;
input ID $ Date :mmddyy10. Model_Code :$10.;
format date yymmdd10.;
datalines;
1 1/3/2020  170516458
1 2/5/2020  89076153
1 4/19/2020 3157604
1 6/3/2020  489601560
1 8/20/2020 6780660
2 3/27/2020 3157604
2 7/1/2020  3157604
2 8/23/2020 41509613
3 1/10/2020 489601560
3 3/5/2020  489601560
3 6/19/2020 170516458
3 7/20/2020 89076153
3 8/22/2020 3157604
;

data model_code1;
input model_code $10.;
datalines;
170516458
89076153
3157604
;

data model_code2;
input model_code $10.;
datalines;
489601560
6780660
41509613
;

Note that data steps with datalines (or other means to create the dataset in one step) are the preferred way to show example data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

CynthiaWei · Posted 09-01-2020 08:16 PM

Hi,

I am sorry for my late response. It works perfectly!

I really appreciate it!

Best regards,

Shmuel · Posted 08-25-2020 07:47 AM

You can use next code or adapt it if you have more then two groups of models:

libname models xlsx "<path>/Model1and2Codes.xlsx";
data cntl;
  set models.sheet1;
      retain fmtname 'modcode';
	  start=model_1_code; label=1; output;
	  start=model_2_code; label=2; output;
run;
proc format lib=work cntlin=cntl; run;

data want;
 set have;
     model_type = strip(put(model_code,modcode.));
	 if model_type = '1' then do;
	    Model_1 = 1; model_2=0; 
	 end; else
	 if model_type = '2' then do;
	    Model_1 = 0; model_2=1; 
	 end; 
     drop model_type; /* or keep it instead the 2 variables Model_1 , Model_2 */
run;

CynthiaWei · Posted 09-01-2020 08:17 PM

Hi,

Thank you so much for this extension a lot! I really appreciate it very much! It is going to be very useful for my future data management!

Best regards,

C

How to create a variable/category when it falls in tons of code numbers that are listed in an Excel

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

How to create a variable/category when it falls in tons of code numbers that are listed in an Excel

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

Re: How to create a variable/category when it falls in tons of code numbers that are listed in an Ex

SAS Innovate 2025: Call for Content