BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CynthiaWei
Obsidian | Level 7

Hi SAS Pros,

 

I am having a dataset representing vehicle model numbers like follows:

ID Date Model_Code
1 1/3/2020 170516458
1 2/5/2020 89076153
1 4/19/2020 3157604
1 6/3/2020 489601560
1 8/20/2020 6780660
2 3/27/2020 3157604
2 7/1/2020 3157604
2 8/23/2020 41509613
3 1/10/2020 489601560
3 3/5/2020 489601560
3 6/19/2020 170516458
3 7/20/2020 89076153
3 8/22/2020 3157604

 

I want to create two new variables to represent Model 1 (when Model_code=170516458, 89076153, or 3157604) and Model 2 (when Model_code=489601560, 6780660, or 41509613) as below:

ID Date Model_Code Model_1 Model_2
1 1/3/2020 170516458 1 0
1 2/5/2020 89076153 1 0
1 4/19/2020 3157604 1 0
1 6/3/2020 489601560 0 1
1 8/20/2020 6780660 0 1
2 3/27/2020 3157604 1 0
2 7/1/2020 3157604 1 0
2 8/23/2020 41509613 0 1
3 1/10/2020 489601560 0 1
3 3/5/2020 489601560 0 1
3 6/19/2020 170516458 1 0
3 7/20/2020 89076153 1 0
3 8/22/2020 3157604 1 0

 

I also upload an Excel file that includes the code for Model 1 and Model 2. I know it would be easy if there are only 3 different codes for each model, however, my actual dataset has about 20,000 different codes that belong to Model 1 and about 10,000 different codes that fall into Model 2, so it is impossible to type every code for each model. So, how can I create the two new variable: Model_1 and Model_2 by assigning the codes/values that are listed in the Excel file(s)?

 

I really appreciate any help to solve this problem!

 

Best regards,

 

C

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

That's what the data step hash object is built for:

data want;
set have;
if _n_ = 1
then do;
  declare hash m1 (dataset:"model_code1");
  m1.definekey('model_code');
  m1.definedone();
  declare hash m2 (dataset:"model_code2");
  m2.definekey('model_code');
  m2.definedone();
end;
model_1 = not m1.check();
model_2 = not m2.check();
run;

The NOT is necessary because the hash object methods return a zero (logically false) when successful. Although the code may not be intuitive at first glance, the two assignment statements are the quickest way to achieve your objective.

Code is not tested, for lack of readily usable example data.

 

Edit: fixed some typos in the code. Code is now tested with this example data:

data have;
input ID $ Date :mmddyy10. Model_Code :$10.;
format date yymmdd10.;
datalines;
1 1/3/2020  170516458
1 2/5/2020  89076153
1 4/19/2020 3157604
1 6/3/2020  489601560
1 8/20/2020 6780660
2 3/27/2020 3157604
2 7/1/2020  3157604
2 8/23/2020 41509613
3 1/10/2020 489601560
3 3/5/2020  489601560
3 6/19/2020 170516458
3 7/20/2020 89076153
3 8/22/2020 3157604
;

data model_code1;
input model_code $10.;
datalines;
170516458
89076153
3157604
;

data model_code2;
input model_code $10.;
datalines;
489601560
6780660
41509613
;

Note that data steps with datalines (or other means to create the dataset in one step) are the preferred way to show example data.

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

That's what the data step hash object is built for:

data want;
set have;
if _n_ = 1
then do;
  declare hash m1 (dataset:"model_code1");
  m1.definekey('model_code');
  m1.definedone();
  declare hash m2 (dataset:"model_code2");
  m2.definekey('model_code');
  m2.definedone();
end;
model_1 = not m1.check();
model_2 = not m2.check();
run;

The NOT is necessary because the hash object methods return a zero (logically false) when successful. Although the code may not be intuitive at first glance, the two assignment statements are the quickest way to achieve your objective.

Code is not tested, for lack of readily usable example data.

 

Edit: fixed some typos in the code. Code is now tested with this example data:

data have;
input ID $ Date :mmddyy10. Model_Code :$10.;
format date yymmdd10.;
datalines;
1 1/3/2020  170516458
1 2/5/2020  89076153
1 4/19/2020 3157604
1 6/3/2020  489601560
1 8/20/2020 6780660
2 3/27/2020 3157604
2 7/1/2020  3157604
2 8/23/2020 41509613
3 1/10/2020 489601560
3 3/5/2020  489601560
3 6/19/2020 170516458
3 7/20/2020 89076153
3 8/22/2020 3157604
;

data model_code1;
input model_code $10.;
datalines;
170516458
89076153
3157604
;

data model_code2;
input model_code $10.;
datalines;
489601560
6780660
41509613
;

Note that data steps with datalines (or other means to create the dataset in one step) are the preferred way to show example data.

CynthiaWei
Obsidian | Level 7

Hi,

 

I am sorry for my late response. It works perfectly!

 

I really appreciate it!

 

Best regards,

Shmuel
Garnet | Level 18

You can use next code or adapt it if you have more then two groups of models:

libname models xlsx "<path>/Model1and2Codes.xlsx";
data cntl;
  set models.sheet1;
      retain fmtname 'modcode';
	  start=model_1_code; label=1; output;
	  start=model_2_code; label=2; output;
run;
proc format lib=work cntlin=cntl; run;

data want;
 set have;
     model_type = strip(put(model_code,modcode.));
	 if model_type = '1' then do;
	    Model_1 = 1; model_2=0; 
	 end; else
	 if model_type = '2' then do;
	    Model_1 = 0; model_2=1; 
	 end; 
     drop model_type; /* or keep it instead the 2 variables Model_1 , Model_2 */
run;
CynthiaWei
Obsidian | Level 7

Hi,

 

Thank you so much for this extension a lot! I really appreciate it very much! It is going to be very useful for my future data management!

 

Best regards,

 

C

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 736 views
  • 0 likes
  • 3 in conversation