I have to match one column of dataset with other columns and write the corresponding value of one more column and put it in a new column. I know, it must be sounding too complicated so I have created following dummy datasets.
DATASET | New | |||
ID | id_1 | id_2 | id_3 | P |
A123 | A125 | A127 | A124 | 1 |
A124 | A126 | A123 | A127 | 2 |
A125 | A123 | A127 | A124 | 3 |
A126 | A124 | A123 | A125 | 4 |
A127 | A125 | A123 | A126 | 5 |
Dataset | Want | |||
ID | P1 | P2 | P3 | |
A123 | 3 | 5 | 2 | |
A124 | 4 | 1 | 5 | |
A125 | 1 | 5 | 2 | |
A126 | 2 | 1 | 3 | |
A127 | 3 | 1 | 4 |
I am matching ID column with id_1 and writing the corresponding P value in a new P1 column, since there are three columns id_1, id_2 and id_3 so the output will also have P1, P2, P3. I have written the following code but its not working.
data want;
set new;
array new{5} ID P id_1-id_3 ;
array p_{3} $ 64;
do i = 1 to 3;
if new{ID} = id_{i}
then new{p_{i}} = P ;
end;
run;
Would anybody please tell me whats wrong with my code. Thanks in Advance.
PROC FORMAT;
VALUE $ IDFMT
'A123'=1
'A124'=2
'A125'=3
'A126'=4
'A127'=5;
RUN;
DATA WANT;
SET NEW;
FORMAT ID_1 ID_2 ID_3 $IDFMT. ;
RUN;
Create the format based on value of ID and P then apply the format.
Define "big". Value formats are only limited by the memory available, so you can have formats with quite a high cardinality.
A slight variation on @RahulG's way of creating the format, and instead of just assigning the format, creating new variables:
data cntlin;
set new (
keep=ID P
rename=(ID=start P=label)
);
fmtname = "$idfmt";
run;
proc format library=work cntlin=cntlin;
run;
data want (keep=ID P:);
set new (drop=P);
array P_ {3} P1-P3;
array id_ {3} id_1-id_3;
do i = 1 to 3;
p_{i} = input(put(id_{i},$idfmt.),best.);
end;
run;
data new;
input ID $ id_1 $ id_2 $ id_3 $ P ;
datalines;
A123 A125 A127 A124 1
A124 A126 A123 A127 2
A125 A123 A127 A124 3
A126 A124 A123 A125 4
A127 A125 A123 A126 5
;
data fmt(DROP = ID id_1 id_2 id_3 P);
set new;
fmtname="$IDFMT";
START=ID;
LABEL=P;
RUN;
PROC FORMAT CNTLIN=fmt; RUN;
DATA WANT (RENAME =(ID_1 =P1 id_2=P2 id_3=P3 )DROP=P) ;
SET NEW;
FORMAT ID_1 id_2 id_3 $IDFMT.;
RUN;
data new; input ID $ id_1 $ id_2 $ id_3 $ P ; datalines; A123 A125 A127 A124 1 A124 A126 A123 A127 2 A125 A123 A127 A124 3 A126 A124 A123 A125 4 A127 A125 A123 A126 5 ; run; data want; if _n_=1 then do; if 0 then set new; declare hash h(dataset:'new',hashexp:20); h.definekey('id'); h.definedata('p'); h.definedone(); end; set new; array _id{*} $ id_1-id_3; array _p{*} p_1-p_3; do i=1 to dim(_id); call missing(p); rc=h.find(key:_id{i}); _p{i}=p; end; drop i p id_: rc; run;
I am new to SAS coding and I really want to know whats wrong in my set of codes. If anybody could highlight my mistakes, it will be a great learning for me. Thanks in advance.
Hi!
First of all, anybody who works with SAS and aspires to be more than just a button-pusher MUST learn to read and interpret the SAS log.
So, let's take a look at your code from the log side.
First, here the code I stole shamelessly from @RahulG for test data:
data new;
input ID $ id_1 $ id_2 $ id_3 $ P ;
datalines;
A123 A125 A127 A124 1
A124 A126 A123 A127 2
A125 A123 A127 A124 3
A126 A124 A123 A125 4
A127 A125 A123 A126 5
;
run;
Now, the log of your initial program:
27 data want; 28 set new; 29 array new{5} ID P id_1-id_3 ; ERROR: All variables in array list must be the same type, i.e., all numeric or character. 30 array p_{3} $ 64; 31 do i = 1 to 3; 32 if new{ID} = id_{i} ERROR: Undeclared array referenced: id_. 33 then new{p_{i}} = P ; ERROR: Variable id_ has not been declared as an array. 34 end; 35 run;
The first complaint of SAS (line 29) is that you tried to define an array with multiple data types. Arrays can only contain either numeric or character variables.
The second complaint (line 32): you did not define a second array id_, so you can't use it.
The next problem (also in line 32) is the use of ID as an index to an array. ID is of type character, but array indexes have to be numeric. This might work if ID only contains numeric values, as SAS usually does type-casts on its own (and writes a corresponding note to the log).
SAS did not complain about that because that would be a runtime error, and this data step never reached execution phase.
This problem is repeated when you try to use an element of p_ (which is also declared as an array with variables of type character) as an index to array "new" in line 33.
Bottom line: your code has so many mistakes that I strongly recommend to do some basic learning examples about the workings of arrays in SAS before trying to solve a more complicated problem with arrays.
I also advise that you try the solutions suggested by us and see where they take you.
Your code have many errors. I think @Kurt 'code is right version of yours. data want; set new; array new{5} ID P id_1-id_3 ; <- new{5} $ ID .. Since they are all character variables. array p_{3} $ 64; do i = 1 to 3; if new{ID} = id_{i} <-- What is id_{i} ? you didn't define a arrary id_{*} ,so you can't use it in this way. ID is character variable, you can't use it as a array INDEX . then new{p_{i}} = P ; <-- p_{*} is a character variable, you can't use it as a array INDEX . end; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.