I'm attempting to transform my if logic into something more concise. The sample I provide is much smaller than the real data:
data have;
infile datalines;
input list $ 1-3 id $ 5-7 price $ 9-10;
datalines;
BAC BB8 4a
BAC 22C
BAC DDD 5b
CBA 6c
CBA 22A 4b
CBA 10B 4b
CBA 11A 5f
DAC 2CC
DAC 11C 2e
DAC 3e
;
run;
data want;
set have;
if list='BAC' and id='BB8' and price='4a' then code='BBB';
if list='BAC' and id='22C' and price=' ' then code='CCC';
if list='BAC' and id='DDD' and price='5b' then code='DDD';
if list='CBA' and id=' ' and price='6c' then code='FFF';
if list='CBA' and id='22A' and price='4b' then code='FFF';
if list='CBA' and id='10B' and price='4b' then code='GGG';
if list='CBA' and id='11A' and price='5f' then code='HHH';
if list='DAC' and id='2CC' and price=' ' then code='III';
if list='DAC' and id='11C' and price='2e' then code='BBB';
if list='DAC' and id=' ' and price='3e' then code='BBB';
run;
So I use those if statements to create a new field. But I don't want to have tons of if statements for every variation - I'd rather automate it somehow. Is this possible?
You could make a dataset RECODES with all the combinatinos of list/id/price and resulting code. Then use it as a hash lookup table in a subsequent data set, as in:
data recodes;
input (list id price code) ($char4.);
datalines;
BAC BB8 4a BBB
BAC 22C CCC
BAC DDD 5b DDD
CBA 6c FFF
CBA 22A 4b FFF
CBA 10B 4b GGG
CBA 11A 5f HHH
DAC 2CC III
DAC 11C 2e BBB
DAC 3e BBB
run;
data want (drop=rc);
set have;
if _n_=1 then do;
if 0 then set recodes;
declare hash h (dataset:'recodes');
h.definekey('list','id','price');
h.definedata('code');
h.definedone();
end; /* Added after @LaurieF's note */
rc=h.find();
if rc^=0 then code=' ';
run;
The variable RC is set to zero with a successful lookup. If it's not a zero then that combinatino of list/id/price was not in the recodes dataset, so code should be set to missing.
Perhaps you are looking for the SELECT-WHEN statement. An explanation and examples are available in the article "The SELECT statement in the SAS DATA step."
That depends. Do you have the logic for your mappings somewhere? It has to come from somewhere.
Creating a table you can join to as a lookup.
One refactoring that you should look into, is combining all conditions into one where the outcome is the same.
ie all conditions that map to BBB should have a single IF/THEN condition. In your example you have three different IF statements for this single value.
That depends. Do you have the logic for your mappings somewhere? It has to come from somewhere.
Creating a table you can join to as a lookup.
One refactoring that you should look into, is combining all conditions into one where the outcome is the same.
ie all conditions that map to BBB should have a single IF/THEN condition. In your example you have three different IF statements for this single value.
Yes, I do have a mapping document with all the codes that correspond with each line of logic.
@Reeza How would I do a join on a table for a lookup? - I have a table with all of the values and corresponding codes in it.
You could make a dataset RECODES with all the combinatinos of list/id/price and resulting code. Then use it as a hash lookup table in a subsequent data set, as in:
data recodes;
input (list id price code) ($char4.);
datalines;
BAC BB8 4a BBB
BAC 22C CCC
BAC DDD 5b DDD
CBA 6c FFF
CBA 22A 4b FFF
CBA 10B 4b GGG
CBA 11A 5f HHH
DAC 2CC III
DAC 11C 2e BBB
DAC 3e BBB
run;
data want (drop=rc);
set have;
if _n_=1 then do;
if 0 then set recodes;
declare hash h (dataset:'recodes');
h.definekey('list','id','price');
h.definedata('code');
h.definedone();
end; /* Added after @LaurieF's note */
rc=h.find();
if rc^=0 then code=' ';
run;
The variable RC is set to zero with a successful lookup. If it's not a zero then that combinatino of list/id/price was not in the recodes dataset, so code should be set to missing.
Hi:
And, in addition to the elegant HASH approach, a slightly different approach with a user-defined format for a lookup list is how I was taught to do it in the golden olden days before HASH. (If we already had a table with the values we needed for the format, then we used the CNTLIN technique with PROC FORMAT to make the list.)
cynthia
Or old-fashioned and simpler to code and understand, but probably much slower than a hash table:
data HAVE(drop=CODE)
CODE(index=(A=(LIST ID PRICE )));
input LIST $ 1-3 ID $ 5-7 PRICE $ 9-10 CODE $ 12-14;
cards;
BAC BB8 4a BBB
BAC 22C CCC
BAC DDD 5b DDD
CBA 6c FFF
CBA 22A 4b FFF
CBA 10B 4b GGG
CBA 11A 5f HHH
DAC 2CC III
DAC 11C 2e BBB
DAC 3e BBB
run;
data WANT ;
set HAVE;
set CODE key=A;
if _IORC_ = %sysrc(_dsenom) then _ERROR_=0;
run;
@mkeintz …with an 'end;' statement to terminate the 'if _n_ = 1 then do;', of course!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.