Re: A question about SAS functions

Khno · Posted 07-28-2021 11:48 AM

Hello All,

I found the below code online. It is used to correct issues with the contribution amounts from the Federal Election Committee database (FEC.gov). I do not understand what exactly this code suppose to do. I would appreciate it if you could please explain the purpose of doing this to the amount column in the data.

***Correcting issues with contribution amount in the data;

data _null_;

amt='008900]';

b=input(compress(amt,"jklmnopqr]","i"),8.)*(-10)-(find(']jklmnopqr',lowcase(substr(amt,notdigit(amt))))-1);;

put b;

run;



%macro amt;

%let yrlist=00 02 04 06 08 80 82 84 86 88 90 92 94 96 98;

%do i=1 %to 15;

%let yr=%scan(&yrlist,&i);



data r.ind&yr;

set r.ind&yr;

if notdigit(amt)=0 then amt_new=input(put(amt,8.),8.);

else

amt_new=input(compress(amt,"jklmnopqr]","i"),8.)*(-10)-(find(']jklmnopqr',lowcase(substr(amt,notdigit(amt))))-1);

run;



%end;



%mend;



%amt;

jimbarbour · Posted 07-28-2021 11:56 AM

Presumably, amounts should be numeric. The Compress function compresses out (removes) unwanted characters. It appears that they are removing alphabetic characters. There are other ways to do this such as coding 'kd' (keep digits) as the third parameter of Compress, but here they're coding the characters individually, and they're just coding "jklmnopqr". I can't tell you why those specific characters would be important.

You can see that the Compress is only done if a non-numeric is found in this If statement:

if  notdigit(amt)=0 then amt_new=input(put(amt,8.),8.);
else
    amt_new=input(compress(amt,"jklmnopqr]","i"),8.)*(-10)-(find(']jklmnopqr',lowcase(substr(amt,notdigit(amt))))-1);

Jim

Reeza · Posted 07-28-2021 11:58 AM

If you compare the input to the output that should give you a general idea of what's happening, in general it seems to be cleaning the field.

COMPRESS() removes characters specified.
FIND() searches for characters and returns the index when the specified character is found.
NOTDIGIT() checks if any non digit characters are in a string.
SUBSTR() takes part of a string
LOWCASE() converts everything to lower case
INPUT() converts a character to numeric.

Now tie those to what you have as the input value and see how it can be used to create your output value.

Is amt='0089000]' your example of how the the amt field originally looks? That wasn't clear from your example or code.

Khno · Posted 07-28-2021 12:23 PM

Thank you all for the explanations.

Reeza,
amt='0089000]' is part of the code.

Bulk data from FEC is fairly clean and that is why I was wondering why this code is needed as I think the amount column looks a normal numerical variable.

Now I have an idea of what this code is supposed to do but I could not get why the code multiply by -10 and subtract one (-1). These are amounts, is this changing the amounts in any how?

Reeza · Posted 07-28-2021 01:09 PM

Again, you'll need to check the before and after. Adding a bunch of print statements (PUT in SAS) is old school but it helps you debug these things quite well.

%macro amt;

%let yrlist=00 02 04 06 08 80 82 84 86 88 90 92 94 96 98;

%do i=1 %to 15;

%let yr=%scan(&yrlist,&i);

%put &yr.;


data r.ind&yr;

set r.ind&yr;
put "AMT = " amt;
if notdigit(amt)=0 then amt_new=input(put(amt,8.),8.);

else

amt_new=input(compress(amt,"jklmnopqr]","i"),8.)*(-10)-(find(']jklmnopqr',lowcase(substr(amt,notdigit(amt))))-1);

put "AMT_NEW=" AMT_NEW;

run;



%end;



%mend;



%amt;

Reeza · Posted 07-28-2021 02:13 PM

I should clarify Bulk data from FEC is fairly clean now....it's very much possible that it wasn't when the coder was working on it.
If I was doing this, I'd verify the file was being read correctly based on current specifications and if there wasn't a way to fix some of this using formats as indicated by @ballardw.
I've seen that issue with public data feeds before over time....

ballardw · Posted 07-28-2021 12:06 PM

Which specific functions do you not understand.

One way to see what is happening in specific is to pull the syntax apart (good lesson itself);

Such as

data junk;

amt='008900]';
A = compress(amt,"jklmnopqr]","i");
b = input(A,8.);
c = notdigit(amt);
d = substr(amt,c);
e = lowcase(d);
f = find(']jklmnopqr',d);

run;

Apparently the coder wanted a numeric value from a field that is supplied with additional codes , the values that appear in the string "jklmnopqr]". The various functions find the bits not related to the numeric value and remove them using some external not provided rules to create a desired range.

The macro then dangerously loops through a number of similarly named data sets and hopefully adds a similarly created numeric value.

Why do I say "dangerously"? Any code that uses the same data set as the source, Set statement, and output, Data statement, completely replaces the source data set if no errors are encountered. A logic error or typo might accidentally replace a value not intended.

Reeza · Posted 07-28-2021 12:08 PM

Is someone scraping data or downloading from the bulk files? The bulk files are fairly clean are they not?

FreelanceReinh · Posted 07-28-2021 02:01 PM

Hello @Khno and welcome to the SAS Support Communities!

It looks like the purpose of this code is to read (so called) zoned decimal values, likely originating from an IBM mainframe system, into a numeric variable. See https://en.wikipedia.org/wiki/Binary-coded_decimal#Zoned_decimal for the explanation how negative values are coded by replacing the last digit (0, 1, 2, 3, ..., 9) with a particular non-numeric character (}, J, K, L, ..., R) -- where "}" might "vary depending on the local character code page setting."

Then the multiplication by -10 and the subtraction of 1 in your code make perfect sense.

It's possible, though, that you don't even need this manual conversion because SAS provides informats to deal with various sorts of zoned decimal data, for example the ZDw.d informat.

Example:

data _null_;
input amt $;
b=input(translate(amt,'}',']'),zd7.);
put b 6.;
cards;
008900]
008900J
008900K
008900R
0089000
0089002
;

Result (on a German Windows system):

As mentioned above, using the TRANSLATE function might not be necessary on other systems.

Also, note that the PUT function in

input(put(amt,8.),8.);

is used inappropriately (and can be omitted), assuming variable amt is already character.

SAS Innovate 2025: Call for Content

Classroom Training Available!