DATA Step, Macro, Functions and more

Treating '.' as a character in group by

Reply
N/A
Posts: 0

Treating '.' as a character in group by

Hi,
We are using SAS V8.2 in our Mainframe work shop.

My requirement is to calculate count of unique variables. Here is the code I have used.

DATA INPUT1;
INFILE INPUT1;
INPUT MNAME $ 1-25;
RUN;

DATA NEWFI;
SET INPUT1;
BY MNAME;
IF FIRST.MNAME THEN SUM=0;
SUM + 1;
IF LAST.MNAME THEN OUTPUT;

RUN;

PROC SORT DATA=NEWFI;
BY DESCENDING SUM;

PROC PRINT;

I have input data in the format

----+----1----+----2----+
(spaces)
(spaces)
(spaces)
. (one dot)

I am expecting the output as

MNAME SUM
---------------------------------------
(SPACES) 3
. 1

But I am receiving the output as

MNAME SUM
---------------------------------------
(SPACES) 4

Kindly correct the above code to get the expected result.

Thanks,
kris Corrected the SAS output display.


Message was edited by: kris_madras
Super Contributor
Super Contributor
Posts: 3,174

Re: Treating '.' as a character in group by

Review the SASLOG output closely - you will see a SAS diagnostic message which you may have overlooked, shown below:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

Also, you most likely do not have a fixed-length (RECFM=F) record/file layout providing data (again info revealed in the INFILE diagnostic of your SAS log) in columns 1 through 25 on each record, so SAS is assigning a missing value.

First you must specify TRUNCOVER or MISSOVER, possibly -- review the INFILE statement coding. And second you want to review the INPUT statement processing to handle the shorter record length, rather than using the INPUT statement you have coded. I don't believe that this behavior is unique to the mainframe either - it is related to the input file layout and record format for your data.

For diagnostic information (self-debugging), consider adding either a :

LIST;

...or.... (tip: get to SAS 9 as soon as possble so you can use PUTLOG -- and so you can be on a supported SAS version):

FILE LOG;
PUT _ALL_;


Also, consider adding a sort step for your input file before you use BY processing. And I would encourage you to code a LENGTH statement to declare MNAME length explicitly.

Scott Barry
SBBWorks, Inc.
N/A
Posts: 0

Re: Treating '.' as a character in group by

Thanks Scott.. will check one by one as you suggested and update in this thread...
Frequent Contributor
Posts: 127

Re: Treating '.' as a character in group by

Hello Kris,

Here is a small example (based on the code you pasted) that you could easily adapt upon your needs Smiley Happy

Regards,
Florent



DATA INPUT1 (drop= i);
format MNAME $25.;

/* Spaces */
do i=1 to 3;
MNAME=' ';
output;
end;

/* Text */
do i=1 to 2;
MNAME='This is some text';
output;
end;

/* Dot */
do i=1 to 6;
MNAME=.;
output;
end;
run;

proc sql;
create table NEWFI as
select MNAME,
count(missing(MNAME)) as SUM
from INPUT1
group by MNAME
order by SUM desc;
quit;
Super User
Posts: 10,550

Re: Treating '.' as a character in group by

One thought, if the sole purpose of the second data step is to get a count of the unique variables it has one problem in that the data should be sorted by the variable you are trying to count.

Another approach would be to use some thing like:

Proc freq data=INPUT1 order=freq;
table mname /nopercent missing;
run;

Your example actually didn't have spaces but were null strings (empty).
From the online help for input statement, column:

Both leading and trailing blanks within the field are ignored. Therefore, if numeric values contain blanks that represent zeros or if you want to retain leading and trailing blanks in character values, read the value with an informat. See INPUT Statement, Formatted.

Missing Values
Missing data do not require a place-holder. The INPUT statement interprets a blank field as missing and reads other values correctly. If a numeric or character field contains a single period, the variable value is set to missing.
Ask a Question
Discussion stats
  • 4 replies
  • 154 views
  • 0 likes
  • 4 in conversation