Questions for beginner stuff for SAS 9.4

Occasional Contributor
Posts: 12

Hello SAS community,


I have a few basic questions. I seem to have trouble with some formats.


1) my data for date of birth is in the format mm/yyyy so I wrote:


input @3 DOB monyy6.


I get an "invalid data for DOB in line x" error.


2) my data for paid service is in the format 1,230.00. So I wrote


input @10 comma4.2


I get an "invalid data for Paid_amt in line x" error.


3) Lastly, I was able to remove duplicates by ID, but how do I choose which of the duplicates to keep? For example, if a person has 3 visits (04/01/2010, 06/25/2010, 08/21/2010), I would like to keep the first visit, which would be 04/01/2010.


Thank you in advance guys!




Super User
Posts: 23,323

Re: Questions for beginner stuff for SAS 9.4

Use an INFORMAT statement instead of specifying the informat on the INPUT line.


There are some good write-ups across the internet on how this works, so I won't go into the various input methods SAS has for reading files.



informat dob monyy6. paid_service best32.;
format dob monyy6. paid_service comma12.;

input @3 dob @10 paid_service;


Occasional Contributor
Posts: 12

Re: Questions for beginner stuff for SAS 9.4

I still seem to be getting an error.


This is what I have in my editor:


data temp;
infile 'C:\Users\wjeon\Desktop\Masters\Data to William\PEI data/PEItxt.txt' delimiter='09'x dsd missover firstobs=2;
length Patient_ID $20;
informat DOB monyy6. Paid_amt best32. Approved_amt best32.;
format DOB monyy6. Paid_amt comma12. Approved_amt comma12.;
input @1 Patient_ID
@2 Gender $
@3 DOB
@4 Postal_Code $
@5 Region $
@6 Date_of_Service mmddyy10.
@7 Facility_type_id $
@8 ICD_9 $
@9 Specialty $
@10 Paid_amt
@11 Approved_amt
@12 Fee_Code $;



Super User
Posts: 6,637

Re: Questions for beginner stuff for SAS 9.4

You're writing code without understanding what you're writing.  It's not just here and there ... it's throughout your code.  You will need to study a bit on the INPUT statement and what things mean.  Here are just a couple of examples.


When you specify @1, @2, @3, those numbers should not be counting the fields that you are reading.  Rather, each one indicates the starting column within the line where the software should begin searching for a variable's value.


When you give an informat, those are instructions to SAS on how to read a variable.  For example, when the informat for a variable is "best32." that tells SAS to read 32 characters to find the value for a variable.


There is too much wrong with your program.  You will need to learn more of the basics or there will be too many times that you fix an error but then come back and say, "I still have an error." 
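
To make the two points above concrete, here is a minimal sketch rather than a fix for your whole program. It assumes a delimited file read with list input, so there are no @ pointers and the fields are taken in delimiter order. The data set name demo, the helper variable DOB_text, and the sample rows are made up for illustration, and '|' stands in for your tab delimiter so the rows can be typed inline. Since mm/yyyy carries no day, the sketch assumes the 1st of the month.

```sas
data demo;
    /* '|' stands in for the tab delimiter ('09'x) so the sample rows can be inlined */
    infile datalines delimiter='|' dsd missover;
    length Patient_ID $20 DOB_text $7;
    /* COMMA12. accepts embedded commas such as 1,230.00 */
    informat Date_of_Service mmddyy10. Paid_amt comma12.;
    format DOB monyy7. Date_of_Service mmddyy10. Paid_amt comma12.2;
    /* list input: no @ column pointers, fields are read in delimiter order */
    input Patient_ID $ DOB_text $ Date_of_Service Paid_amt;
    /* mm/yyyy has no day component, so assume the 1st of the month */
    DOB = mdy(input(scan(DOB_text, 1, '/'), 2.), 1,
              input(scan(DOB_text, 2, '/'), 4.));
    drop DOB_text;
    datalines;
A001|04/1975|04/01/2010|1,230.00
A001|04/1975|06/25/2010|85.50
B002|11/1982|08/21/2010|2,400.00
;
run;
```

For the real file you would go back to delimiter='09'x with firstobs=2 and list all twelve variables in the order they appear in the file.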

Super User
Posts: 23,323

Re: Questions for beginner stuff for SAS 9.4

How did you remove duplicates? The fix depends on how you did it.


You can use either a double sort or BY-group processing. Either way, it is essentially a two-step process.


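proc sort data=have;
by var1 var2 var3;
run;

/* Option 1: a second sort with NODUPKEY keeps the first record per var1/var2 */
proc sort data=have nodupkey out=want1;
by var1 var2;
run;

/* Option 2: BY-group processing keeps the first record per var1/var2 */
data want2;
set have;
by var1 var2 var3;
if first.var2;
run;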


Occasional Contributor
Posts: 12

Re: Questions for beginner stuff for SAS 9.4

I separated duplicates like this:


proc sort data=pei nodupkey;
by Patient_ID;


Sorry but could you explain what out=want1 does?

Super User
Posts: 23,323

Re: Questions for beginner stuff for SAS 9.4

If you sort without an output data set (OUT=), then you're sorting 'in place', which means your input data set itself gets sorted. In this case, if you remove duplicates, you're modifying your original data set. In general, this isn't good practice because you can overwrite data that you may need later on.
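
Applied to your case, a sketch might look like the following. It assumes Date_of_Service was read in as a real SAS date (numeric), so that sorting on it is chronological. The names pei_sorted and first_visit are just placeholders.

```sas
/* Sort a copy so the original pei data set is left untouched */
proc sort data=pei out=pei_sorted;
    by Patient_ID Date_of_Service;
run;

/* Keep only the earliest visit for each patient */
data first_visit;
    set pei_sorted;
    by Patient_ID;
    if first.Patient_ID;
run;
```

Because the sort writes to pei_sorted, all visits are still available in pei if you need them later.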




OUT= SAS-data-set

names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.

Use care when you use PROC SORT without OUT=.

Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.

Default: Without OUT=, PROC SORT overwrites the original data set.
Tip: With in-database sorts, the output data set cannot refer to the input table on the DBMS.
Tip: You can use data set options with OUT=.
Featured in: Sorting by the Values of Multiple Variables




Training Module online:
