02-09-2016 08:30 PM
Hello SAS community,
I have a few basic questions. I seem to have trouble with some formats.
1) my data for date of birth is in the format mm/yyyy so I wrote:
input @3 DOB monyy6.
I get an "invalid data for DOB in line x" error.
2) my data for paid service is in the format 1,230.00. So I wrote
input @10 comma4.2
I get an "invalid data for Paid_amt in line x" error.
3) Lastly, I was able to remove duplicates by ID, but how do I choose which of the duplicates to keep? For example, if a person has 3 visits (04/01/2010, 06/25/2010, 08/21/2010), I would like to keep the first visit, which would be 04/01/2010.
Thank you in advance guys!
02-09-2016 08:44 PM
Use an INFORMAT statement instead of specifying the informat on the input line.
There are some good write-ups on how this works across the internet, so I'm not going to go into the various input methods SAS has for reading files.
informat dob monyy6. paid_service best32.;
format dob monyy6. paid_service comma12.;
input @3 dob @10 paid_service;
run;
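For readers following along, here is a minimal sketch of one way to read the two values described in the question. It rests on assumptions not confirmed in the thread: the column positions, the sample records, and the names `demo` and `dob_txt` are made up for illustration. MONYY expects values such as JAN2010, so the sketch reads the mm/yyyy text as a character value and converts it; it also uses COMMA8. because the width of an informat is the total number of columns to read, which COMMA4.2 would cut short for a value like 1,230.00.

```sas
data demo;
    infile datalines truncover;
    /* Read the date as text first; MONYY6. expects values like JAN2010 */
    input @3 dob_txt $7. @11 paid_amt comma8.;
    /* Prepend a day so the text becomes dd/mm/yyyy, then read it with DDMMYY10. */
    dob = input(cats('01/', dob_txt), ddmmyy10.);
    /* Display the date as mm/yyyy again and the amount with commas */
    format dob mmyys7. paid_amt comma12.2;
    drop dob_txt;
    datalines;
A 04/1975 1,230.00
B 11/1982   450.50
;
run;

proc print data=demo;
run;
```

The day of the month is set to 1 because the source value carries no day; any analysis that needs a real day of birth cannot recover it from this field.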
02-09-2016 09:29 PM
I still seem to be getting an error.
This is what I have in my editor:
infile 'C:\Users\wjeon\Desktop\Masters\Data to William\PEI data/PEItxt.txt' delimiter='09'x dsd missover firstobs=2;
length Patient_ID $20;
informat DOB monyy6. Paid_amt best32. Approved_amt best32.;
format DOB monyy6. Paid_amt comma12. Approved_amt comma12.;
input @1 Patient_ID
@2 Gender $
@4 Postal_Code $
@5 Region $
@6 Date_of_Service mmddyy10.
@7 Facility_type_id $
@8 ICD_9 $
@9 Specialty $
@12 Fee_Code $;
02-09-2016 10:37 PM
You're writing code without understanding what you're writing. It's not just here and there ... it's throughout your code. You will need to study a bit on the INPUT statement and what things mean. Here are just a couple of examples.
When you specify @1, @2, @3, you should not be numbering the fields that you are reading in. Rather, @n indicates the starting column within the line where the software should begin searching for a variable's value.
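When you give an informat, those are instructions to SAS on how to read a variable. For example, when you write "best32." after a variable in the INPUT statement, that tells SAS to read 32 columns to find the value for that variable. (On an INFORMAT statement combined with list input, the width is not used that way; SAS reads up to the next delimiter.)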
There is too much wrong with your program. You will need to learn more of the basics or there will be too many times that you fix an error but then come back and say, "I still have an error."
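To make the point about column pointers concrete, here is a small sketch of list input on a delimited file, with no @n pointers at all. It is an illustration under assumptions: the sample records and the data set name `pei_sketch` are invented, only a few of the variables from the posted program are shown, and the sketch uses a comma delimiter so the fields are visible here, whereas the posted INFILE statement uses a tab (`delimiter='09'x`).

```sas
data pei_sketch;
    infile datalines dlm=',' dsd truncover;
    length Patient_ID $20 Gender $1;
    /* No @n pointers: list input walks the delimited fields left to right. */
    /* The colon modifier applies an informat to a field but still stops    */
    /* at the next delimiter.                                               */
    input Patient_ID $
          Gender $
          Date_of_Service :mmddyy10.
          Paid_amt :comma12.;
    format Date_of_Service mmddyy10. Paid_amt comma12.2;
    datalines;
P001,F,04/01/2010,"1,230.00"
P002,M,06/25/2010,450.50
;
run;

proc print data=pei_sketch;
run;
```

With DSD in effect, the quotes around "1,230.00" keep the embedded comma from being treated as a field separator.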
02-09-2016 08:46 PM
How did you remove duplicates? The way to fix it depends on how you accomplished this.
You can use either a double sort or BY processing; either way it's essentially a two-step process.
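proc sort data=have;
    by var1 var2 var3;
run;

/* Option 1: a second sort with NODUPKEY keeps the first record per var1/var2 */
proc sort data=have nodupkey out=want1;
    by var1 var2;
run;

/* Option 2: BY-group processing in a data step */
data want2;
    set have;
    by var1 var2 var3;
    if first.var2;
run;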
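Applied to the visit example from the question, the BY-processing route might look like the sketch below. The data set names (`visits`, `visits_sorted`, `first_visit`) and the sample records are assumptions for illustration; only the variable names come from the posted program.

```sas
data visits;
    input Patient_ID $ Date_of_Service :mmddyy10.;
    format Date_of_Service mmddyy10.;
    datalines;
P001 06/25/2010
P001 04/01/2010
P001 08/21/2010
P002 05/14/2010
;
run;

/* Step 1: order each patient's visits from earliest to latest */
proc sort data=visits out=visits_sorted;
    by Patient_ID Date_of_Service;
run;

/* Step 2: keep only the first (earliest) record in each Patient_ID group */
data first_visit;
    set visits_sorted;
    by Patient_ID;
    if first.Patient_ID;
run;
```

Replacing `first.Patient_ID` with `last.Patient_ID` would keep the most recent visit instead.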
02-09-2016 09:37 PM
I separated duplicates like this:
proc sort data=pei nodupkey
Sorry but could you explain what out=want1 does?
02-09-2016 11:29 PM
If you sort without specifying an output data set (OUT=), then you're sorting 'in place', which means your input data set itself gets sorted. In this case, if you remove duplicates, you're modifying your original data set. In general, this isn't good practice because you can overwrite data that you may need later on.
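As a rough illustration of the difference, the sketch below leaves the input untouched by writing the de-duplicated rows to a new data set. The sample data and the output names `pei_nodup` and `pei_dups` are hypothetical; DUPOUT= is an optional PROC SORT option that saves the removed observations so they can be inspected.

```sas
data pei;
    input Patient_ID $ Date_of_Service :mmddyy10.;
    format Date_of_Service mmddyy10.;
    datalines;
P001 04/01/2010
P001 06/25/2010
P002 05/14/2010
;
run;

/* PEI itself is not modified: unique keys go to PEI_NODUP, */
/* and the dropped duplicates go to PEI_DUPS.               */
proc sort data=pei out=pei_nodup dupout=pei_dups nodupkey;
    by Patient_ID;
run;
```

Running the same step without OUT= would replace PEI with the de-duplicated rows.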
Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tip: With in-database sorts, the output data set cannot refer to the input table on the DBMS.
Tip: You can use data set options with OUT=.
Featured in: Sorting by the Values of Multiple Variables
Training Module online: