11-02-2016 06:07 PM
I'm working on a data step to import a large file from the Centers for Medicare and Medicaid that contains NPI information in a delimited file (csv).
This NPPES file has many sets of repeating fields that are numbered from 1 to N.
For example there are four fields with the number 1, followed by the next four fields with the number 2, etc to 50:
* Other Provider ID 1
* Other Provider ID Type 1
* Other Provider ID State
* Other Provider ID Issuer 1
Then there are other sets of numbered fields with different N's.
I know that I can format these cells pretty concisely, for example:
format other_prov_id_1 - other_prov_id_50 $20.
other_prov_id_type_1 - other_prov_id_type_50 $2.
other_prov_id_state_1 - other_prov_id_state_50 $2.
other_prov_id_issuer_1 - other_prov_id_issuer_50 $80.
But what about the input statement?
Is there a more efficient way (maybe with arrays - do they work with infiles?) to input this large number of fields other than:
I'm using SAS EG, but I don't think that should matter with a base SAS step...
Thanks so much for your help!
11-02-2016 06:27 PM
The dash or double-dash shortcuts are for one variable only as you saw when you wote your format statement.
For your need, you must use a small macro:
%macro read; %local i; %do =1 %to 50; OTHER_PROV_ID_&i. OTHER_PROV_ID_TYPE_&i. OTHER_PROV_ID_STATE_&i. OTHER_PROV_ID_ISSUER_&i. %end; %mend; data WANT; infile 'xxxxxxxxxxx'; input %read; run;
11-03-2016 05:26 AM
Yes, this is a symptom of a badly structured file. Think how simple your import step would be simply having a normalised data structure such as:
ID TYPE STATE ISSUER
Then you would simply set those few variables, and if needed you could transpose them up. I would suspect bad data strucuture is responsible for at least half the questions we get on here.
Next, what does your data import specifications say, if they are detailed (as they should be), then it would be a matter of copy paste from there and then add some syntax.
A final option, run proc import on the file - this will guess your data from the file, but then you can copy that code and alter it as needed.
11-03-2016 09:20 AM
Thanks so much for your ideas.
Unfortunately the layout is in a pdf file, and it doesn't copy/paste in a way that would make it easy to use (sigh).
And there is no header record in the data file (it is in a separate file by itself) so a proc import won't do it either (sigh, sigh).
Have a good rest of the week!
11-03-2016 09:57 AM
Oh, so you dont even have a datafile, but are attempting to import a rendered PDF file. Thats bad. I would really advise you to go back to the actual data behind the report, it is quite difficult or time consuming to get information from things like PDF.
11-03-2016 11:33 AM
See if someone in your organization has a PDF editor program such as Adobe Acrobat Pro and have them convert the layout document to RTF or some word processing file format . Often the layout sections become Word tables which are bit more amenable to copy and paste. You may need to do some editing as PDF to wordprocessor will often have a table that crosses page boundaries export as two or more individual tables that are actually one in concept.
If they have full descriptions then you may also be able to use copy and paste to get useable lables.
WARNING: Watch out for malformed variable names. I've deal with several government file layout documents and there will sometimes be imbedded blanks that vary from document to document and so you think you are dealing with VAriableXYZ but some times they have VAriable XYZ for the same thing (or not which is possibly worse).
11-07-2016 12:30 PM
01-12-2017 02:14 PM
I was wondering if you were able to successfully create SAS dataset/s out of the NPPES file? I also use SAS EG, and there are team members who are curious about the file as well, but we're having a problem with the original *.csv being so large and no place to store it. Do you have a functioning dataset?