Hi all,
I'm working on a data step to import a large file from the Centers for Medicare and Medicaid that contains NPI information in a delimited file (csv).
This NPPES file has many sets of repeating fields that are numbered from 1 to N.
For example there are four fields with the number 1, followed by the next four fields with the number 2, etc to 50:
* Other Provider ID 1
* Other Provider ID Type 1
* Other Provider ID State
* Other Provider ID Issuer 1
Then there are other sets of numbered fields with different N's.
I know that I can format these cells pretty concisely, for example:
format other_prov_id_1 - other_prov_id_50 $20.
other_prov_id_type_1 - other_prov_id_type_50 $2.
other_prov_id_state_1 - other_prov_id_state_50 $2.
other_prov_id_issuer_1 - other_prov_id_issuer_50 $80.
;
But what about the input statement?
Is there a more efficient way (maybe with arrays - do they work with infiles?) to input this large number of fields other than:
input other_prov_id_1
other_prov_id_type_1
other_prov_id_state_1
other_prov_id_issuer_1
other_prov_id_2
other_prov_id_type_2
other_prov_id_state_2
other_prov_id_issuer_2
other_prov_id_3
etc.
;
I'm using SAS EG, but I don't think that should matter with a base SAS step...
Thanks so much for your help!
Barb
Two years too late, I know, but others searching may find it helpful to know that the good folks at the National Bureau of Economic Research have already done the heavy lifting (AND keep things updated) by converting the NPPES NPI file csv file into a SAS file: www.nber.org/data/npi.html
The dash or double-dash shortcuts are for one variable only as you saw when you wote your format statement.
For your need, you must use a small macro:
%macro read;
%local i;
%do =1 %to 50;
OTHER_PROV_ID_&i.
OTHER_PROV_ID_TYPE_&i.
OTHER_PROV_ID_STATE_&i.
OTHER_PROV_ID_ISSUER_&i.
%end;
%mend;
data WANT;
infile 'xxxxxxxxxxx';
input %read;
run;
Thank you so much, ChrisNZ!
Barb
Yes, this is a symptom of a badly structured file. Think how simple your import step would be simply having a normalised data structure such as:
ID TYPE STATE ISSUER
...
Then you would simply set those few variables, and if needed you could transpose them up. I would suspect bad data strucuture is responsible for at least half the questions we get on here.
Next, what does your data import specifications say, if they are detailed (as they should be), then it would be a matter of copy paste from there and then add some syntax.
A final option, run proc import on the file - this will guess your data from the file, but then you can copy that code and alter it as needed.
Thanks so much for your ideas.
Unfortunately the layout is in a pdf file, and it doesn't copy/paste in a way that would make it easy to use (sigh).
And there is no header record in the data file (it is in a separate file by itself) so a proc import won't do it either (sigh, sigh).
Have a good rest of the week!
Barb
Oh, so you dont even have a datafile, but are attempting to import a rendered PDF file. Thats bad. I would really advise you to go back to the actual data behind the report, it is quite difficult or time consuming to get information from things like PDF.
See if someone in your organization has a PDF editor program such as Adobe Acrobat Pro and have them convert the layout document to RTF or some word processing file format . Often the layout sections become Word tables which are bit more amenable to copy and paste. You may need to do some editing as PDF to wordprocessor will often have a table that crosses page boundaries export as two or more individual tables that are actually one in concept.
If they have full descriptions then you may also be able to use copy and paste to get useable lables.
WARNING: Watch out for malformed variable names. I've deal with several government file layout documents and there will sometimes be imbedded blanks that vary from document to document and so you think you are dealing with VAriableXYZ but some times they have VAriable XYZ for the same thing (or not which is possibly worse).
Hi, Barb,
I was wondering if you were able to successfully create SAS dataset/s out of the NPPES file? I also use SAS EG, and there are team members who are curious about the file as well, but we're having a problem with the original *.csv being so large and no place to store it. Do you have a functioning dataset?
Two years too late, I know, but others searching may find it helpful to know that the good folks at the National Bureau of Economic Research have already done the heavy lifting (AND keep things updated) by converting the NPPES NPI file csv file into a SAS file: www.nber.org/data/npi.html
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.