BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BRKS
Quartz | Level 8

Hi all,

 

I'm working on a data step to import a large file from the Centers for Medicare and Medicaid that contains NPI information in a delimited file (csv).

 

This NPPES file has many sets of repeating fields that are numbered from 1 to N.

 

For example there are four fields with the number 1, followed by the next four fields with the number 2, etc to 50:

 

  * Other Provider ID 1

  * Other Provider ID Type 1

  * Other Provider ID State

  * Other Provider ID Issuer 1

 

Then there are other sets of numbered fields with different N's.

 

I know that I can format these cells pretty concisely, for example:  

 

format  other_prov_id_1 - other_prov_id_50 $20.

            other_prov_id_type_1 - other_prov_id_type_50 $2.

            other_prov_id_state_1 - other_prov_id_state_50 $2.

            other_prov_id_issuer_1 - other_prov_id_issuer_50 $80.

;

 

But what about the input statement?

Is there a more efficient way (maybe with arrays - do they work with infiles?) to input this large number of fields other than:

 

input  other_prov_id_1

          other_prov_id_type_1

          other_prov_id_state_1

          other_prov_id_issuer_1

          other_prov_id_2

          other_prov_id_type_2

          other_prov_id_state_2

          other_prov_id_issuer_2

          other_prov_id_3

             etc.

;

 

 I'm using SAS EG, but I don't think that should matter with a base SAS step...

Thanks so much for your help!

 

Barb

1 ACCEPTED SOLUTION

Accepted Solutions
carlhenriksen
Calcite | Level 5

Two years too late, I know, but others searching may find it helpful to know that the good folks at the National Bureau of Economic Research have already done the heavy lifting (AND keep things updated) by converting the NPPES NPI file csv file into a SAS file: www.nber.org/data/npi.html

View solution in original post

10 REPLIES 10
ChrisNZ
Tourmaline | Level 20

The dash or double-dash shortcuts are for one variable only as you saw when you wote your format statement.

For your need, you must use a small macro:

 

%macro read;
  %local i;
  %do =1 %to 50;
    OTHER_PROV_ID_&i.
    OTHER_PROV_ID_TYPE_&i.
    OTHER_PROV_ID_STATE_&i.
    OTHER_PROV_ID_ISSUER_&i.
  %end;
%mend;

data WANT;
  infile 'xxxxxxxxxxx';
  input %read;
run;

 

BRKS
Quartz | Level 8

Thank you so much, ChrisNZ!

 

Barb

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, this is a symptom of a badly structured file.  Think how simple your import step would be simply having a normalised data structure such as:

ID    TYPE   STATE   ISSUER

...

 

Then you would simply set those few variables, and if needed you could transpose them up.  I would suspect bad data strucuture is responsible for at least half the questions we get on here.  

 

Next, what does your data import specifications say, if they are detailed (as they should be), then it would be a matter of copy paste from there and then add some syntax.

 

A final option, run proc import on the file - this will guess your data from the file, but then you can copy that code and alter it as needed.

BRKS
Quartz | Level 8

Thanks so much for your ideas.

 

Unfortunately the layout is in a pdf file, and it doesn't copy/paste in a way that would make it easy to use (sigh).

And there is no header record in the data file (it is in a separate file by itself) so a proc import won't do it either (sigh, sigh).

 

Have a good rest of the week!

 

Barb

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Oh, so you dont even have a datafile, but are attempting to import a rendered PDF file.  Thats bad.  I would really advise you to go back to the actual data behind the report, it is quite difficult or time consuming to get information from things like PDF.

ballardw
Super User

See if someone in your organization has a PDF editor program such as Adobe Acrobat Pro and have them convert the layout document to RTF or some word processing file format . Often the layout sections become Word tables which are bit more amenable to copy and paste. You may need to do some editing as PDF to wordprocessor will often have a table that crosses page boundaries export as two or more individual tables that are actually one in concept.

If they have full descriptions then you may also be able to use copy and paste to get useable lables.

 

WARNING: Watch out for malformed variable names. I've deal with several government file layout documents and there will sometimes be imbedded blanks that vary from document to document and so you think you are dealing with VAriableXYZ but some times they have VAriable XYZ for the same thing (or not which is possibly worse).

BRKS
Quartz | Level 8
Thank you so much for the ideas!

Barb
------------------------------------------------------------------------------
CONFIDENTIALITY NOTICE: If you have received this email in error,
please immediately notify the sender by e-mail at the address shown.
This email transmission may contain confidential information. This
information is intended only for the use of the individual(s) or entity to
whom it is intended even if addressed incorrectly. Please delete it from
your files if you are not the intended recipient. Thank you for your
compliance. Copyright (c) 2016 Cigna
==============================================================================
armosqueda
Calcite | Level 5

Hi, Barb,

 

I was wondering if you were able to successfully create SAS dataset/s out of the NPPES file?  I also use SAS EG, and there are team members who are curious about the file as well, but we're having a problem with the original *.csv being so large and no place to store it.  Do you have a functioning dataset?

carlhenriksen
Calcite | Level 5

Two years too late, I know, but others searching may find it helpful to know that the good folks at the National Bureau of Economic Research have already done the heavy lifting (AND keep things updated) by converting the NPPES NPI file csv file into a SAS file: www.nber.org/data/npi.html

BRKS
Quartz | Level 8
Thanks so much for posting this!
🙂

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1926 views
  • 1 like
  • 6 in conversation