DATA Step, Macro, Functions and more

How do I efficiently import a file with many repeated & numbered sets of data?

Reply
Contributor
Posts: 29

How do I efficiently import a file with many repeated & numbered sets of data?

Hi all,

 

I'm working on a data step to import a large file from the Centers for Medicare and Medicaid that contains NPI information in a delimited file (csv).

 

This NPPES file has many sets of repeating fields that are numbered from 1 to N.

 

For example there are four fields with the number 1, followed by the next four fields with the number 2, etc to 50:

 

  * Other Provider ID 1

  * Other Provider ID Type 1

  * Other Provider ID State

  * Other Provider ID Issuer 1

 

Then there are other sets of numbered fields with different N's.

 

I know that I can format these cells pretty concisely, for example:  

 

format  other_prov_id_1 - other_prov_id_50 $20.

            other_prov_id_type_1 - other_prov_id_type_50 $2.

            other_prov_id_state_1 - other_prov_id_state_50 $2.

            other_prov_id_issuer_1 - other_prov_id_issuer_50 $80.

;

 

But what about the input statement?

Is there a more efficient way (maybe with arrays - do they work with infiles?) to input this large number of fields other than:

 

input  other_prov_id_1

          other_prov_id_type_1

          other_prov_id_state_1

          other_prov_id_issuer_1

          other_prov_id_2

          other_prov_id_type_2

          other_prov_id_state_2

          other_prov_id_issuer_2

          other_prov_id_3

             etc.

;

 

 I'm using SAS EG, but I don't think that should matter with a base SAS step...

Thanks so much for your help!

 

Barb

PROC Star
Posts: 1,567

Re: How do I efficiently import a file with many repeated & numbered sets of data?

The dash or double-dash shortcuts are for one variable only as you saw when you wote your format statement.

For your need, you must use a small macro:

 

%macro read;
  %local i;
  %do =1 %to 50;
    OTHER_PROV_ID_&i.
    OTHER_PROV_ID_TYPE_&i.
    OTHER_PROV_ID_STATE_&i.
    OTHER_PROV_ID_ISSUER_&i.
  %end;
%mend;

data WANT;
  infile 'xxxxxxxxxxx';
  input %read;
run;

 

Contributor
Posts: 29

Re: How do I efficiently import a file with many repeated & numbered sets of data?

Thank you so much, ChrisNZ!

 

Barb

Super User
Super User
Posts: 7,430

Re: How do I efficiently import a file with many repeated & numbered sets of data?

Yes, this is a symptom of a badly structured file.  Think how simple your import step would be simply having a normalised data structure such as:

ID    TYPE   STATE   ISSUER

...

 

Then you would simply set those few variables, and if needed you could transpose them up.  I would suspect bad data strucuture is responsible for at least half the questions we get on here.  

 

Next, what does your data import specifications say, if they are detailed (as they should be), then it would be a matter of copy paste from there and then add some syntax.

 

A final option, run proc import on the file - this will guess your data from the file, but then you can copy that code and alter it as needed.

Contributor
Posts: 29

Re: How do I efficiently import a file with many repeated & numbered sets of data?

Thanks so much for your ideas.

 

Unfortunately the layout is in a pdf file, and it doesn't copy/paste in a way that would make it easy to use (sigh).

And there is no header record in the data file (it is in a separate file by itself) so a proc import won't do it either (sigh, sigh).

 

Have a good rest of the week!

 

Barb

 

Super User
Super User
Posts: 7,430

Re: How do I efficiently import a file with many repeated & numbered sets of data?

Oh, so you dont even have a datafile, but are attempting to import a rendered PDF file.  Thats bad.  I would really advise you to go back to the actual data behind the report, it is quite difficult or time consuming to get information from things like PDF.

Super User
Posts: 10,550

Re: How do I efficiently import a file with many repeated & numbered sets of data?

See if someone in your organization has a PDF editor program such as Adobe Acrobat Pro and have them convert the layout document to RTF or some word processing file format . Often the layout sections become Word tables which are bit more amenable to copy and paste. You may need to do some editing as PDF to wordprocessor will often have a table that crosses page boundaries export as two or more individual tables that are actually one in concept.

If they have full descriptions then you may also be able to use copy and paste to get useable lables.

 

WARNING: Watch out for malformed variable names. I've deal with several government file layout documents and there will sometimes be imbedded blanks that vary from document to document and so you think you are dealing with VAriableXYZ but some times they have VAriable XYZ for the same thing (or not which is possibly worse).

Contributor
Posts: 29

Re: How do I efficiently import a file with many repeated

Thank you so much for the ideas!

Barb
------------------------------------------------------------------------------
CONFIDENTIALITY NOTICE: If you have received this email in error,
please immediately notify the sender by e-mail at the address shown.
This email transmission may contain confidential information. This
information is intended only for the use of the individual(s) or entity to
whom it is intended even if addressed incorrectly. Please delete it from
your files if you are not the intended recipient. Thank you for your
compliance. Copyright (c) 2016 Cigna
==============================================================================
Regular Learner
Posts: 1

Re: How do I efficiently import a file with many repeated

Hi, Barb,

 

I was wondering if you were able to successfully create SAS dataset/s out of the NPPES file?  I also use SAS EG, and there are team members who are curious about the file as well, but we're having a problem with the original *.csv being so large and no place to store it.  Do you have a functioning dataset?

Ask a Question
Discussion stats
  • 8 replies
  • 347 views
  • 0 likes
  • 5 in conversation