I have a large text file with fixed columns that I need to import to SAS. The same file contains information at three levels: firms, associates and economic variables.
The first character of every observation tells wich level the values in the row are about (1 is about firms, 2 about associates...). According to the layout, each level has different variables of different lenghts.
This is a small example of the data I have to import. The first 4 variable are the same but afterwards each level has its own variables with different lenghts. The variable "type" identifies which level the observation is about. And all the other variables are bunched up in "other" and all the lines end with a "F":
DATA WORK.test;
INFILE DATALINES;
INPUT
type $ 1-1
full_registry $ 2-2
type_update $ 3-3
cnpj $ 4-17
other $ 18-1200
;
DATALINES;
1F 000000000001911BANCO DO BRASIL SA DIRECAO GERAL 022005110300 2038196608016422100QUADRA SAUN QUADRA 5 LOTE B TORRES III E III SN ANDAR 1 A 16 SALA 101 A 1601 ANDAR 1 A 16 SALA 101 A 1601 ANDAR 1 A 16 SALA 101 A 1601 ASA NORTE 70040912DF9701BRASILIA 61 34939002 61 34931040SECEX'BB.COM.BR 10060000000000000500000000000000000N F
2F 000000000001912MARCIO HAMILTON FERREIRA 000***923641**100000020101117 ***000000**CPF INVALIDO 00 F
2F 000000000001912NILSON MARTINIANO MOREIRA 000***491386**100000020101117 ***000000**CPF INVALIDO 00 F
2F 000000000001912WALTER MALIENI JUNIOR 000***718468**100000020101117 ***000000**CPF INVALIDO 00 F
2F 000000000001912PAULO ROGERIO CAFFARELLI 000***887279**160000020101117 ***000000**CPF INVALIDO 00 F
2F 000000000001912ADRIANO MEIRA RICCI 000***550741**100000020120327 ***000000**CPF INVALIDO 00 F
2F 000000000001912CARLOS ALBERTO ARAUJO NETTO 000***415907**100000020120327 ***000000**CPF INVALIDO 00 F
2F 000000000001912JOSE RICARDO **bleep**ONDE FORNI 000***261501**100000020180410 ***000000**CPF INVALIDO 00 F
2F 000000000001912MARCOS RENATO COLTRI 000***653918**100000020180410 ***000000**CPF INVALIDO 00 F
2F 000000000001912CARLA NESI 000***295868**100000020180412 ***000000**CPF INVALIDO 00 F
2F 000000000001912BERNARDO DE AZEVEDO SILVA ROTHE 000***890627**100000020180418 ***000000**CPF INVALIDO 00 F
2F 000000000001912ANTONIO GUSTAVO MATOS DO VALE 000***370266**100000020170614 ***000000**CPF INVALIDO 00 F
6F 00000000000191649999900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 F
1F 000000000002722BANCO DO BRASIL SA MANAUS (AM) 022005110300 2038197608136421200RUA GUILHERME MOREIRA 315 CENTRO 69005300AM0255MANAUS 10000000000000000500000000000000000 00000000 F
1F 000000000003532BANCO DO BRASIL SA PRESIDENTE VARGAS BELEM (PA) 022005110300 2038196608016422100 AVEN PRESIDENTE VARGAS 248 COMERCIO 66010900PA0427BELEM 10000000000000000500000000000000000 00000000 F
1F 000000000004342BANCO DO BRASIL SA SANTOS - SANTOS (SP) 022005110300 2038196608016422100RUA 15 DE NOVEMBRO 195 CENTRO 11010908SP7071SANTOS 10000000000000000500000000000000000 00000000 F
1F 000000000005152BANCO DO BRASIL SA CAMPOS EST.UNIF. 022005110300 2038196608016421200PRACA DAS 4 JORNADAS 11 CENTRO 28030002RJ5819CAMPOS DOS GOYTACAZES 10000000000000000500000000000000000 00000000 F
1F 000000000006042BANCO DO BRASIL SA SALVADOR (BA) 022005110300 20381966080164221004A AVENIDA CENTRO ADMINISTRATIVO DA BAHIA 600 CENTRO ADMINISTRATIVO DA BAHIA 41745002BA3849SALVADOR 71 3362880371 3362895271 32420844AGE0006'BB.COM.BR 10000000000000000500000000000000000 00000000 F
1F 000000000261582BANCO DO BRASIL SA MACAPA-EST.UNIF. 022005110300 2038196608016421200RUA - INDEPENDENCIA 250 CENTRO 68900090AP0605MACAPA 096 2232155 10000000000000000500000000000000000 00000000 F
1F 000000000262392BANCO DO BRASIL SA VOLTA REDONDA-EST.UNIF. 022005110300 2038196608016421200RUA VEREADOR L F GUIMARAES 12 ATERRADO 27211130RJ5925VOLTA REDONDA 10000000000000000500000000000000000 00000000 F
;;;;
run;
I thought of two possibilities:
I. Read the txt three times with the three layouts for each level. Throw away the the observations that were read wrong. I'm not sure if this will work.
II. Read the first 4 variabels and put the rest of the line on a string variable, like I did in this example. Then use a bunch of substr() to separate the several variables. I'm not sure if this is the smartest way to do this.
Am I'm missing a much simpler way to do this?
I'm using Enterprise version 7.13.
Read the first character and use a line hold (@). After that, you can selectively input the rest of the line, according to what the first character tells you.
Read the first character and use a line hold (@). After that, you can selectively input the rest of the line, according to what the first character tells you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.