- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's trial and error to get the import acting like expected.
I use truncover, missover and dsd like it was a lotery ticket. But it's because it never does what I want it to do.
When I explicitly declare ";" as a delimiter, why the execution does not interpret them in that way?
Why it ignores blank columns instead of writing missing to the corresponding columns?
Why yymmdd10. does not work for the date columns?
The log says:
FILENAME REFFILE '/caslibs/risknb/IFRS9_CONTRATOS_202008.TXT' encoding=wlatin1;
data public.test ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile REFFILE delimiter = ';' truncover MISSOVER lrecl=32767 firstobs=2 n=1000;
input CODAAEST CODMMEST CODEMPRE CODOPERA $char12. temp1 temp2 CODIGIF $char10. FECFORMO $CHAR10. FECVECI $CHAR20.
TIPOCLIE $32. RATICLI 8 XEMPLEAD $32. CODPRODU $32. ESTVEHIC $32. PCTIEFEC 8 PCTOMI 8 IMPCAPE 8
IMPSALVE 8 FECMATRI $32. FECREVEH $32. IMPVRESI 8 FECPRIAC $32. XIDPOCI $32. XIDFALL $32. XDEFAULT $32.
XIDSUBJ $32. FECDEFAU $32. IMPPDDEF 8 IMPVCDEF 8 XISTAGE 8 IMPDOTAC 8 QDIAATI 8 QDIASMAD 8
QMESESDF 8 PCTPD 8 PCTLGD 8 REMTBOOK 8 TIMEHORI 8 QAAPDTDF 8 IMPEXPDF 8 IMPDOTCU 8 IMPDOTDU
8 IMPDOTVR 8 FECDATOS $32. USOFUTUR 8 CODSEGPD 8 CODSELGD 8 ;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try running next code, derived from yours with next changes:
1) omit all informat length, just assign $ for char=type variables;
one exception for dates to be read with :yymmdd10. informat;
2) Add a LENGTH statement to assign the max length of char=type variables;
3) %let is a declarative statement and not a datastep statemnt.
better move it before the datastep.
4) add a FORMAT statement to dates;
FILENAME REFFILE '/caslibs/risknb/IFRS9_CONTRATOS_202008.TXT' encoding=wlatin1;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
data public.test ;
LENGTH <all char-type variables and their max length> ;
INFILE REFFILE delimiter = ';' truncover MISSOVER lrecl=32767 firstobs=2 n=1000;
INPUT CODAAEST CODMMEST CODEMPRE CODOPERA $ temp1 temp2 CODIGIF $ FECFORMO $
FECVECI TIPOCLIE $ RATICLI XEMPLEAD $ CODPRODU $ ESTVEHIC $ PCTIEFEC PCTOMI
IMPCAPE IMPSALVE FECMATRI $ FECREVEH $ IMPVRESI FECPRIAC $ XIDPOCI $ XIDFALL $
XDEFAULT $ XIDSUBJ $ FECDEFAU $ IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI
QDIASMAD QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF MPEXPDF IMPDOTCU
IMPDOTDU IMPDOTVR FECDATOS $ USOFUTUR CODSEGPD CODSELGD ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When you have informats as part of the input statement the way you have those then the number of characters in the informat is read. Period. Which means some of the delimiters end up in the data when the actual values are shorter or missing.
Try adding a : in front of all the $ formats.
Or have a separate INFORMAT statement with the character variables and remove the informats from the INPUT statement.
Is there some specific reason you want the date values to be character? Any manipulation of the dates will require creating a date value. You could likely save yourself some headaches later by reading the dates with yymmdd10. informat and then assign any desired format so people understand them.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you ballardw.
But it's getting only slightly better, the proceeding : before the char format helps.
But when I try to define the informat everything turns chaotic again.
here are the last 2 versions.
data public.test ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile REFFILE delimiter = ';' truncover dsd lrecl=32767 firstobs=2 n=1000;
input CODAAEST CODMMEST CODEMPRE CODOPERA :$char12. temp1 temp2 CODIGIF :$char10. FECFORMO FECVECI
TIPOCLIE :$32. RATICLI 12 XEMPLEAD :$32. CODPRODU :$32. ESTVEHIC :$32. PCTIEFEC 12 PCTOMI 12 IMPCAPE 12
IMPSALVE 12 FECMATRI :$32. FECREVEH :$32. IMPVRESI 12 FECPRIAC :$32. XIDPOCI :$32. XIDFALL :$32. XDEFAULT :$32.
XIDSUBJ :$32. FECDEFAU :$32. IMPPDDEF 12 IMPVCDEF 12 XISTAGE 12 IMPDOTAC 12 QDIAATI 12 QDIASMAD 12
QMESESDF 12 PCTPD 12 PCTLGD 12 REMTBOOK 12 TIMEHORI 12 QAAPDTDF 12 IMPEXPDF 12 IMPDOTCU 12 IMPDOTDU
12 IMPDOTVR 12 FECDATOS USOFUTUR 12 CODSEGPD 12 CODSELGD 12 ;
informat FECFORMO FECVECI fecdatos yymmdd10.;
run;
data public.test ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile REFFILE delimiter = ';' truncover dsd lrecl=32767 firstobs=2 n=1000;
input CODAAEST CODAAEST CODMMEST CODEMPRE CODOPERA temp1 temp2 CODIGIF FECFORMO FECVECI TIPOCLIE RATICLI XEMPLEAD CODPRODU ESTVEHIC PCTIEFEC
PCTOMI IMPCAPE IMPSALVE FECMATRI FECREVEH IMPVRESI FECPRIAC XIDPOCI XIDFALL XDEFAULT XIDSUBJ FECDEFAU IMPPDDEF IMPVCDEF XISTAGE
IMPDOTAC QDIAATI QDIASMAD QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF IMPEXPDF IMPDOTCU IMPDOTDU IMPDOTVR FECDATOS USOFUTUR
CODSEGPD CODSELGD;
informat CODOPERA $12. CODIGIF $10. XIDPOCI XIDFALL XDEFAULT XIDSUBJ $1.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@acordes wrote:
Thank you ballardw.
But it's getting only slightly better, the proceeding : before the char format helps.
But when I try to define the informat everything turns chaotic again.
here are the last 2 versions.
data public.test ; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ infile REFFILE delimiter = ';' truncover dsd lrecl=32767 firstobs=2 n=1000; input CODAAEST CODMMEST CODEMPRE CODOPERA :$char12. temp1 temp2 CODIGIF :$char10. FECFORMO FECVECI TIPOCLIE :$32. RATICLI 12 XEMPLEAD :$32. CODPRODU :$32. ESTVEHIC :$32. PCTIEFEC 12 PCTOMI 12 IMPCAPE 12 IMPSALVE 12 FECMATRI :$32. FECREVEH :$32. IMPVRESI 12 FECPRIAC :$32. XIDPOCI :$32. XIDFALL :$32. XDEFAULT :$32. XIDSUBJ :$32. FECDEFAU :$32. IMPPDDEF 12 IMPVCDEF 12 XISTAGE 12 IMPDOTAC 12 QDIAATI 12 QDIASMAD 12 QMESESDF 12 PCTPD 12 PCTLGD 12 REMTBOOK 12 TIMEHORI 12 QAAPDTDF 12 IMPEXPDF 12 IMPDOTCU 12 IMPDOTDU 12 IMPDOTVR 12 FECDATOS USOFUTUR 12 CODSEGPD 12 CODSELGD 12 ; informat FECFORMO FECVECI fecdatos yymmdd10.; run; data public.test ; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ infile REFFILE delimiter = ';' truncover dsd lrecl=32767 firstobs=2 n=1000; input CODAAEST CODAAEST CODMMEST CODEMPRE CODOPERA temp1 temp2 CODIGIF FECFORMO FECVECI TIPOCLIE RATICLI XEMPLEAD CODPRODU ESTVEHIC PCTIEFEC PCTOMI IMPCAPE IMPSALVE FECMATRI FECREVEH IMPVRESI FECPRIAC XIDPOCI XIDFALL XDEFAULT XIDSUBJ FECDEFAU IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI QDIASMAD QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF IMPEXPDF IMPDOTCU IMPDOTDU IMPDOTVR FECDATOS USOFUTUR CODSEGPD CODSELGD; informat CODOPERA $12. CODIGIF $10. XIDPOCI XIDFALL XDEFAULT XIDSUBJ $1.; run;
Try placing the INFORMAT prior to the Input statement. Your second data step is missing a lot of variables from the first one, many of the character variables. So if these are supposed to read the same file then you are reading columns into the wrong variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
See this example:
infile REFFILE
delimiter = ';'
dsd /* this option causes two delimiters being read as a missing value */
truncover
lrecl=32767
firstobs=2
;
input
CODAAEST :$7. /* read codes as character */
CODMMEST :$2.
CODEMPRE :$2.
CODOPERA :$char12.
temp1
temp2
CODIGIF :$char10.
FECFORMO :yymmdd10.
FECVECI :yymmdd10.
;
format
FECFORMO yymmdd10.
FECVECI yymmdd10.
;
add other variables in the same manner as required.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
nearly...
works fine with this code, but commax format does not behave well for the second variable, the first is applied correctly.
and the informat for the date does nothing.
later I'll try your approach
data public.test ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile REFFILE delimiter = ';' lrecl=32767 firstobs=2 n=1000;
input CODAAEST CODMMEST CODEMPRE CODOPERA :$char12. temp1 temp2 CODIGIF :$char10. FECFORMO FECVECI
TIPOCLIE :$1. RATICLI XEMPLEAD :$1. CODPRODU :$2. ESTVEHIC :$1. PCTIEFEC commax8.5 PCTOMI commax8.5 IMPCAPE
IMPSALVE FECMATRI FECREVEH IMPVRESI FECPRIAC XIDPOCI :$1. XIDFALL :$1. XDEFAULT :$1.
XIDSUBJ :$1. FECDEFAU IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI QDIASMAD
QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF IMPEXPDF IMPDOTCU IMPDOTDU
IMPDOTVR FECDATOS USOFUTUR CODSEGPD CODSELGD ;
informat FECFORMO FECVECI FECMATRI FECREVEH FECPRIAC FECDEFAU fecdatos yymmdd10.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I gave you an example for nicely structured, easily readable code. Please stop writing such ugly spaghetti code.
Don't use fractions in informats when commas are present to mark the decimal dot. Your informat will override the comma in the data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Okay.
But the problem remains even with your nicely structured code.
What format or informat reads correctly those variables where I wrongly apply the commax format?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@acordes wrote:
Okay.
But the problem remains even with your nicely structured code.
What format or informat reads correctly those variables where I wrongly apply the commax format?
Would you please take the time and read my post in its entirety? I already gave you the answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let's focus on this part of the INPUT statement, where you are still not satisfied with the results:
PCTIEFEC commax8.5 PCTOMI commax8.5
As was mentioned, you don't need to be using commax8.5. Let the raw data indicate where the decimal point should go. So the first change would be:
PCTIEFEC commax8. PCTOMI commax8.
Next, I would assume you are happy with the value assigned to PCTIEFEC, but need to fix the value assigned to PCTOMI. To do that, recognize that the problem is that the first character read for PCTOMI is the semicolon between the variables. That won't be properly interpreted (generating your log message about invalid data for PCTOMI), unless you add a colon. Do it for both variables:
PCTIEFEC : commax8. PCTOMI : commax8.
That change should take care of those two variables, and would give you a roadmap to follow if others are causing trouble.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try running next code, derived from yours with next changes:
1) omit all informat length, just assign $ for char=type variables;
one exception for dates to be read with :yymmdd10. informat;
2) Add a LENGTH statement to assign the max length of char=type variables;
3) %let is a declarative statement and not a datastep statemnt.
better move it before the datastep.
4) add a FORMAT statement to dates;
FILENAME REFFILE '/caslibs/risknb/IFRS9_CONTRATOS_202008.TXT' encoding=wlatin1;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
data public.test ;
LENGTH <all char-type variables and their max length> ;
INFILE REFFILE delimiter = ';' truncover MISSOVER lrecl=32767 firstobs=2 n=1000;
INPUT CODAAEST CODMMEST CODEMPRE CODOPERA $ temp1 temp2 CODIGIF $ FECFORMO $
FECVECI TIPOCLIE $ RATICLI XEMPLEAD $ CODPRODU $ ESTVEHIC $ PCTIEFEC PCTOMI
IMPCAPE IMPSALVE FECMATRI $ FECREVEH $ IMPVRESI FECPRIAC $ XIDPOCI $ XIDFALL $
XDEFAULT $ XIDSUBJ $ FECDEFAU $ IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI
QDIASMAD QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF MPEXPDF IMPDOTCU
IMPDOTDU IMPDOTVR FECDATOS $ USOFUTUR CODSEGPD CODSELGD ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks to @ballardw for the hint with the : and the informat.
Thanks to @Astounding for the commax explanation.
Thanks to @Shmuel for the final tips & tricks that let me approach the final solution.
and thanks to @Kurt_Bremser for his patience and for the incentive he gives me to keep learning more.
The final code is as follows, like an ensemble.
FILENAME REFFILE '/caslibs/risknb/IFRS9_CONTRATOS_202008.TXT' encoding=wlatin1;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
data public.test ;
LENGTH CODOPERA $12 CODIGIF $10 XEMPLEAD $1 CODPRODU $2 ESTVEHIC $1 XIDPOCI $1 XIDFALL $1 XDEFAULT $1
XIDSUBJ $1;
informat FECFORMO FECVECI FECMATRI FECREVEH FECPRIAC FECDEFAU fecdatos yymmdd10. ;
format FECFORMO FECVECI FECMATRI FECREVEH FECPRIAC FECDEFAU fecdatos date9. PCTIEFEC PCTOMI PCTPD PCTLGD percent9.2;
INFILE REFFILE delimiter = ';' truncover MISSOVER lrecl=32767 firstobs=2 n=1000;
INPUT CODAAEST CODMMEST CODEMPRE CODOPERA $ temp1 temp2 CODIGIF $ FECFORMO $
FECVECI TIPOCLIE $ RATICLI XEMPLEAD $ CODPRODU $ ESTVEHIC $ PCTIEFEC : commax8. PCTOMI : commax8.
IMPCAPE IMPSALVE FECMATRI $ FECREVEH $ IMPVRESI FECPRIAC $ XIDPOCI $ XIDFALL $
XDEFAULT $ XIDSUBJ $ FECDEFAU $ IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI
QDIASMAD QMESESDF PCTPD : commax8. PCTLGD : commax8. REMTBOOK TIMEHORI QAAPDTDF MPEXPDF IMPDOTCU
IMPDOTDU IMPDOTVR FECDATOS $ USOFUTUR CODSEGPD CODSELGD ;
array div100(4) PCTIEFEC PCTOMI PCTPD PCTLGD;
do i=1 to dim(div100);
div100(i)=div100(i)/100;
end;
drop temp: i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It looks a lot like you have mixed up the syntax of the LENGTH statement and the INPUT statement.
If this a LENGTH statement then just want to use $nnn for the character variables. No need for the period, lengths are always integer values and are not format specifications. So you cannot use an informat like $CHAR in a LENGTH statement.
If it is an INPUT statement you do not want to read all of those numeric variables as the single digit in column 8. You should just remove the 8 and not have an informat for those variables. And for the variables where you do want to include an informat specification you want to add the colon modifier in front to prevent the informat from reading past the delimiters.
It is much clearer and easier if you first define the variables with a LENGTH statement. Then the INPUT statement just needs to list the variables. So you are not forcing the INPUT statement to do double duty. If any of the variables need to be read with a special informat (and most numbers or character strings do NOT need special input instructions) then add an INFORMAT statement. If any of the variables need special instructions for how to display them (and most numbers and character strings do NOT need special output instructions) then add a FORMAT statement.
So perhaps your data step will look something like this.
data public.test ;
infile refile dsd dlm= ';' truncover firstobs=2 ;
length CODAAEST CODMMEST CODEMPRE 8
CODOPERA $12
temp1 temp2 CODIGIF FECFORMO $10
FECVECI $20 TIPOCLIE $32 RATICLI 8
XEMPLEAD CODPRODU ESTVEHIC $32
PCTIEFEC PCTOMI IMPCAPE IMPSALVE 8
FECMATRI FECREVEH $32
IMPVRESI 8
FECPRIAC XIDPOCI XIDFALL XDEFAULT XIDSUBJ FECDEFAU $32
IMPPDDEF IMPVCDEF XISTAGE IMPDOTAC QDIAATI QDIASMAD 8
QMESESDF PCTPD PCTLGD REMTBOOK TIMEHORI QAAPDTDF IMPEXPDF 8
IMPDOTCU IMPDOTDU IMPDOTVR 8
FECDATOS $32
USOFUTUR CODSEGPD CODSELGD 8
;
input CODAAEST -- CODSELGD ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content