I have a data set which looks like this:
|
|
|
|
|
|
|
|
|
|
ID |
TIME |
CMT |
Y1 |
Y2 |
CP1 |
CP2 |
DV |
MDVX |
COUNT |
1.00E+00 |
0.00E+00 |
1.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
1.00E+00 |
1.00E+00 |
Whenever the variable count changes the data looks like this with a row of literals (i.e., row 2 below) redefining the columns.
1.40E+01 |
2.40E+01 |
2.40E+01 |
2.30E+00 |
1.90E-02 |
9.18E-01 |
2.00E-02 |
0.00E+00 |
0.00E+00 |
1.00E+00 |
3.02E+03 |
2.09E+02 |
1.54E+03 |
1.66E+01 |
1.03E+02 |
1.11E+00 |
1.58E+02 |
3.45E+00 |
1.21E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
1.21E+00 |
0.00E+00 |
0.00E+00 |
ID |
TIME |
CMT |
Y1 |
Y2 |
CP1 |
CP2 |
DV |
MDVX |
COUNT |
RHYD |
RHYL |
RMETD |
RMETL |
CDLV |
CLLV |
CVD |
CVL |
CPREDI |
CRESI |
CWRESI |
CRES |
CWRES |
DV |
PRED |
RES |
WRES |
1.00E+00 |
0.00E+00 |
1.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
1.00E+00 |
2.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
0.00E+00 |
When I run the following code:
data one;
infile '/folders/myfolders/partialAUC/run20fit.csv/' dsd dlm=',' truncover firstobs=2;
INPUT ID TIME CMT Y1 Y2 CP1 CP2 DV ?? MDVX ?? COUNT;
if ID='ID' then delete;
run;
I get this error in the log:
NOTE: Invalid numeric data, 'ID' , at line 77 column 9.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2 1.00E+00,0.00E+00,1.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,1.00E+00,1.00E+00,0.00E+00,0
101 .00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00,0.
201 00E+00,0.00E+00,0.00E+00,0.00E+00,0.00E+00 242
ID=1 TIME=0 CMT=1 Y1=0 Y2=0 CP1=0 CP2=0 DV=0 MDVX=1 COUNT=1 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, 'ID' , at line 77 column 9.
This repeats several times.
How can I edit my code to have the program either skip or not read the line with the id time cmt etc each time it appears
whenever the count changes from n to n+1 etc?
Sounds like you have as input a text file, not a dataset. Please post an example set of lines from the file. Use the INSERT CODE button to get a pop-up where you can paste the lines of text so they are not treated as paragraphs. To see what the file looks like make sure to open it with a text editor and not a spreadsheet program, don't use Excel.
Are you saying that meaning of the fields change as you go through the file?
Or just that some of the lines have only some of the fields?
If it is the later then it pretty easy to deal with and your approach should work. If the rows with headers always start with ID then you simple test will work, but you need to make the test before trying to read the rest of the line. No need to skip the first header line as the test will skip that one also. The trailing @ on the INPUT statement tells SAS to keep the line around for more INPUT statements. The @1 says to move the pointer back to the start of the line so that the later INPUT can read the real ID value.
data one;
infile '/folders/myfolders/partialAUC/run20fit.csv/' dsd truncover ;
input check $ @1 @ ;
if check='ID' then delete;
drop check;
INPUT ID TIME CMT Y1 Y2 CP1 CP2 DV MDVX COUNT
RHYD RHYL RMETD RMETL CDLV CLLV CVD CVL CPREDI
CRESI CWRESI CRES CWRES DV PRED RES WRES
;
run;
Sounds like you have as input a text file, not a dataset. Please post an example set of lines from the file. Use the INSERT CODE button to get a pop-up where you can paste the lines of text so they are not treated as paragraphs. To see what the file looks like make sure to open it with a text editor and not a spreadsheet program, don't use Excel.
Are you saying that meaning of the fields change as you go through the file?
Or just that some of the lines have only some of the fields?
If it is the later then it pretty easy to deal with and your approach should work. If the rows with headers always start with ID then you simple test will work, but you need to make the test before trying to read the rest of the line. No need to skip the first header line as the test will skip that one also. The trailing @ on the INPUT statement tells SAS to keep the line around for more INPUT statements. The @1 says to move the pointer back to the start of the line so that the later INPUT can read the real ID value.
data one;
infile '/folders/myfolders/partialAUC/run20fit.csv/' dsd truncover ;
input check $ @1 @ ;
if check='ID' then delete;
drop check;
INPUT ID TIME CMT Y1 Y2 CP1 CP2 DV MDVX COUNT
RHYD RHYL RMETD RMETL CDLV CLLV CVD CVL CPREDI
CRESI CWRESI CRES CWRES DV PRED RES WRES
;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.