Hi SAS Folks,
I need an efficient solution(robust performance) in reading a very large(over 5 million records) comma separated file conditionally. The CSV file looks like the following:
Id,date, name, address. post_code......etc totally 32 variables.
The condition is that values start from the 2nd observation with firstobs=2 as straight forward reading as one would expect, however I first need to read values of two variables from the header row and retain them across all obs.
The logic in my mind is something like:
data want;
if first obs 1 then infile 'mycsvfile' obs=1;
input var1 informat. var2 informat.
else if 2nd obs to EOF then infile 'same csv file' and input all variables with informats
So in essence var1 and var2 will have to read in the beginning and it's value needs to be retained across all iterations until end of file.
Your help with the right logic would really mean a lot.
Cheers,
Charlotte
Here's one approach:
data want;
infile "mycsvfile" dsd;
if _n_=1 then do;
input var1 informat. var2 informat.;
delete;
run;
retain var1 var2;
input id date name address ...;
run;
I'm assuming you have all the other pieces you would need ... lengths/informats spelled out appropriately.
Good luck.
One way is to set a flag variable:
data want;
informat var1 informat. var2 informat. ;
retain flag . var1 var2;
if flag=. then input var1 var2;
flag=1; /* after the first read;
input <the rest of the stuff>;
drop flag;
run;
Other would be to create two datasets, one only ready line 1 for your var1 and 2 the other starting at obs2 then combine the two
Not sure this is what you are looking for but I do not know how to perform what you are asking while the file is being pulled into SAS. Hopefully someone else comes up with something better.
data have;
infile cards dsd;
input year code bep_org_code bep_pgm_ele_code tran_pgm_ele_code end_date$ ;
cards;
2009,10,45,34,234,null
2010,15,45,34,234,current
2010,15,45,23,234,null
2010,15,45,34,234,current
2010,15,45,23,234,current
2010,15,45,13,234,null
;
run;
data want(rename=(year2=year code2=code));
set have;
retain year2 code2;
if _N_ = 1 then do;
year2 = year;
code2 = code;
end;
if _N_ > 1 then do;
year = year2;
code = code2;
end;
drop year code;
run;
Here's one approach:
data want;
infile "mycsvfile" dsd;
if _n_=1 then do;
input var1 informat. var2 informat.;
delete;
run;
retain var1 var2;
input id date name address ...;
run;
I'm assuming you have all the other pieces you would need ... lengths/informats spelled out appropriately.
Good luck.
Hi , Sorry for the bother, in the code, may i seek a small clarification and tweak that's commented
data want;
infile "mycsvfile" dsd;
if _n_=1 then do;
input var1 informat. var2 informat.;/* If i want the read the value of Var5, how to skip the 3rd and 4th value and read the value of Var5 correctly?Coz my understanding is clear to read var1 and var2 being the first 2 values and a third value to read happens to be the 5th in the buffer*/
delete;
run;
retain var1 var2;
input id date name address ...;
run;
Charlotte,
This should do it:
length dummyvar $ 1;
drop dummyvar;
input var1 informat. var2 informat. dummyvar dummyvar var5 informat.;
The INPUT statement will keep reading till it finds a delimiter (not stopping just because it read one character of DUMMYVAR).
Good luck.
Thank you Sir, I should have guessed it. My apologies for bothering again. Have a nice day
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.