BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi SAS Folks,

I need an efficient solution(robust performance) in reading a very large(over 5 million records) comma separated file conditionally.  The CSV file looks like the following:

Id,date, name, address. post_code......etc totally 32 variables.

The condition is that values start from the 2nd observation with firstobs=2 as straight forward reading as one would expect, however I first need to read values of two variables from the header row and retain them across all obs.

The logic in my mind is something like:

data want;

if first obs 1 then infile 'mycsvfile' obs=1;

input var1 informat. var2 informat.

else if 2nd obs to EOF then infile 'same csv file'  and input all variables with informats

So in essence var1 and var2 will have to read in the beginning and it's value needs to be retained across all iterations until end of file. 

Your help with the right logic would really mean a lot.

Cheers,

Charlotte

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Here's one approach:

data want;

infile "mycsvfile" dsd;

if _n_=1 then do;

   input var1 informat. var2 informat.;

   delete;

run;

retain var1 var2;

input id date name address ...;

run;

I'm assuming you have all the other pieces you would need ... lengths/informats spelled out appropriately.

Good luck.

View solution in original post

7 REPLIES 7
ballardw
Super User

One way is to set a flag variable:

data want;

     informat var1 informat. var2 informat. ;

     retain flag . var1 var2;

     if flag=. then input var1 var2;

     flag=1; /* after the first read;

     input <the rest of the stuff>;

     drop flag;

run;

Other would be to create two datasets, one only ready line 1 for your var1 and 2 the other starting at obs2 then combine the two

Steelers_In_DC
Barite | Level 11

Not sure this is what you are looking for but I do not know how to perform what you are asking while the file is being pulled into SAS.  Hopefully someone else comes up with something better.

data have;

infile cards dsd;

input year code bep_org_code   bep_pgm_ele_code  tran_pgm_ele_code   end_date$ ;

cards;

2009,10,45,34,234,null

2010,15,45,34,234,current

2010,15,45,23,234,null

2010,15,45,34,234,current

2010,15,45,23,234,current

2010,15,45,13,234,null

;

run;

data want(rename=(year2=year code2=code));

set have;

retain year2 code2;

if _N_ = 1 then do;

    year2 = year;

    code2 = code;

end;

if _N_ > 1 then do;   

    year = year2;

    code = code2;

end;

drop year code;

run;

Astounding
PROC Star

Here's one approach:

data want;

infile "mycsvfile" dsd;

if _n_=1 then do;

   input var1 informat. var2 informat.;

   delete;

run;

retain var1 var2;

input id date name address ...;

run;

I'm assuming you have all the other pieces you would need ... lengths/informats spelled out appropriately.

Good luck.

CharlotteCain
Quartz | Level 8

Hi , very neat logic. Great! and Thank you!

CharlotteCain
Quartz | Level 8

Hi , Sorry for the bother, in the code, may i seek a small clarification and tweak that's commented

data want;

infile "mycsvfile" dsd;

if _n_=1 then do;

   input var1 informat. var2 informat.;/* If i want the read the value of Var5, how to skip the 3rd and 4th value and read the value of Var5 correctly?Coz my understanding is clear to read var1 and var2 being the first 2 values and a third value to read happens to be the 5th in the buffer*/

   delete;

run;

retain var1 var2;

input id date name address ...;

run;

Astounding
PROC Star

Charlotte,

This should do it:

length dummyvar $ 1;

drop dummyvar;

input var1 informat. var2 informat. dummyvar dummyvar var5 informat.;

The INPUT statement will keep reading till it finds a delimiter (not stopping just because it read one character of DUMMYVAR).

Good luck.

CharlotteCain
Quartz | Level 8

Thank you Sir, I should have guessed it. My apologies for bothering again. Have a nice day

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 735 views
  • 3 likes
  • 4 in conversation