BookmarkSubscribeRSS Feed
Arsenio_Staer
Calcite | Level 5

Hey guys,

I have a quick question. I extract and process a lot of .xls and .csv data with strange data structure so usually do all data management in R then export it to SAS to run the SQL and data analysis. Decided to give it a run in SAS though.

I extract in .csv format the following sample structure:

AA UN Equity




AIG UN Equity



DatePX_LASTPX_LOWPX_HIGHVOLUME DatePX_LASTPX_LOWPX_HIGHVOLUME
#NAME?40.468840.187541.78132705600 #NAME?1160.92741150.4561200.0206151718
1/4/200040.656340.187540.90634112400 1/4/20001101.58961098.09911140.6827340523
1/5/20004340.543.255844600 1/5/20001103.68381080.64681116.2495248678
1/6/200042.437542.406343.18758595200 1/6/20001139.28661100.19341157.437261000
1/7/200042.62542.281343.43757026800 1/7/20001223.05761153.24841227.9443311985

And want to transform into (upper row is var names or columns names) :

Ticker DatePX_LASTPX_LOWPX_HIGHVOLUME
AA UN Equity 40.468840.187541.78132705600
AA UN Equity1/4/200040.656340.187540.90634112400
AA UN Equity1/5/20004340.543.255844600
AA UN Equity1/6/200042.437542.406343.18758595200
AA UN Equity1/7/200042.62542.281343.43757026800
AIG UN Equity 1160.92741150.4561200.0206151718
AIG UN Equity1/4/20001101.58961098.09911140.6827340523
AIG UN Equity1/5/20001103.68381080.64681116.2495248678
AIG UN Equity1/6/20001139.28661100.19341157.437261000
AIG UN Equity1/7/20001223.05761153.24841227.9443311985

TickerDatePX_LASTPX_LOWPX_HIGHVOLUME
AA UN Equity
40.468840.187541.78132705600
AA UN Equity1/4/200040.656340.187540.90634112400
AA UN Equity1/5/20004340.543.255844600
AA UN Equity1/6/200042.437542.406343.18758595200
AA UN Equity1/7/200042.62542.281343.43757026800
AIG UN Equity
1160.9271150.4561200.021151718
AIG UN Equity1/4/20001101.591098.0991140.683340523
AIG UN Equity1/5/20001103.6841080.6471116.25248678
AIG UN Equity1/6/20001139.2871100.1931157.437261000
AIG UN Equity1/7/20001223.0581153.2481227.944311985

Series can be very long, a lot of tickers pero .csv and number of variables per ticker may vary so the most flexibility ( parametrization of the whole process) would be great!!

Would really appreciate any hint or help,

Thanks,

Arsenio

p.s. the new posting window layout is pretty good!

10 REPLIES 10
RD2
Fluorite | Level 6 RD2
Fluorite | Level 6

You can use  proc import datafile= "C:\FOLDER\*.csv" to import data first.

Then process the data in SAS by reomoved unwanted row and rename the variable.

Good luck.

Arsenio_Staer
Calcite | Level 5

Randy,

The main problem for me is operating on the data once it's in SAS, I'm used to indexing matrices in R, and see no way on how to perform that kind of manipulation.

Arsenio

art297
Opal | Level 21

I think you might be able to use something like the following to accomplish the task fairly easily.  I've included some comments in the code to explain the suggested process:

/* Identify the path where your csv files exist */

%let path=c:\art\test\;

/* pipe all of the filenames into indata */

filename indata pipe "dir &path.*.csv /b";

/* Build a dataset containing the filenames and accomplish the imports */

data filenames (keep=fil2write);

  length fil2write $50;

  length fil2read $80;

  infile indata truncover;

  informat f2r $50.;

  input f2r &;

  fil2write=tranwrd(translate(strip(f2r),'_',' '),".csv","");

  fil2read="&path."||f2r;

  call execute (

         "PROC IMPORT OUT= WORK."||fil2write

            ||'DATAFILE="'||strip(fil2read)||'" '

            ||"DBMS=CSV REPLACE;

            GETNAMES=YES;

            DATAROW=2;

          RUN;");

run;

/* Create a macro variable to later be used in a datastep set statement */

proc sql noprint;

  select fil2write||" (rename=(date=datein))"

    into :fnames separated by " "

      from filenames

  ;

quit;

/* Create the desired file correct the date field at the same time */

data want (drop=datein);

  format ticker $50.;

  format date date9.;

  set &fnames. indsname=dsn;

  ticker=scan(dsn,2,".");

  date=input(datein, ?? mmddyy8.);

run;

FriedEgg
SAS Employee

filename out1 '/temp/tmp1.csv';

filename out2 '/temp/tmp2.csv';

data s1;

infile cards dsd dlm='09'x;

input (date px_last px_low px_high volume) (:$20.);

file out1 dsd dlm=',';

  put (date px_last px_low px_high volume) (:$20.);

cards;

#NAME?          40.4688          40.1875          41.7813          2705600

1/4/2000          40.6563          40.1875          40.9063          4112400

1/5/2000          43          40.5          43.25          5844600

1/6/2000          42.4375          42.4063          43.1875          8595200

1/7/2000          42.625          42.2813          43.4375          7026800

;

run;

data s2;

infile cards dsd dlm='09'x;

input (date px_last px_low px_high volume) (:$20.);

file out2 dsd dlm=',';

  put (date px_last px_low px_high volume) (:$20.);

cards;

#NAME?          1160.9274          1150.456          1200.0206          151718

1/4/2000          1101.5896          1098.0991          1140.6827          340523

1/5/2000          1103.6838          1080.6468          1116.2495          248678

1/6/2000          1139.2866          1100.1934          1157.437          261000

1/7/2000          1223.0576          1153.2484          1227.9443          311985

;

run;

/* this is the part that matters to your situation */

filename in ('/temp/tmp1.csv' '/temp/tmp2.csv');

data have;

length _file $256;

infile in dsd dlm=',' filename=_file;

input date :??mmddyy8. (px_last px_low px_high) (:8.4) volume :7.;

ticker=scan(scan(strip(_file),-1,'/'),1,'.');

run;

/* end */

x 'rm -rf /temp/tmp*.csv';

Arsenio_Staer
Calcite | Level 5

Hey FriedEgg,

Thanks for the reply! I tried your code and work.s1 from the first piece of code comes out as :

datepx_lastpx_lowpx_highvolume

#NAME?          40.4
1/4/2000          40
1/5/2000          43
1/6/2000          42
1/7/2000          42

With this printed in the logs:

23           put (date px_last px_low px_high volume) (:$20.);

24        

25         cards;

NOTE: The file OUT1 is:

      Filename=C:\temp\tmp1.csv,

      RECFM=V,LRECL=256,File Size (bytes)=0,

      Last Modified=04Oct2011:15:22:38,

      Create Time=04Oct2011:15:13:22

NOTE: LOST CARD.

RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+

37         ;

date=  px_last=  px_low=  px_high=  volume=  _ERROR_=1 _N_=3

NOTE: 2 records were written to the file OUT1.

      The minimum record length was 44.

      The maximum record length was 64.

I haven't been able to convert this to the table structure in the original message. I think it's a problem on my side.

FriedEgg
SAS Employee

Make sure the delimiter is actually a tab.

Also that part of my code was only to recreate your example data files.

This is the part that is actually revelant to your situation:

filename in ('/temp/*.csv');

 

data have;

length _file $256;

infile in dsd dlm=',' filename=_file;

input date :??mmddyy8. (px_last px_low px_high) (:8.4) volume :7.;

ticker=scan(scan(strip(_file),-1,'/'),1,'.');

run;

Arsenio_Staer
Calcite | Level 5

RandyDai, art297, FriedEgg,

Thanks for a lot for the info! Will go over it and get back to you guys.

Ksharp
Super User

How about:

data temp;
infile datalines expandtabs dlm=' ' truncover;
if _n_ in (1 2) then do;
                     count+1;input _ticker & $10. @;output;
                     if _n_ eq 1 then do;count+1;input _ticker & $10.  ;output;end;
                         else do;count+1;input _ticker & $10.  /;output;end;
                          end;

else do;
                  count+1;
                     input  (Date PX_LAST     PX_LOW     PX_HIGH     VOLUME) (: $10.)@;
                     output;
                 count+1;
                     input  (Date PX_LAST     PX_LOW     PX_HIGH     VOLUME) (: $10.);
                     output;
                     end;

datalines;
AA UN                                      AIG UN 
Equity                                        Equity     
Date     PX_LAST     PX_LOW     PX_HIGH     VOLUME          Date     PX_LAST     PX_LOW     PX_HIGH     VOLUME
#NAME?     40.4688     40.1875     41.7813     2705600          #NAME?     1160.9274     1150.456     1200.0206     151718
1/4/2000     40.6563     40.1875     40.9063     4112400          1/4/2000     1101.5896     1098.0991     1140.6827     340523
1/5/2000     43     40.5     43.25     5844600          1/5/2000     1103.6838     1080.6468     1116.2495     248678
1/6/2000     42.4375     42.4063     43.1875     8595200          1/6/2000     1139.2866     1100.1934     1157.437     261000
1/7/2000     42.625     42.2813     43.4375     7026800          1/7/2000     1223.0576     1153.2484     1227.9443     311985
;
run;
data temp;
  set temp;
  if mod(count,2)=1 then flag=0;
   else flag=1;
run;
proc sort data=temp;by flag count;run;
data want(drop=count _ticker flag);
length ticker $ 20;
retain ticker;
 set temp;
 by flag;
 if first.flag then  call missing(ticker);
 if not missing(_ticker) then do;
      ticker=catx(' ',ticker,_ticker);delete;
                              end;
run;

Ksharp

Arsenio_Staer
Calcite | Level 5

Hey Ksharp,

Been a long time since my rolling correlation question! Anyway, thanks for the reply! i have been testing that code.

So the work.temp comes out as:

count_tickerDatePX_LASTPX_LOWPX_HIGHVOLUMEflag
1AA UN




0
3Equity




0
5
#NAME?40.468840.187541.781327056000
7
1/4/200040.656340.187540.906341124000
9
1/5/20004340.543.2558446000
11
1/6/200042.437542.406343.187585952000
13
1/7/200042.62542.281343.437570268000
2AIG UN




1
4Equity




1
6
#NAME?1160.92741150.4561200.02061517181

This is an  already shifted and stacked version of the original table, right? Nice reminder on retain!

From there your code works great. Fills the ticker with the ticker name for the corresponding observations. Any way to convert it to that already stacked form you started with?

Thanks!

Arsenio

Ksharp
Super User

Oh. I remember that question about correlation coefficient.

Yes. This is an  already shifted and stacked version of the original table. However not completed.So I need the following code to complete it.

But I do not understand what you mean about the last sentence.

"Any way to convert it to that already stacked form you started with?"

I only add another variable FLAG to prepare for the next process.

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1082 views
  • 0 likes
  • 5 in conversation