Reading a CDF file in SAS

N/A
Posts: 1

Reading a CDF file in SAS

Hello SAS users,

I need help with reading a CDF file.

The file is structured as shown below: variable names are stored as values.

Input file:

ID1,Trait2
ID2,Trait1,Trait2,Trait3
ID3,Trait2,Trait3,Trait5
ID4,Trait5
ID5,Trait1

The output should look like this:

ID  Trait1  Trait2  Trait3  Trait4  Trait5
1   0       1       0       0       0
2   1       1       1       0       0
3   0       1       1       0       1
4   0       0       0       0       1
5   1       0       0       0       0

I have the code below in place now, but it takes a significant amount of time (8 hours) to process 2.5 million rows and 50 traits. Is there a better way?

data test;
     infile "inputfile.txt" delimiter = ',' missover dsd lrecl=32767 ;
     array traitsx (50) Trait1 - Trait50  ;
     input @;                                  /* load the record and hold it */
     ID = scan(_infile_, 1 );
     lenx = length(scan(_infile_, 1 )) + 2;
     _infile_ = substr(_infile_, lenx);        /* strip the ID and its comma */
     ntraits = countw(_infile_, ',', 'mo');
     do cntr = 1 to ntraits ;
          do var_cntr = 1 to 50;               /* compare the value with each variable name */
               if strip(scan(_infile_, cntr )) = vname(traitsx{var_cntr}) then do;
                    traitsx{var_cntr} = 1;
                    var_cntr = 51;             /* force an exit from the inner loop */
               end;
          end;
     end;
     input;                                    /* release the held record */
     drop cntr var_cntr ntraits lenx ;
run;

Thanks for your time and help.

Suresh

Super User
Posts: 19,063

Re: Reading a CDF file in SAS

How long does it take to read the file in, without the manipulation?
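
A minimal sketch for getting that baseline, assuming the same file name and LRECL as in the original post: a DATA _NULL_ step that reads every record and does no parsing at all. The real time it reports in the log is roughly the I/O floor; whatever the full step takes beyond that is being spent on the manipulation.

```sas
/* Baseline: read every record, parse nothing, write nothing. */
options fullstimer;

data _null_;
   infile "inputfile.txt" lrecl=32767;
   input;   /* loads the record into the input buffer only */
run;
```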

Respected Advisor
Posts: 3,786

Re: Reading a CDF file in SAS

Maybe...

If missing can stand in for 0, you can leave out the initialization of array Trait. I don't have any metrics on how the SET approach compares with, say, CALL POKE in terms of performance.

data init;
   array Trait[5];
   retain Trait 0;
run;
data cdf;
   infile cards dlmstr=',Trait' dsd missover;
   input id :$8. i @;
   Array Trait[5];
   point=1;
   set init point=point;
   do while(not missing(i));
      Trait[i]=1;
      input i @;
      end;
   drop i;
   cards;
ID1,Trait2
ID2,Trait1,Trait2,Trait3
ID3,Trait2,Trait3,Trait5
ID4,Trait5
ID5,Trait1
;;;;
   run;
proc print;
  
run;
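
To illustrate the first point, here is a sketch of the same step with the INIT data set and the SET statement taken out. Variables created by the ARRAY statement are reset to missing at the start of each iteration, so traits that are absent from a record come out as missing (.) rather than 0. The data set name cdf_missing is only a placeholder.

```sas
/* Variant without initialization: absent traits stay missing (.) */
data cdf_missing;
   infile cards dlmstr=',Trait' dsd missover;
   input id :$8. i @;
   array Trait[5];              /* not retained, so reset to missing per record */
   do while(not missing(i));
      Trait[i]=1;               /* i is the trait number read from the record */
      input i @;
   end;
   drop i;
   cards;
ID1,Trait2
ID2,Trait1,Trait2,Trait3
ID3,Trait2,Trait3,Trait5
ID4,Trait5
ID5,Trait1
;;;;
run;
```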



Message was edited by: data _null_

Trusted Advisor
Posts: 1,301

Re: Reading a CDF file in SAS

data cdf;
   infile cards dlmstr=',Trait' dsd missover;
   input @3 id : i @;
   array trait[5];
   call pokelong("%sysfunc(repeat(%sysfunc(putn(0, rb8.), hex16.), 4))"x, addrlong(trait[1]), %eval(8*5));
   do until(missing(i));
      trait[i]=1;
      input i @;
      end;
   drop i;
   cards4;
ID1,Trait2
ID2,Trait1,Trait2,Trait3
ID3,Trait2,Trait3,Trait5
ID4,Trait5
ID5,Trait1
;;;;

Since CALL POKE...LONG was mentioned.

PROC Star
Posts: 7,432

Re: Reading a CDF file in SAS

Just FYI: using POKELONG took more than 4 times longer to run than using data _null_'s approach.

Trusted Advisor
Posts: 1,301

Re: Reading a CDF file in SAS

That difference seems rather large to me. Out of curiosity I ran a test and found a difference of maybe a few tenths of a second on 2.5 million rows and 50 traits. Across several runs the two averaged out to pretty much even, and which one ran ever so slightly faster switched from run to run. I wonder why you are seeing such a large differential in run times.

I use both of these methods regularly. Typically I use the method presented by DN when I have an array to initialize to non-repeated values (or to initialize the variable metadata in addition to the values), and POKELONG when initializing to a single value, as in this case. For large arrays I typically load the 'init' table into memory with SASFILE (I can't say whether it actually makes any difference, though).
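
For anyone who wants to try the SASFILE idea, a rough sketch follows. It assumes the 50-trait layout and the file name from the original post, and an ID length of $12; whether holding INIT in memory changes the run time at all is untested here.

```sas
/* One-row data set holding the initial values (all zeros). */
data init;
   array Trait[50];
   retain Trait 0;
run;

/* Hold WORK.INIT in memory for the steps that follow. */
sasfile work.init load;

data cdf;
   infile "inputfile.txt" dlmstr=',Trait' dsd missover;
   input id :$12. i @;
   array Trait[50];
   point = 1;
   set init point=point;        /* reset Trait1-Trait50 to 0 */
   do while(not missing(i));
      Trait[i] = 1;
      input i @;
   end;
   drop i;
run;

/* Release the memory once the data set is no longer needed. */
sasfile work.init close;
```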

Respected Advisor
Posts: 3,786

Re: Reading a CDF file in SAS

According to my "calculations", that is correct: the methods are very similar, with POKE edging out SET slightly in this run.

34         options generic=0 ps=max fullstimer=0;
35         filename FT77F001 "%sysfunc(pathname(work))/t.txt" recfm=v lrecl=32767;
36         options generic=1;
37         data _null_;
38            file FT77F001 dlmstr=',Trait';
39            do i=1 to 2500000;
40               put 'ID' i @;
41               numout=int(50*ranuni(8889)+1);
42               do j=numout to 50 by 3;
43                  put j @;
44                  end;
45                  put;
46               end;
47            run;

NOTE: The file FT77F001 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 2500000 records were written to the file (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 142.
NOTE: DATA statement used (Total process time):
      real time           7.65 seconds
      cpu time            7.66 seconds
     

48            quit;
49         data _null_;
50            infile FT77F001 obs=10;
51            input;
52            put _infile_;
53            run;

NOTE: The infile FT77F001 is:
      (system-specific pathname),
      (system-specific file attributes)

ID1,Trait7,Trait10,Trait13,Trait16,Trait19,Trait22,Trait25,Trait28,Trait31,Trait34,Trait37,Trait40,Trait43,Trait46,Trait49
ID2,Trait6,Trait9,Trait12,Trait15,Trait18,Trait21,Trait24,Trait27,Trait30,Trait33,Trait36,Trait39,Trait42,Trait45,Trait48
ID3,Trait2,Trait5,Trait8,Trait11,Trait14,Trait17,Trait20,Trait23,Trait26,Trait29,Trait32,Trait35,Trait38,Trait41,Trait44,Trait47,Trait50
ID4,Trait40,Trait43,Trait46,Trait49
ID5,Trait7,Trait10,Trait13,Trait16,Trait19,Trait22,Trait25,Trait28,Trait31,Trait34,Trait37,Trait40,Trait43,Trait46,Trait49
ID6,Trait46,Trait49
ID7,Trait28,Trait31,Trait34,Trait37,Trait40,Trait43,Trait46,Trait49
ID8,Trait11,Trait14,Trait17,Trait20,Trait23,Trait26,Trait29,Trait32,Trait35,Trait38,Trait41,Trait44,Trait47,Trait50
ID9,Trait31,Trait34,Trait37,Trait40,Trait43,Trait46,Trait49
ID10,Trait36,Trait39,Trait42,Trait45,Trait48
NOTE: 10 records were read from the infile (system-specific pathname).
      The minimum record length was 19.
      The maximum record length was 136.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
     

54         data init;
55            array Trait[50];
56            retain Trait 0;
57            run;

NOTE: The data set WORK.INIT has 1 observations and 50 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
     

58         options fullstimer=1;
59         data _null_;
60            infile FT77F001 dlmstr=',Trait' dsd missover eof=eof;
61            length id $12;
62            Array Trait[50];
63            retain point 1;
64            do while(1);
65               input id :$12. i @;
66               set init point=point;
67               do while(not missing(i));
68                  Trait[i]=1;
69                  input i @;
70                  end;
71               *output;
72               input;
73               end;
74         eof: stop;
75            drop i;
76            run;

NOTE: The infile FT77F001 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 2500000 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 142.
NOTE: DATA statement used (Total process time):
      real time                         7.56 seconds
      user cpu time                     7.28 seconds
      system cpu time                   0.29 seconds
      memory                            461.90k
      OS Memory                         17192.00k
      Timestamp                         02/04/2015 03:47:21 AM
      Page Faults                       0
      Page Reclaims                     0
      Page Swaps                        0
      Voluntary Context Switches        68
      Involuntary Context Switches      56
      Block Input Operations            0
      Block Output Operations           1
     

77         data _null_;
78            infile FT77F001 dlmstr=',Trait' dsd missover eof=eof;
79            length id $12;
80            Array Trait[50];
81            addr = addrlong(trait[1]);
82            length poke $%sysevalf(8*50,integer);
83            poke = repeat(put(0,rb8.),49);
84            do while(1);
85               input id :$12. i @;
86               call pokelong(poke,addr,%sysevalf(8*50,integer),4);
87               do while(not missing(i));
88                  Trait[i]=1;
89                  input i @;
90                  end;
91               *output;
92               input;
93               end;
94          eof:
95            stop;
96            drop i addr poke;
97            run;

NOTE: The infile FT77F001 is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 2500000 records were read from the infile (system-specific pathname).
      The minimum record length was 12.
      The maximum record length was 142.
NOTE: DATA statement used (Total process time):
      real time                         6.46 seconds
      user cpu time                     6.22 seconds
      system cpu time                   0.24 seconds
      memory                            373.53k
      OS Memory                         17192.00k
      Timestamp                         02/04/2015 03:47:28 AM
      Page Faults                       0
      Page Reclaims                     0
      Page Swaps                        0
      Voluntary Context Switches        7
      Involuntary Context Switches      49
      Block Input Operations            0
      Block Output Operations           0
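
For comparison, a third candidate could be dropped into the same harness: resetting the array with a plain DO loop instead of SET or POKELONG. This is only a sketch that reuses the FT77F001 fileref defined in the log. It was not part of the timed runs, so no timing is claimed for it, and the loop counter k is a name introduced here.

```sas
/* Third candidate: reset the array with an explicit DO loop. */
data _null_;
   infile FT77F001 dlmstr=',Trait' dsd missover eof=eof;
   length id $12;
   array Trait[50];
   do while(1);
      input id :$12. i @;
      do k = 1 to dim(Trait);   /* plain reset, no SET or POKELONG */
         Trait[k] = 0;
      end;
      do while(not missing(i));
         Trait[i] = 1;
         input i @;
      end;
      *output;
      input;                    /* release the held record */
   end;
eof: stop;
run;
```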
     

Super User
Posts: 9,865

Re: Reading a CDF file in SAS

data cdf;
   infile cards  dsd truncover; 
   input id : $20. i : $20. @;
   id=compress(id,,'kd');
   retain v 1;
   do until(missing(i));
    output;
     input i : $20. @;
   end;
   cards; 
ID1,Trait2
ID2,Trait1,Trait2,Trait3
ID3,Trait2,Trait3,Trait5
ID4,Trait5
ID5,Trait1
;;;;
   run; 
proc transpose data=cdf out=temp(drop=_NAME_);
 by id;
 var v;
 id i;
run;
proc stdize data=temp out=temp1 missing=0 reponly;run;
proc sql;
 select name into : list separated by ' '
  from dictionary.columns
   where libname='WORK' and memname='TEMP1' 
    order by input(compress(name,,'kd'),best8.);
quit;
data want;
 retain &list ;
 set temp1;
run;

Xia Keshan
