<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: programming help in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403061#M97922</link>
    <description>&lt;P&gt;Hi Kurt ..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;transpose is out of the question as there are millions of rows (which would be turned into columns in a transpose) .. i could go on a nice long vacation while the program runs &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; .. can't transpose .. tried it with a small subset of the data (20,000 rows) .. got tired of twiddling my thumbs :))&lt;/P&gt;</description>
    <pubDate>Wed, 11 Oct 2017 08:31:03 GMT</pubDate>
    <dc:creator>jfaruqui</dc:creator>
    <dc:date>2017-10-11T08:31:03Z</dc:date>
    <item>
      <title>programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403056#M97920</link>
      <description>&lt;P&gt;&lt;FONT face="andale mono,times"&gt;so I have a huge dataset(millions of rows) with&amp;nbsp;over 100 variables (and more get added regularly)&amp;nbsp; .. Each variable having a 2 digit value. For example:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;C1 C2 C3 C4 C5 .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 12 15 55&amp;nbsp;15 .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 15 10 12 .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 10 10 10 .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 12 12 15 .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;.. .. .. .. .. .. ..&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;.. .. .. .. .. .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;Now I want to make and populate a result variable which basically read each row and outputs the value that has the most occurrences. For example:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;C1 C2 C3 C4 C5 .. .. RESULT&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 12 15 55&amp;nbsp;15 .. ..&amp;nbsp; &amp;nbsp;15&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 15 10 12 .. ..&amp;nbsp; &amp;nbsp;10&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 10 10 10 .. ..&amp;nbsp; &amp;nbsp;10&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;10 10 12 12 15 .. ..&amp;nbsp; &amp;nbsp;12 (higher value in case of a tie)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;.. .. .. .. .. .. ..&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;.. .. .. .. .. .. ..&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;What would be the most efficient code to write for this allowing for the possibility of more test columns(with values) to be added to the dataset.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="andale mono,times"&gt;Thanks in advance !!!&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 08:18:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403056#M97920</guid>
      <dc:creator>jfaruqui</dc:creator>
      <dc:date>2017-10-11T08:18:28Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403059#M97921</link>
      <description>&lt;P&gt;First step: transpose from a wide to a long format:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;10 12 15 55 15&lt;/PRE&gt;
&lt;P&gt;to&lt;/P&gt;
&lt;PRE&gt;10
12
15
55
15&lt;/PRE&gt;
&lt;P&gt;sort (within your groups that identify original lines)&lt;/P&gt;
&lt;PRE&gt;10
12
15
15
55&lt;/PRE&gt;
&lt;P&gt;Now you can get a count with (eg) proc summary, and retrieve the highest count for each group.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 08:24:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403059#M97921</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-10-11T08:24:07Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403061#M97922</link>
      <description>&lt;P&gt;Hi Kurt ..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;transpose is out of the question as there are millions of rows (which would be turned into columns in a transpose) .. i could go on a nice long vacation while the program runs &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; .. can't transpose .. tried it with a small subset of the data (20,000 rows) .. got tired of twiddling my thumbs :))&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 08:31:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403061#M97922</guid>
      <dc:creator>jfaruqui</dc:creator>
      <dc:date>2017-10-11T08:31:03Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403065#M97923</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
	set have;
	array C(*) C:;
	array a(100) a100-a1;

	drop i a:;

	call missing(of a(*));
	do i=1 to dim(C);
		a(100-C(i))+1;
	end;

	idx=100-whichn(max(of a(*)), of a(*));
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 11 Oct 2017 09:08:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403065#M97923</guid>
      <dc:creator>gamotte</dc:creator>
      <dc:date>2017-10-11T09:08:12Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403069#M97926</link>
      <description>&lt;P&gt;Hi Gamotte&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks so much for this awesome solution ..&lt;/P&gt;&lt;P&gt;can you please explain the significance of '100' and the 'call missing' ??&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;thanx&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 09:23:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403069#M97926</guid>
      <dc:creator>jfaruqui</dc:creator>
      <dc:date>2017-10-11T09:23:25Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403072#M97929</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/98537"&gt;@jfaruqui&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Hi Gamotte&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks so much for this awesome solution ..&lt;/P&gt;
&lt;P&gt;can you please explain the significance of '100' and the 'call missing' ??&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thanx&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;call missing() initializes the new array, which is necessary, because arrays are automatically retained.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/30622"&gt;@gamotte&lt;/a&gt;'s program needs to know the number of variables beforehand, so that's the reason for using 100. But you can get that value by running a preliminary step:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
set have;
array vars{*} C:;
call symput('nvars',put(dim(vars),best.));
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;so you can use &amp;amp;nvars in the next step in place of a numeric constant.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 09:35:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403072#M97929</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-10-11T09:35:00Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403075#M97930</link>
      <description>&lt;P&gt;call missing is used to reinitialize the array for each observation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The idea is to create an array of size 100 which will store counters for each possible values of&lt;/P&gt;
&lt;P&gt;the C&amp;lt;i&amp;gt; variables. Since you said that they are on two digits, possible values are 00 to 99.&lt;/P&gt;
&lt;P&gt;When C&amp;lt;i&amp;gt;=10 for instance the counter a10 is increased.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since you added the condition that incase of tie, the maximum shoud be retained and&lt;/P&gt;
&lt;P&gt;because the whichn function returns the first match, I reversed the order of columns :&lt;/P&gt;
&lt;P&gt;a100 to a1 instead of a1 to a100.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 09:40:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403075#M97930</guid>
      <dc:creator>gamotte</dc:creator>
      <dc:date>2017-10-11T09:40:14Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403097#M97938</link>
      <description>&lt;P&gt;A transpose of 20000 recs is lightning fast. If you actually found the time to twiddle your thumbs, you should replace your 20-year old computer with something more modern:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro make_have(no_vars,no_recs);
data have;
do id = 1 to &amp;amp;no_recs;
%do i = 1 %to &amp;amp;no_vars;
v_&amp;amp;i = int(rand('uniform')*55);
%end;
output;
end;
run;
%mend;
%make_have(200,20000)

proc transpose data=have out=have1 (rename=(col1=v));
by id;
var v_:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Log:&lt;/P&gt;
&lt;PRE&gt;24         %macro make_have(no_vars,no_recs);
25         data have;
26         do id = 1 to &amp;amp;no_recs;
27         %do i = 1 %to &amp;amp;no_vars;
28         v_&amp;amp;i = int(rand('uniform')*55);
29         %end;
30         output;
31         end;
32         run;
33         %mend;
34         %make_have(200,20000)

NOTE: The data set WORK.HAVE has 20000 observations and 201 variables.
NOTE: DATA statement used (Total process time):
      real time           1.15 seconds
      cpu time            0.18 seconds
      

35         
36         proc transpose data=have out=have1 (rename=(col1=v));
37         by id;
38         var v_:;
39         run;

NOTE: There were 20000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE1 has 4000000 observations and 3 variables.
NOTE: PROZEDUR TRANSPOSE used (Total process time):
      real time           2.10 seconds
      cpu time            0.34 seconds
&lt;/PRE&gt;
&lt;P&gt;Note that's from a 2-core server, something considered very low in computing power for SAS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or you made an error in writing your transpose.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My suggested solution looks like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro make_have(no_vars,no_recs);
data have;
do id = 1 to &amp;amp;no_recs;
%do i = 1 %to &amp;amp;no_vars;
c&amp;amp;i = int(rand('uniform')*55);
%end;
output;
end;
run;
%mend;
%make_have(200,1000000)

proc transpose data=have out=have1 (drop=_name_ rename=(col1=c));
by id;
var c:;
run;

proc sort data=have1;
by id c;
run;

data count;
set have1;
by id c;
if first.c
then count = 1;
else count + 1;
if last.c then output;
run;

proc sort data=count;
by id descending count descending c;
run;

data want;
set count;
by id;
if first.id;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The whole test for 200 variables and 1 million records took this:&lt;/P&gt;
&lt;PRE&gt;24         %macro make_have(no_vars,no_recs);
25         data have;
26         do id = 1 to &amp;amp;no_recs;
27         %do i = 1 %to &amp;amp;no_vars;
28         c&amp;amp;i = int(rand('uniform')*55);
29         %end;
30         output;
31         end;
32         run;
33         %mend;
34         %make_have(200,1000000)

NOTE: The data set WORK.HAVE has 1000000 observations and 201 variables.
NOTE: DATA statement used (Total process time):
      real time           19.98 seconds
      cpu time            10.29 seconds
      

35         
36         proc transpose data=have out=have1 (drop=_name_ rename=(col1=c));
37         by id;
38         var c:;
39         run;

NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.HAVE1 has 200000000 observations and 2 variables.
NOTE: PROZEDUR TRANSPOSE used (Total process time):
      real time           31.36 seconds
      cpu time            14.90 seconds
      

40         
41         proc sort data=have1;
42         by id c;
43         run;

NOTE: There were 200000000 observations read from the data set WORK.HAVE1.
NOTE: The data set WORK.HAVE1 has 200000000 observations and 2 variables.
NOTE: PROZEDUR SORT used (Total process time):
      real time           1:28.38
      cpu time            51.98 seconds
      

44         
45         data count;
46         set have1;
47         by id c;
48         if first.c
49         then count = 1;
50         else count + 1;
51         if last.c then output;
52         run;

NOTE: There were 200000000 observations read from the data set WORK.HAVE1.
NOTE: The data set WORK.COUNT has 53600561 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           1:09.82
      cpu time            26.76 seconds
      

53         
54         proc sort data=count;
55         by id descending count descending c;
56         run;

NOTE: There were 53600561 observations read from the data set WORK.COUNT.
NOTE: The data set WORK.COUNT has 53600561 observations and 3 variables.
NOTE: PROZEDUR SORT used (Total process time):
      real time           35.41 seconds
      cpu time            16.39 seconds
      

57         
58         data want;
59         set count;
60         by id;
61         if first.id;
62         run;

NOTE: There were 53600561 observations read from the data set WORK.COUNT.
NOTE: The data set WORK.WANT has 1000000 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           14.40 seconds
      cpu time            4.04 seconds
 &lt;/PRE&gt;
&lt;P&gt;At 31 seconds, you can see that the transpose will not pose a problem.&lt;/P&gt;
&lt;P&gt;Alternatively, one can use a proc summary after the transpose:&lt;/P&gt;
&lt;PRE&gt;24         proc summary data=have1 n nway;
25         by id;
26         class c;
27         var c;
28         output out=count (drop=_type_ c rename=(_freq_=count n_c=c)) n=n_c;
29         run;

NOTE: There were 200000000 observations read from the data set WORK.HAVE1.
NOTE: The data set WORK.COUNT has 53600561 observations and 3 variables.
NOTE: PROZEDUR SUMMARY used (Total process time):
      real time           1:27.70
      cpu time            48.06 seconds
&lt;/PRE&gt;
&lt;P&gt;so you get a runtime of ~ 4 minutes for 1 million records with 200 variables&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that the program (especially the datastep/sort version) makes no assumptions for&lt;/P&gt;
&lt;P&gt;- number of variables&lt;/P&gt;
&lt;P&gt;- number of possible values&lt;/P&gt;
&lt;P&gt;- variable type&lt;/P&gt;
&lt;P&gt;and is therefore completely data-driven.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 11:27:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403097#M97938</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-10-11T11:27:07Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403185#M97964</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/98537"&gt;@jfaruqui&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;so I have a huge dataset(millions of rows) with&amp;nbsp;over 100 variables (and more get added regularly)&amp;nbsp; .. &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;What would be the most efficient code to write for this allowing for the possibility of more test columns(with values) to be added to the dataset.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Almost any process that continually adds variables is poorly designed. Having to change code because of constantly added variables that are essentially the same as current ones means that the advantage of computers doing the routine and tedious work is partially negated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Typically this approach seems to come from a spreadsheet oriented solution that is then "automated with manual changes" to account for poor data structures. There is a large body of work related to normalizing data for automated processing that may be of interest.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 14:29:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403185#M97964</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-10-11T14:29:54Z</dc:date>
    </item>
    <item>
      <title>Re: programming help</title>
      <link>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403196#M97966</link>
      <description>&lt;P&gt;How about this one ?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input C1 C2 C3 C4 C5;
cards;
10 12 15 55 15
10 10 15 10 12
10 10 10 10 10
10 10 12 12 15 
;
run;
data want;
 if _n_=1 then do;
   declare hash h(ordered:'a');
   declare hiter hi('h');
   h.definekey('k');
   h.definedata('k','count');
   h.definedone();
 end;
set have;
array x{*} c1-c5;
do i=1 to dim(x);
 k=x{i};
 rc=h.find();
 if rc=0 then do;count=count+1;h.replace();end;
  else do;count=1;h.add();end;
end;
do while(hi.next()=0);
  if count&amp;gt;=max then do;max=count;RESULT=k; end;
end;
h.clear();
drop i k rc max count;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;P.S. If you can use SAS/IML, that could fast you a lot more and save you a lot code.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 14:53:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/programming-help/m-p/403196#M97966</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-10-11T14:53:57Z</dc:date>
    </item>
  </channel>
</rss>

