<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using HASH to add sequence number to unsorted dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630817#M186794</link>
    <description>&lt;P&gt;I assume that the big dataset has many more columns than shown in the example, and that is why it takes so long to sort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In that case I would suggest something like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data extract;
  set have(keep=USUBJID PARCAT1N PARCAT2N PARACMD AVALC);
  obsno=_N_;
run;

proc sort data=extract;
  by USUBJID PARCAT1N PARCAT2N PARACMD AVALC;
run;

data counts(keep=obsno count);
  set extract;
  by USUBJID;
  if first.USUBJID then count=1;
  else count+1;
run;

proc sort data=counts;
  by obsno;
run;

data want;
  set have;
  set counts(drop=obsno);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 10 Mar 2020 08:38:16 GMT</pubDate>
    <dc:creator>s_lassen</dc:creator>
    <dc:date>2020-03-10T08:38:16Z</dc:date>
    <item>
      <title>Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630812#M186790</link>
      <description>&lt;P&gt;Hello everybody!&lt;/P&gt;&lt;P&gt;I am a frequent reader and fan of the community and have now a problem where I can't find a previous solution on the board or other publications.&lt;/P&gt;&lt;P&gt;This issue is that I am currently getting into HASH objects due to the sheer size of datasets I have to work with. So far HASH has made my life much easier.&lt;/P&gt;&lt;P&gt;But the problem I have is following: I have a large dataset with multiple rows for each subject. I want to allocate a sequence number which increments each line of a subject and starts again at 1 for the next subject. The subjects have to be sorted in a certain order. I have created a test dataset and the "normal" code I am using for this step.&lt;/P&gt;&lt;P&gt;As the datastep is massive I would rather not sort it, but determeine the order by using the sorting variables as key. I have a feeling that this is possible but everything I've tried hasn't worked.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create Input Data Set */ 
data source;   
	length USUBJID PARCAT1N PARCAT2N PARAMCD AVALC $ 16;   
	input USUBJID PARCAT1N PARCAT2N PARAMCD AVALC; 
	datalines; 
		1010835359 1 1 VAR1 VALUE1
		1010835359 1 2 VAR2 VALUE2
		1010836424 1 1 VAR1 VALUE1
		1010836424 1 2 VAR2 VALUE2
		1010836424 2 3 VAR3 VALUE3
		1010835359 2 3 VAR3 VALUE3
		1010835359 2 4 VAR4 VALUE4
		1010835359 3 5 VAR5 VALUE5
		1010835359 3 6 VAR6 VALUE6
		1010836424 2 4 VAR4 VALUE4
		1010836424 3 5 VAR5 VALUE5
		1010836424 3 6 VAR6 VALUE6

	;
run;&lt;BR /&gt;
/*The "normal" code I would ususally use. This take too miuch time to run*/
proc sort data=source out=source_sort; by USUBJID PARCAT1N PARCAT2N PARAMCD AVALC; run;

data want;
	set source_sort;
		count + 1;
		by USUBJID;
		if first.USUBJID then count = 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I thought about something like this, where the sequence number is according to the order of the variables (&lt;CODE class=" language-sas"&gt;USUBJID PARCAT1N PARCAT2N PARAMCD AVALC&lt;/CODE&gt;) but I dont have to do an extra sort:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Load and iterate over hash */ 
data want;   

declare hash ht(dataset:"source",ordered:"a");   
	ht.definekey('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');
	ht.defineDone(); 

declare hiter iter("ht");   

set source;
	rc = iter.first();   count = 1;
	if rc = 0 then output;
	rc = iter.next(); count + 1;
run; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Is something like this possible with hash?&lt;/P&gt;&lt;P&gt;I would be very thankful for help!&lt;/P&gt;&lt;P&gt;Greeting,&lt;/P&gt;&lt;P&gt;Marius&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2020 08:13:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630812#M186790</guid>
      <dc:creator>ma4</dc:creator>
      <dc:date>2020-03-10T08:13:06Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630815#M186792</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/183563"&gt;@ma4&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is a suggestion to achieve this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort the table using a hash (avoid PROC SORT) */ 
data _null_;   
	if _n_=1 then do;
		declare hash ht(ordered:"a");   
		ht.definekey('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');
		ht.definedata('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');
		ht.defineDone(); 
	end;
	set source;
	ht.add();
	ht.output(dataset:'source_sorted');
run; 

/* Create the COUNT variable */
data want;
	set source_sorted;
	by USUBJID;
	if first.USUBJID then count = 0;
	count + 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;All the best,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2020 08:33:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630815#M186792</guid>
      <dc:creator>ed_sas_member</dc:creator>
      <dc:date>2020-03-10T08:33:29Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630817#M186794</link>
      <description>&lt;P&gt;I assume that the big dataset has many more columns than shown in the example, and that is why it takes so long to sort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In that case I would suggest something like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data extract;
  set have(keep=USUBJID PARCAT1N PARCAT2N PARACMD AVALC);
  obsno=_N_;
run;

proc sort data=extract;
  by USUBJID PARCAT1N PARCAT2N PARACMD AVALC;
run;

data counts(keep=obsno count);
  set extract;
  by USUBJID;
  if first.USUBJID then count=1;
  else count+1;
run;

proc sort data=counts;
  by obsno;
run;

data want;
  set have;
  set counts(drop=obsno);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 10 Mar 2020 08:38:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630817#M186794</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2020-03-10T08:38:16Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630834#M186802</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;thanks for the quick reply! It worked but the log file showed me:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;1239 data _null_;&lt;BR /&gt;1240 if _n_=1 then do;&lt;BR /&gt;1241 declare hash ht(ordered:"a");&lt;BR /&gt;1242 ht.definekey('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');&lt;BR /&gt;1243 ht.definedata('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');&lt;BR /&gt;1244 ht.defineDone();&lt;BR /&gt;1245 end;&lt;BR /&gt;1246 set source;&lt;BR /&gt;1247 ht.add();&lt;BR /&gt;1248 ht.output(dataset:'source_sorted');&lt;BR /&gt;1249 run;&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.SOURCE_SORTED has 1 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 2 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 3 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 4 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 5 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 6 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 7 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 8 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 9 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 10 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 11 observations and 5 variables.&lt;BR /&gt;NOTE: The data set WORK.SOURCE_SORTED has 12 observations and 5 variables.&lt;BR /&gt;NOTE: There were 12 observations read from the data set WORK.SOURCE.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 0.13 seconds&lt;BR /&gt;cpu time 0.07 seconds&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;If I would use this on my main dataset this would iterate through every line and take prrobably longer than an actual proc sort.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2020 09:38:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630834#M186802</guid>
      <dc:creator>ma4</dc:creator>
      <dc:date>2020-03-10T09:38:57Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630835#M186803</link>
      <description>&lt;P&gt;Thank you very much! This is an amazing idea! I would have never though of that.&lt;/P&gt;&lt;P&gt;I tried on my main dataset and using only the key variables reduced my sort time considerably (down to 30% of the original time). I will check later the implementation in my main program.&lt;/P&gt;&lt;P&gt;Greetings&lt;/P&gt;&lt;P&gt;Marius&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2020 09:42:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630835#M186803</guid>
      <dc:creator>ma4</dc:creator>
      <dc:date>2020-03-10T09:42:08Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630849#M186809</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/183563"&gt;@ma4&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;No problem. I have just realized that I've made it harder for myself with my previous code&lt;STRONG&gt;.&lt;/STRONG&gt;&amp;nbsp;This one will be much efficient:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;	
	if _n_=1 then do;
		set source;
		declare hash ht (dataset:'source', ordered:'a');
		ht.definekey ('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');
		ht.definedata ('USUBJID','PARCAT1N','PARCAT2N','PARAMCD','AVALC');
		ht.definedone ();
	end;
	ht.output(dataset:"source_sorted");	
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;My best,&lt;/P&gt;</description>
      <pubDate>Tue, 10 Mar 2020 10:39:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630849#M186809</guid>
      <dc:creator>ed_sas_member</dc:creator>
      <dc:date>2020-03-10T10:39:23Z</dc:date>
    </item>
    <item>
      <title>Re: Using HASH to add sequence number to unsorted dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630853#M186811</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/183563"&gt;@ma4&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure to what extent your sample dataset is representative of your real data. (Note that &lt;FONT face="courier new,courier"&gt;count&lt;/FONT&gt; happens to equal PARCAT2N.) If the input dataset is already sorted by PARCAT1N PARCAT2N PARAMCD AVALC &lt;EM&gt;within&lt;/EM&gt; each subset of observations with the same USUBJID, as is the case in dataset SOURCE, you can assign the sequence numbers like this (and thus avoid any sorting):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
if _n_=1 then do;
  dcl hash h();
  h.definekey('usubjid');
  h.definedata('count');
  h.definedone();
end;
set source;
if ~h.find() then count=count+1;
else count=1;
h.replace();
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 10 Mar 2020 10:45:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Using-HASH-to-add-sequence-number-to-unsorted-dataset/m-p/630853#M186811</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-03-10T10:45:31Z</dc:date>
    </item>
  </channel>
</rss>

