- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
INPUT
ID Item code Profile
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
I want to have extra two columns, one is Count, which calculates how many times the same ID number appears in the list. Want to be use Profile divided by Count.
OUTPUT
ID Item code Profile Count Want
000000196 212543179 1 1 1
000000270 216630523 1 1 1
000000346 217007122 1 1 1
000000395 000021678 1 1 1
000000752 000294850 1 1 1
000001277 000000344 1 2 0.5
000001277 000000546 1 2 0.5
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your data are sorted by ID, then:
data have;
infile cards truncover;
input ID :$10. Item_code :$10. Profile ;
cards;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
data want;
set have;
by id;
if last.id; /*Go back if not last.id */
count=_n_-sum(lag(_n_),0); /*At last.id, calculate COUNT*/
do _n_=1 to count; /*Reread COUNT observations */
set have;
want=profile/count;
output;
end;
run;
The data step only drops down to the calculation of count when the end of a by-group is encountered. This means that the lag(_n_) inside that calculation is only updated at the end of each successive by-group (the first lag(_n_) would be missing, so I add a zero). Remember the lag function is not a lookback like in excel. It's an update of a queued value, which is retained in memory across observations, until replaced by another invocation of the same lag function - i.e. at the end of the next by group.
Then once the count value is established, a loop iterated COUNT times issues a second set statement, re-reading the same observations, and outputs them, with the new count and want variables. Note that each SET HAVE statement generates a separate stream of input from HAVE. In this case the two SET's are loosely synchronized. The first SET reads one observation at a time, while the second reads in bunches, of size count.
Note even though you are reading the dataset twice, you are NOT doubling the amount of data transfer from the disk file. Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not easily readable
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry I just reformat it, some how it got crumbled up
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards truncover;
input ID : $20. Item_code : $20. Profile ;
cards;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
proc sql;
create table want as
select *,count(id) as count, profile/calculated count as want
from have
group by id;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your data are sorted by ID, then:
data have;
infile cards truncover;
input ID :$10. Item_code :$10. Profile ;
cards;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
data want;
set have;
by id;
if last.id; /*Go back if not last.id */
count=_n_-sum(lag(_n_),0); /*At last.id, calculate COUNT*/
do _n_=1 to count; /*Reread COUNT observations */
set have;
want=profile/count;
output;
end;
run;
The data step only drops down to the calculation of count when the end of a by-group is encountered. This means that the lag(_n_) inside that calculation is only updated at the end of each successive by-group (the first lag(_n_) would be missing, so I add a zero). Remember the lag function is not a lookback like in excel. It's an update of a queued value, which is retained in memory across observations, until replaced by another invocation of the same lag function - i.e. at the end of the next by group.
Then once the count value is established, a loop iterated COUNT times issues a second set statement, re-reading the same observations, and outputs them, with the new count and want variables. Note that each SET HAVE statement generates a separate stream of input from HAVE. In this case the two SET's are loosely synchronized. The first SET reads one observation at a time, while the second reads in bunches, of size count.
Note even though you are reading the dataset twice, you are NOT doubling the amount of data transfer from the disk file. Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
love this -" Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET."
Never knew mechanics but a lot better now
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Perhaps
data have; informat ID Itemcode $10.; INPUT ID Itemcode Profile ; datalines; 000000196 212543179 1 000000270 216630523 1 000000346 217007122 1 000000395 000021678 1 000000752 000294850 1 000001277 000000344 1 000001277 000000546 1 ; run; proc sql; create table want as select a.id, a.itemcode, a.profile,b.count,(a.profile/b.count) as want from have as a left join (select id, count(*) as count from have group by id) as b on a.id=b.id; quit;
Note use of data step to provide example data and pasted into a code box opened with the {I} icon. Also variable name ITEMCODE as SAS does not accept spaces by default in variable names. Also notice that the variable ID and Itemcode must be character to preserve leading 0s
Since you required having a aggregate value Count merged with all of the base record ID the LEFT JOIN does that with the select bit in () creating the summary.