INPUT
ID Item code Profile
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
I want to add two extra columns: Count, which counts how many times the same ID appears in the list, and Want, which is Profile divided by Count.
OUTPUT
ID Item code Profile Count Want
000000196 212543179 1 1 1
000000270 216630523 1 1 1
000000346 217007122 1 1 1
000000395 000021678 1 1 1
000000752 000294850 1 1 1
000001277 000000344 1 2 0.5
000001277 000000546 1 2 0.5
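For anyone wanting to check the expected output mechanically, the requested calculation can be sketched in a few lines of Python (values taken from the example rows above; this is just an illustration of the arithmetic, not a SAS solution):

```python
from collections import Counter

# The seven input rows from the example: (ID, Item code, Profile)
rows = [
    ("000000196", "212543179", 1),
    ("000000270", "216630523", 1),
    ("000000346", "217007122", 1),
    ("000000395", "000021678", 1),
    ("000000752", "000294850", 1),
    ("000001277", "000000344", 1),
    ("000001277", "000000546", 1),
]

counts = Counter(id_ for id_, _, _ in rows)   # how often each ID appears
out = [(id_, item, prof, counts[id_], prof / counts[id_])
       for id_, item, prof in rows]

for row in out:
    print(row)
```

The last two rows come out with Count = 2 and Want = 0.5, matching the OUTPUT table.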
If your data are sorted by ID, then:
data have;
infile cards truncover;
input ID :$10. Item_code :$10. Profile ;
cards;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
data want;
set have;
by id;
if last.id; /*Go back if not last.id */
count=_n_-sum(lag(_n_),0); /*At last.id, calculate COUNT*/
do _n_=1 to count; /*Reread COUNT observations */
set have;
want=profile/count;
output;
end;
run;
The data step only drops down to the calculation of COUNT when the end of a by-group is encountered. This means the lag(_n_) inside that calculation is only updated at the end of each successive by-group (the first lag(_n_) would be missing, so I add a zero via SUM). Remember the LAG function is not a lookback as in Excel. It updates a queued value, which is retained in memory across observations until replaced by another invocation of the same LAG function - i.e. at the end of the next by-group.
Then, once the COUNT value is established, a loop iterating COUNT times issues a second SET statement, re-reading the same observations and outputting them with the new COUNT and WANT variables. Note that each SET HAVE statement generates a separate stream of input from HAVE. In this case the two SETs are loosely synchronized: the first SET reads one observation at a time, while the second reads in bunches of size COUNT.
Note that even though you are reading the dataset twice, you are NOT doubling the amount of data transfer from the disk file. Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET.
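For readers less familiar with SAS internals, the two-stream logic above can be sketched in plain Python (a hypothetical reemit helper, not SAS code): one scan finds the size of each by-group, then the group is "re-read" and emitted with COUNT and WANT attached.

```python
def reemit(rows):
    """Mimic the double-SET data step; rows must be sorted by ID."""
    out = []
    i, n = 0, len(rows)
    while i < n:
        j = i
        while j < n and rows[j][0] == rows[i][0]:  # scan to end of by-group
            j += 1
        count = j - i                  # analogue of _n_ - lag(_n_)
        for id_, item, prof in rows[i:j]:          # "re-read" the group
            out.append((id_, item, prof, count, prof / count))
        i = j
    return out

sample = [
    ("000000196", "212543179", 1),
    ("000001277", "000000344", 1),
    ("000001277", "000000546", 1),
]
for row in reemit(sample):
    print(row)
```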
Not easily readable
Sorry, I just reformatted it - somehow it got scrambled.
data have;
infile cards truncover;
input ID : $20. Item_code : $20. Profile ;
cards;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
proc sql;
create table want as
select *,count(id) as count, profile/calculated count as want
from have
group by id;
quit;
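Worth noting: the query above relies on SAS's automatic remerge of summary statistics back onto the detail rows, which is not ANSI SQL. In most other databases the same one-step result needs a window function; here is a sqlite3 sketch of that equivalent (an assumption about your environment: it requires SQLite 3.25+ for window-function support):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE have (id TEXT, item_code TEXT, profile REAL)")
con.executemany("INSERT INTO have VALUES (?,?,?)", [
    ("000000196", "212543179", 1),
    ("000001277", "000000344", 1),
    ("000001277", "000000546", 1),
])
# COUNT(*) OVER (PARTITION BY id) plays the role of SAS's remerged count(id)
want = con.execute("""
    SELECT id, item_code, profile,
           COUNT(*) OVER (PARTITION BY id)           AS cnt,
           profile / COUNT(*) OVER (PARTITION BY id) AS want
    FROM have
    ORDER BY id, item_code
""").fetchall()
for row in want:
    print(row)
```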
love this -" Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET."
Never knew the mechanics - I understand this a lot better now.
Perhaps
data have;
informat ID Itemcode $10.;
input ID Itemcode Profile;
datalines;
000000196 212543179 1
000000270 216630523 1
000000346 217007122 1
000000395 000021678 1
000000752 000294850 1
000001277 000000344 1
000001277 000000546 1
;
run;
proc sql;
create table want as
select a.id, a.itemcode, a.profile, b.count, (a.profile/b.count) as want
from have as a
left join (select id, count(*) as count from have group by id) as b
on a.id=b.id;
quit;
Note the use of a data step to provide example data, pasted into a code box opened with the {I} icon. The variable is named ITEMCODE because SAS does not accept spaces in variable names by default. Also notice that ID and Itemcode must be character variables to preserve the leading 0s.
Since you need the aggregate value COUNT merged onto every base record for each ID, the LEFT JOIN does exactly that, with the SELECT in parentheses creating the summary table.
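This join-to-a-summary pattern is plain ANSI SQL, so it ports directly to other engines. A sqlite3 sketch of the same LEFT JOIN (illustrative only; table and column names match the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE have (id TEXT, itemcode TEXT, profile REAL)")
con.executemany("INSERT INTO have VALUES (?,?,?)", [
    ("000000196", "212543179", 1),
    ("000001277", "000000344", 1),
    ("000001277", "000000546", 1),
])
# The subquery builds the per-ID summary; the LEFT JOIN merges it back
want = con.execute("""
    SELECT a.id, a.itemcode, a.profile, b.cnt,
           a.profile / b.cnt AS want
    FROM have AS a
    LEFT JOIN (SELECT id, COUNT(*) AS cnt
               FROM have GROUP BY id) AS b
      ON a.id = b.id
    ORDER BY a.id, a.itemcode
""").fetchall()
for row in want:
    print(row)
```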