Hi all,
I have a dataset in which 2 fields('first' and 'last') contains the value of a particular field('amount'). I would like to use this dataset to create a new one containing the continuous list of the 2 fields in SAS.
Note: All fields are character types in the dataset.
Example below:
Input:
first | last | amount |
0 | 3 | 0% |
4 | 6 | 3% |
7 | 8 | 6% |
9 | 10 | 8% |
11 | 12 | 10% |
Output:
field1 | amount |
0 | 0% |
1 | 0% |
2 | 0% |
3 | 0% |
4 | 3% |
5 | 3% |
6 | 3% |
7 | 6% |
8 | 6% |
9 | 8% |
10 | 8% |
11 | 10% |
12 | 10% |
Please help. Thanks!
The two FIRST variables where first digit is 0 works fine but the one with first digit=1 does not.
May be someone else will give it solve it:
The code is:
data have;
length first last $19 amount $4;
infile cards truncover;
input first last amount $;
cards;
0000333330200007322 0000333330200007326 0%
1000333330200007327 1000333330200007330 1%
0000333330200007327 0000333330200007330 10%
run;
proc sort data=have; by first; run;
data want;
retain field1 amount;
set have;
f = input(first,19.);
l = input(last,19.);
times = int(l - f +1 );
put _N_= times=;
do i=1 to times;
field1 = put(int(f +i -1),z19.);
output;
end;
keep field1 amount;
run;
my issue is getting fault computation of TIMES as shown in the log for _N_=3 (after sort):
_N_=1 times=5 _N_=2 times=4 _N_=3 times=1 <<<< should be 4 !!!
data have;
infile cards;
input first last amount;
cardfs;
0 3 0
4 6 3
7 8 6
9 10 8
11 12 10
; run;
data want;
set have;
do field1=first to last; output; end;
keep field1 amount;
run;
Or you may prefer:
data have;
infile cards truncover;
input first last amount $4.;
cardfs;
0 3 0%
4 6 3%
7 8 6%
9 10 8%
11 12 10%
; run;
data want;
set have;
do field1=first to last; output; end;
keep field1 amount;
run;
As the earlier suggestions from @Shmuel imply, making FIRST and LAST character fields is a mistake. They represent numbers, not character strings. However, if that is a requirement, programming statements can convert back and forth between character and numeric. For example:
data want;
set have;
do _n_=input(first, 8.) to input(last, 8.);
field1 = left(put(_n_, 8.));
output;
end;
keep field1 amount;
run;
Again, it would be simpler to keep FIELD1 as numeric. (The program above makes it character.) Then you could assign it a value with a simpler expression:
field1 = _n_;
But to keep it as character, you need more complex programming as in the top example.
Hello,
In this case, if first and last are kept character, SAS will do an implicit conversion and issue a note,
so @Shmuel's code will also give the wanted result, though in a less "clean" manner.
85 data have;
86 first="0";
87 last="10";
88 do _n_=first to last;
89 put _n_=;
90 end;
91 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
88:12 88:21
_N_=0
_N_=1
_N_=2
_N_=3
_N_=4
_N_=5
_N_=6
_N_=7
_N_=8
_N_=9
_N_=10
NOTE: The data set WORK.HAVE has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
All your values are numeric (sequences or amounts), so they should be stored as such. See Maxim 33.
@Ja5ya wrote:
You do make sense but the character values have around 20 digits and I need to use it in a hash later on using the 20 digit value. My data is a concatenation of an id and sas date. For instance:
start end amount
0000333330200007322 0000333330200007326 0%
1000333330200007327 1000333330200007330 1%
0000333330200007327 0000333330200007330 10%
So what could be the best way to list all the variables in between the start and end values?
Could be numbers with z-Format attached.
Since you can't work reliably with 19-digit numbers in SAS (precision runs out at around 16 digits), I would split the strings into two numbers and run several (partially nested) DO loops:
data have;
input start :$19. end :$19. amount :percent6.;
format amount percent6.;
datalines;
0000333330200007322 0000333330200007326 0%
1000333330200007327 1000333330200007330 1%
0000000000999999999 0000000001000000001 5%
0000333330200007327 0000333330200007330 10%
;
data want;
length field1 $19;
set have;
if substr(start,1,10) = substr(end,1,10)
then do;
do inner = input(substr(start,11,9),9.) to input(substr(end,11,9),9.);
field1 = substr(start,1,10) !! put(inner,z9.);
output;
end;
end;
else do;
do inner = input(substr(start,11,9),9.) to 999999999;
field1 = substr(start,1,10) !! put(inner,z9.);
output;
end;
do outer = input(substr(start,11,9),10.) + 1 to input(substr(end,1,10),10.) - 1;
do inner = 0 to 999999999;
field1 = put(outer,z10.) !! put(inner,z9.);
output;
end;
end;
do inner = 0 to input(substr(end,11,9),9.);
field1 = substr(end,1,10) !! put(inner,z9.);
output;
end;
end;
keep field1 amount;
run;
proc print data=want noobs;
run;
I added an observation to illustrate the "outer" switch.
Result:
field1 amount 0000333330200007322 0% 0000333330200007323 0% 0000333330200007324 0% 0000333330200007325 0% 0000333330200007326 0% 1000333330200007327 1% 1000333330200007328 1% 1000333330200007329 1% 1000333330200007330 1% 0000000000999999999 5% 0000000001000000000 5% 0000000001000000001 5% 0000333330200007327 10% 0000333330200007328 10% 0000333330200007329 10% 0000333330200007330 10%
The two FIRST variables where first digit is 0 works fine but the one with first digit=1 does not.
May be someone else will give it solve it:
The code is:
data have;
length first last $19 amount $4;
infile cards truncover;
input first last amount $;
cards;
0000333330200007322 0000333330200007326 0%
1000333330200007327 1000333330200007330 1%
0000333330200007327 0000333330200007330 10%
run;
proc sort data=have; by first; run;
data want;
retain field1 amount;
set have;
f = input(first,19.);
l = input(last,19.);
times = int(l - f +1 );
put _N_= times=;
do i=1 to times;
field1 = put(int(f +i -1),z19.);
output;
end;
keep field1 amount;
run;
my issue is getting fault computation of TIMES as shown in the log for _N_=3 (after sort):
_N_=1 times=5 _N_=2 times=4 _N_=3 times=1 <<<< should be 4 !!!
@Ja5ya wrote:
... the character values have around 20 digits and I need to use it in a hash later on using the 20 digit value. My data is a concatenation of an id and sas date.
Hi @Ja5ya,
I would avoid that concatenation as long as possible. As has been shown, incrementing a numeric variable containing integer values such as SAS dates is very easy. Then the ID would not be affected by the incrementation, which is good, and it could be used as a BY variable, which is even better, not to mention the formatting options for the separate date variable.
If you're talking about a SAS hash object -- they can use composite keys consisting of two or more separate variables (also of mixed types). Otherwise, a concatenation could be done temporarily, just for the hashing operations.
Though you signed my previous post as a result, due to important remarks of other contributors and assuming that last 4 digits of FIRST and LAST variables are the sas date while the 1st part is common then next code will correct where
the 1st digit is 1:
data have;
length first last $19 amount $4;
infile cards truncover;
input first last amount $;
cards;
0000333330200007322 0000333330200007326 0%
1000333330200007327 1000333330200007330 1%
0000333330200007327 0000333330200007330 10%
run;
proc sort data=have; by first; run;
data want;
retain field1 amount;
set have;
f = input(substr(first,16,4),4.);
l = input(substr(last,16,4),4.);
times = int(l - f +1 );
put _N_= f= l= times=;
do i=1 to times;
field1 = cats(substr(first,1,15),put((f+i-1),z4.));
output;
end;
keep field1 amount;
run;
The key to the problem is that you intend to create a hash table later. Don't do it. Use a format for look-up purposes instead of a hash table. Formats easily handle ranges. Your intention to use a hash table is what forces you to jump through all these hoops. Here's how easy(?) a format would be.
data fmt_in;
set have end=done;
length label $ 9;
label=amount;
rename first=start last=end;
retain fmtname '$amt';
output;
if done;
hlo='O';
label='Not found';
output;
run;
proc format cntlin=fmt_in;
run;
To use the format, you have to be able to create the key that you would have needed for a hash table: ID + date. You can then use either of these as appropriate:
format key $amt.;
retrieved_amt = put(key, $amt.);
Concur with @FreelanceReinh.
Keep the variables separate, especially when you use a hash object, which allows multiple keys.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.