Hello all
I have a dataset like below (have) and need an output (want)
Please help me with the code in terms of proc sql or data step
TIA
HAVE | WANT | ||||||||
ID | PRICE | YEARMONTH | ID | PRE_PRICE | PRE_YEAR | POST_PRICE | POST_YEAR | ||
1 | 455412 | 2312 | 1 | 455412 | 2312 | 455405 | 2404 | ||
1 | 455423 | 2402 | 2 | 455406 | 2312 | 455409 | 2410 | ||
1 | 455404 | 2403 | 3 | 455407 | 2402 | 455405 | 2409 | ||
1 | 455405 | 2404 | |||||||
2 | 455406 | 2312 | |||||||
2 | 455407 | 2402 | |||||||
2 | 455408 | 2403 | |||||||
2 | 455409 | 2404 | |||||||
2 | 455405 | 2409 | |||||||
2 | 455409 | 2410 | |||||||
3 | 455407 | 2402 | |||||||
3 | 455408 | 2403 | |||||||
3 | 455409 | 2404 | |||||||
3 | 455402 | 2405 | |||||||
3 | 455403 | 2406 | |||||||
3 | 455404 | 2407 | |||||||
3 | 455404 | 2408 | |||||||
3 | 455405 | 2409 |
Use a different selection criteria for when to remember the price.
data want;
set have;
by id;
retain pre_price pre_year ;
if first.id then call missing(pre_price,pre_year);
if missing(pre_price) then do;
pre_price=price;
pre_year=yearmonth;
end;
if last.id;
rename price=post_price yearmonth=post_year;
run;
Why do you need SQL code if the data step code works??
its a requirement from the client as they need most of the code in sql to be run on other applications with different language than SAS for comparing results.
@kajal_30 wrote:
its a requirement from the client as they need most of the code in sql to be run on other applications with different language than SAS for comparing results.
Then look for solutions in that language.
What flavor of SQL are you talking about? Each one has its own added features and quirky implementations. If you are lucky they are using a version of SQL that is newer than what SAS uses so they can use what are called "windowing" features of SQL.
I am talking about SAS SQL only we are not prioritizing other language our milestone is to get the SAS code working. So you are saying it is not possible in SAS to use sql to get this output ?
1) Since you are doing this in SAS, why not to use DATA STEP? It is _the_ tool to do such tasks.
2) SQL does not support firs. and last. by design. You would have to do it differently.
3) I would not recommend this method for a production use, but if you really have to you could try something like this:
data have;
input ID PRICE YEARMONTH;
cards;
1 455412 2312
1 455423 2402
1 455404 2403
1 455405 2404
2 455406 2312
2 455407 2402
2 455408 2403
2 455409 2404
2 455405 2409
2 455409 2410
3 455407 2402
3 455408 2403
3 455409 2404
3 455402 2405
3 455403 2406
3 455404 2407
3 455404 2408
3 455405 2409
;
run;
proc print;
run;
proc sql;
select y.ID
,sum(case when y.n=y.min_n then y.PRICE else . end) as PRE_PRICE
,sum(case when y.n=y.min_n then y.YEARMONTH else . end) as PRE_YEAR
,sum(case when y.n=y.max_n then y.PRICE else . end) as POST_PRICE
,sum(case when y.n=y.max_n then y.YEARMONTH else . end) as POST_YEAR
from
(
select c.*
, min(n) as min_n
, max(n) as max_n
from
(
select x.ID, x.PRICE, x.YEARMONTH, count(distinct ym) as n
from
(
select a.*, b.YEARMONTH as ym
from have as a
, have as b
where a.ID = b.ID
and a.YEARMONTH >= b.YEARMONTH
) as x
group by x.ID, x.PRICE, x.YEARMONTH
) as c
group by c.ID
) as y
group by y.ID
order by y.ID
;
quit;
Bart
Thank you can also please share the data step code and as my code is selecting all obs.
data want;
set have ;
by yearmonth ;
if first.yearmonth then pre_price = price and pre_year = yearmonth;
output;
if last.yearmonth then post_price = price and post_year = yearmonth ;
output;
run;
TIA
kajal
I thought you wanted to group by ID, not by your yearmonth variable.
Here is how to share data (rather then just posting a listing).
data have;
input ID PRICE YEARMONTH;
cards;
1 455412 2312
1 455423 2402
1 455404 2403
1 455405 2404
2 455406 2312
2 455407 2402
2 455408 2403
2 455409 2404
2 455405 2409
2 455409 2410
3 455407 2402
3 455408 2403
3 455409 2404
3 455402 2405
3 455403 2406
3 455404 2407
3 455404 2408
3 455405 2409
;
data expect ;
input ID PRE_PRICE PRE_YEAR POST_PRICE POST_YEAR ;
cards;
1 455412 2312 455405 2404
2 455406 2312 455409 2410
3 455407 2402 455405 2409
;
So assuming the data is already sorted the data step to roll up the first/last month values per ID could look like this:
data want;
set have;
by id yearmonth;
if first.id then do;
pre_price=price;
pre_year=yearmonth;
end;
retain pre_price pre_year;
if last.id;
rename price=post_price yearmonth=post_year;
run;
So let's check if we got what you expected.
proc compare data=want compare=expect;
id id;
run;
Results
The COMPARE Procedure Comparison of WORK.WANT with WORK.EXPECT (Method=EXACT) Data Set Summary Dataset Created Modified NVar NObs WORK.WANT 07OCT24:13:15:35 07OCT24:13:15:35 5 3 WORK.EXPECT 07OCT24:13:15:35 07OCT24:13:15:35 5 3 Variables Summary Number of Variables in Common: 5. Number of ID Variables: 1. Observation Summary Observation Base Compare ID First Obs 1 1 ID=1 Last Obs 3 3 ID=3 Number of Observations in Common: 3. Total Number of Observations Read from WORK.WANT: 3. Total Number of Observations Read from WORK.EXPECT: 3. Number of Observations with Some Compared Variables Unequal: 0. Number of Observations with All Compared Variables Equal: 3. NOTE: No unequal values were found. All values compared are exactly equal.
My take on an SQL solution:
proc sql; create table work.want as select a.id, a.pre_price, a.pre_year, b.post_price, b.post_year from ( select id, price as Pre_price, yearmonth as Pre_year from have group by id having yearmonth = min(yearmonth) ) as a left join ( select id, price as post_price, yearmonth as post_year from have group by id having yearmonth = max(yearmonth)) as b on a.id=b.id ; quit;
@kajal_30 wrote:
Thank you can also please share the data step code and as my code is selecting all obs.
data want; set have ; by yearmonth ; if first.yearmonth then pre_price = price and pre_year = yearmonth; output; if last.yearmonth then post_price = price and post_year = yearmonth ; output; run;
TIA
kajal
Doesn't consider ID at all which your desired seems to imply the ID is important.
If you want the first and last observations you would have to RETAIN the PRE_ variables until you output with the last per ID.
Your example data as presented actually would throw errors because the data is not sorted by yearmonth. If it were, then you would have scrambled the ID values.
Note that eventually bad things come those you think a number like Yearmonth is a good way to handle date information.
Thank you so much multiple solution worked but I got one more challenge with the data as below.
can we please accommodate the same
data have;
input ID PRICE YEARMONTH;
cards;
1 . 2312
1 455423 2402
1 455404 2403
1 455405 2404
2 455406 2312
2 455407 2402
2 455408 2403
2 455409 2404
2 455405 2409
2 . 2410
3 455407 2402
3 455408 2403
3 455409 2404
3 455402 2405
3 455403 2406
3 455404 2407
3 455404 2408
3 455405 2409
;
RUN;
data expect ;
input ID PRE_PRICE PRE_YEAR POST_PRICE POST_YEAR ;
cards;
1 455423 2402 455405 2404
2 455406 2312 . 2410
3 455407 2402 455405 2409
;
run;
while doing the first. we cannot have null value so pre_ value should go to next non null value but post_ value can have null. I am stuck as we can not simply put where price is not null as it will also eliminate null values for post
TIA
kajal
Since the requirement has changed then just make sure that the part that gets the minimum yearmonth value only looks at bits where the price has a value:
proc sql; create table work.want as select a.id, a.pre_price, a.pre_year, b.post_price, b.post_year from ( select id, price as Pre_price, yearmonth as Pre_year from (select * from have where not missing(price)) group by id having yearmonth = min(yearmonth) ) as a left join ( select id, price as post_price, yearmonth as post_year from have group by id having yearmonth = max(yearmonth)) as b on a.id=b.id ; quit;
The more things related to order of values you impose the more likely that SQL answers, if possible, will become more complicated, hard to follow, and likely have much poorer performance.
Use a different selection criteria for when to remember the price.
data want;
set have;
by id;
retain pre_price pre_year ;
if first.id then call missing(pre_price,pre_year);
if missing(pre_price) then do;
pre_price=price;
pre_year=yearmonth;
end;
if last.id;
rename price=post_price yearmonth=post_year;
run;
thank you so much
regards
kajal
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.