Hi,
If I have a dataset by City, and date and price. How can I get a count of consecutive months that the price is missing by City.
Have
City Month Price
Tor 01 345
Tor 02 .
Tor 03 .
Tor 04 367
Tor 05 678
Mon 01 .
Mon 02 235
Mon 03 .
Mon 04 .
Win 01 234
Win 02 456
Win 03 .
Win 04 .
Win 05 .
Want
City ConsMonth
Tor 2
Mon 1
Win 5
Thanks
How did you come up the number in data set want?
Well, first of all it would help if I gave out the right want results.:smileyblush: Sorry about that..
It should be
Want
City ConsMonth
Tor 2
Mon 2
Win 3
How many consecutive blank months (shown by prices)... Again, sorry about that.
hi ... here's one way ...
data x;
input city :$3. month price @@;
datalines;
Tor 01 345 Tor 02 . Tor 03 . Tor 04 367 Tor 05 678
Mon 01 . Mon 02 235 Mon 03 . Mon 04 . Win 01 234
Win 02 456 Win 03 . Win 04 . Win 05 .
;
proc summary data=x missing nway;
by city price notsorted;
output out=stats;
run;
proc sql;
select city, max(_freq_) as consmonths from stats group by city;
quit;
city consmonths
Mon 2
Tor 2
Win 3
Mike,
Your code is not right ,try the data below.
Tor | 01 | 345 |
Tor | 02 | 345 |
Tor | 03 | . |
Tor | 04 | 367 |
Tor | 05 | 678 |
Mon | 01 | . |
Mon | 02 | 235 |
Mon | 03 | . |
Mon | 04 | . |
hi ... you are correct, I forgot the WHERE in SUMMARY ...
proc summary data=x missing nway;
by city price notsorted;
output out=stats (where=(missing(price)));
run;
proc sql;
select city, max(_freq_) as consmonths from stats group by city;
quit;
Mike,
First of all, I like your code.
But After the discussion with mkeintz at another session , I found an interesting problem, if missing value is a special like .A .Z .Result is totally different.
Ksharp
You can fix that with a FORMAT.
proc format;
value miss(default=8) ._-.Z = .;
run;
proc summary data=x missing nway;
by city price notsorted;
output out=stats(where=(missing(price)));
format price miss.;
run;
hi ... correct again, so group all possible missing values with a format ......
data x;
input city :$3. month price @@;
datalines;
Tor 01 345 Tor 02 .H Tor 03 . Tor 04 367 Tor 05 678
Mon 01 . Mon 02 235 Mon 03 . Mon 04 .
Win 01 234 Win 02 456 Win 03 . Win 04 .L Win 05 .L
;
proc format;
value price ._ - .Z = ' ';
run;
proc summary data=x missing nway;
by city price notsorted;
output out=stats (where=(missing(price)));
format price price.;
run;
proc sql;
select city, max(_freq_) as consmonths from stats group by city;
quit;
city consmonths
Mon 2
Tor 2
Win 3
How about:
data have; input City $ Month Price ; cards; Tor 01 345 Tor 02 . Tor 03 . Tor 04 367 Tor 05 678 Mon 01 . Mon 02 235 Mon 03 . Mon 04 . Win 01 234 Win 02 456 Win 03 . Win 04 . Win 05 . ; run; data have; set have; if city ne lag(city) or not missing(price) then n=0; if missing(price) then n+1; run; proc sql; create table want(where=(n ne 0)) as select city,max(n) as n from have group by city ; quit;
Ksharp
Message was edited by: xia keshan
This is a case where the nested DOW loop approach, in combination with the NOTSORTED by option, is a compact solution:
data want (keep=city nmonths);
do until (last.city);
do n=1 by 1 until (last.price);
set have;
by city price notsorted;
end;
nmonths=max(nmonths,ifn(missing(price),n,0));
end;
run;;
I understood this problem to be finding the maximum number of consecutive months with a missing pri9ce. This program assumes that the data are sorted by city and month.
But see data _null_'s followup below to treat the entire range of missing values (._, .A, .B ... .Z, .) as a single BY level for price.
Message was edited by: Mark Keintz
Add the GROUPFORMAT option to the BY statement and apply format to group all missing values together in the case of special missing.
data want (keep=city nmonths);
do until (last.city);
do n=1 by 1 until (last.price);
set x;
by city price notsorted groupformat;
end;
nmonths=max(nmonths,ifn(missing(price),n,0));
format price miss.;
end;
run;
Thanks all.. I like Mikeintz's method because it does not count other missing values that belong to the same city but consecutively numbered lower than the larger consecutive series.. for example if there are 5 missing months and then 6 priced months and then again 4 missing months. then it counts only 5 and not 5+4 = 9 ..
hi .. yes, that solution to your problem is very nice
but, in case you did not notice, the SUMMARY+SQL solution also does this
"for example if there are 5 missing months and then 6 priced months and then again 4 missing months. then it counts only 5 and not 5+4 = 9"
Ksharp's code also gave you 5, not 5+4=9.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.