My data set MYTAB has variables ID, MONTH, QUANTITY.
If an ID has no data for a MONTH in the sequence 1 - 12, I want to add a record with ID, MONTH and QUANTITY = . (dot for missing numeric value).
So all ID:s should have 12 records each.
Have you any idea how to solve this?
For this example where all ID's have the same month values you can do that with PROC SUMMARY CLASSDATA=
Thanks for your answer. I am sure that it works, but for a person with little experience it's hard to understand the code.
Wouldn't it be possible to use CASE WHEN in PROC SQL?
Some sort of pseudo code: When month is missing in [1,12} add record with ID, MONTH, QUANTITY (= . ).
I would suggest the easiest thing is to create a template dataset with the sequence you require and then merge that on. For example (and I haven't tested this):
data template;
do I=1 to 12;
month=I;
output;
end;
run;
proc sql;
create table WANT as
select SOME_VARIABLES, /* note update per your table */
COALESCE(A.MONTH,B.MONTH) as MONTH
from WORK.HAVE A
full join WORK.TEMPLATE B
on A.MONTH=B.MONTH;
quit;
This will add in any missing months and the other variables will be missing. If you need other variables populated then create your template with all necessary variables.
The COALESCE function returns the first non-missing value from a list of numeric arguments.
I don't understand how that function could be of any help.
Its a trick with merging and coalescing. Say you have these two datasets and full join them on month:
Data A Data B
Month Result Month
Jan 12 Jan
Feb 10 Feb
Mar
... ...
The full join expands the first table to include values from the right, however the month would be missing if we did not do the coalesce for Mar. Eg.
select a.month... = Month=missing, Result=missing as A.MONTH does not contain anything.
select coalesce(a.month,b.month) = Month=Mar, Result = missing, as the data from b is used as a is missing.
So, the full join creates all the necessary extra rows, the coalesce ensures that the variables are populated where necessary.
Your code doesn't work.
I get the same number of records, but in ascending order with regard to MONTH. So first all MONTH =1, then all MONTH =2 , etc.
When I started each ID:s all records was displayed together( in ascending order with regard to MONTH ).
So here is the full code, tested this time. The point is to create a dataset which will be the template for the other one.
data have;
attrib id format=$1. month quantity format=best.;
infile datalines delimiter=",";
input id $ month quantity;
datalines;
a,1,100
a,2,200
a,4,150
b,1,60
;
run;
/* Create template, need to base it on your have data */
proc sql;
create table IDS as
select distinct ID
from WORK.HAVE;
quit;
data template (drop=i);
set ids;
do I=1 to 12;
month=I;
output;
end;
run;
proc sql;
create table WANT as
select COALESCE(A.ID,B.ID) as ID,
COALESCE(A.MONTH,B.MONTH) as MONTH,
A.QUANTITY
from WORK.HAVE A
full join WORK.TEMPLATE B
on A.ID=B.ID
and A.MONTH=B.MONTH;
quit;
Thankyou RW9 for your solution. Now it worked.
I am forced to use SAS.
This would have been an easy problem to solve in an ordinary programming language, working with arrays, and that code would have been easy to understand for anyone with basic knowledge in programming.
The suggestion posted was merely for one method of doing such a task. SAS also has arrays syntax:
array months {12} 3. (1,2,3,4,5,6,7,8,9,10,11,12);
You could also look at this snippet:
retain lstmonth;
if month ne lstmonth then do;
do I=lstmonth to month-1;
month=I;
output;
end;
end;
output;
So keep the last value of month, and if the current rows month is not the previous month +1 then output rows for the difference. Note you need to check for month=12, and set other values to missing for these additional rows.
As with most problems there is no one singular method to get what you want
try this:
*get all ID's;
proc sort
data=mytab (keep=ID)
out=control
nodupkey
;
by ID;
run;
*to be on the safe side;
proc sort data=mytab;
by ID MONTH;
run;
*make 12 months for every ID;
data control2;
set control;
do MONTH = 1 to 12;
output;
end;
run;
data result;
merge
mytab (in=a)
control2 (in=b)
;
if b;
by ID MONTH;
*everytime there is no record from mytab, quantity is implicitely set to .;
run;
Edit: Ups, forgot "by ID MONTH;" in the last data step.
Thankyou KurtBremser.
Your code was more easy to understand.
EASY FOR SQL.
data have; input id $ month quantitiy; cards; a 2 11 a 7 12 b 10 11 b 4 9 ;;;; run; data m; do month=1 to 12; output; end; run; proc sql; create table temp as select * from (select distinct id from have) as a,m; create table want as select temp.*,have.quantitiy from have right join temp on temp.id=have.id and temp.month=have.month; quit;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.