BookmarkSubscribeRSS Feed
jorheej
Obsidian | Level 7

 

data have;
  input date1 kind date2 date3 date4 code1 code2 num tmp_no;
  cards;
  201501 1 20000930 20210722 19600702 N 1 60 2000 1
  201502 1 20000930 20210722 19600702 N 1 60 3000 1
  201501 3 20011231 20211002 19990821 Y 2 61 3000 2
  201507 5 20040308 99991231 19661230 Y 2 77 5000 3
  201508 5 20040308 99991231 19661230 Y 2 77 5000 3
  201512 1 20020125 99991231 19990821 Y 1 130 2000 4
  ;
run;

Hello, 

this datasets is extracted from original data since they have null in the key value (identification number by person).

 

I want to make temporary number column like tmp_no so that I can see it person by person.

 

All the variables are character except for the 'num'.

 

I searched for a while but other solutions like codes below didn't work.

data want;
set have;
by var:;
retain order 0;
if first.var then tmp_no+1; run;

I guess it is because none of my columns are key value.

Can you give me a good suggestion for this problem?

 

Thanks a lot in advance

 

 

 

7 REPLIES 7
Astounding
PROC Star

Let's take a step back from the computer for a second.

 

If you were looking at this data on a piece of paper, how would you know which rows belong to the same person?

 

Does order matter?  For example, would this data represent 2 people or would it represent 4 people?

  201502 1 20000930 20210722 19600702 N 1 60 3000 1
  201501 3 20011231 20211002 19990821 Y 2 61 3000 2
  201502 1 20000930 20210722 19600702 N 1 60 3000 ?
  201501 3 20011231 20211002 19990821 Y 2 61 3000 ?
jorheej
Obsidian | Level 7
Hello,
I can't tell this data is 2 or 4 people because of the null.
If those two question marks are 1 and 2,
it would be 2 persons because they are the same except for the first column.
But when question marks are 1 and 1,
it becomes 3 people.

sorry for my ambiguous description 😞
jorheej
Obsidian | Level 7
There is no 'question mark' in my dataset, I just mentioned that is in @Astounding 's code cell 🙂
Tom
Super User Tom
Super User

Your INPUT statement does not match the number of variables in the data lines.

Your BY statement does not match the variables in your dataset.

You are attempting to use a FIRST.xxx variable that does not exist since it was not included in the BY statement.

 

It looks like your KEY consists of some, but not all , of the variables.  In particular the first "date" and the last "num" are not part of the key.

data have;
  input date1 kind date2 date3 date4 code1 $ code2 num1 num2 tmp_no;
cards;
201501 1 20000930 20210722 19600702 N 1 60 2000 1
201502 1 20000930 20210722 19600702 N 1 60 3000 1
201501 3 20011231 20211002 19990821 Y 2 61 3000 2
201507 5 20040308 99991231 19661230 Y 2 77 5000 3
201508 5 20040308 99991231 19661230 Y 2 77 5000 3
201512 1 20020125 99991231 19990821 Y 1 130 2000 4
;

proc sort ;
  by kind date2 date3 date4 code1 code2 num1 ;
run;

data want;
  set have;
  by kind date2 date3 date4 code1 code2 num1;
  new_num + first.num1;
run;

Result

Obs     date1    kind      date2       date3       date4     code1    code2    num1    num2    tmp_no    new_num

 1     201501      1     20000930    20210722    19600702      N        1        60    2000       1         1
 2     201502      1     20000930    20210722    19600702      N        1        60    3000       1         1
 3     201512      1     20020125    99991231    19990821      Y        1       130    2000       4         2
 4     201501      3     20011231    20211002    19990821      Y        2        61    3000       2         3
 5     201507      5     20040308    99991231    19661230      Y        2        77    5000       3         4
 6     201508      5     20040308    99991231    19661230      Y        2        77    5000       3         4

 

jorheej
Obsidian | Level 7

Hi,

My post has lots of mistakes sorry for that 😅

To clarify what I want to do, I write example codes again.

( I want to attach the file I have but it is not possible for the server system that I'm using now)

 

data have;
  input date1 kind date2 date3 date4 code1 code2 num price;
  cards;
  201501 1 20000930 20210722 19600702 N 1 60 2000 
  201502 1 20000930 20210722 19600702 N 1 60 3000 
  201501 3 20011231 20211002 19990821 Y 2 61 3000 
  201507 5 20040308 99991231 19661230 Y 2 77 5000 
  201508 5 20040308 99991231 19661230 Y 2 77 5000 
  201512 1 20020125 99991231 19990821 Y 1 130 2000 
  ;
run;

data want;
  input date1 kind date2 date3 date4 code1 code2 num price tmp_no;
  cards;
  201501 1 20000930 20210722 19600702 N 1 60 2000 1
  201502 1 20000930 20210722 19600702 N 1 60 3000 1
  201501 3 20011231 20211002 19990821 Y 2 61 3000 2
  201507 5 20040308 99991231 19661230 Y 2 77 5000 3
  201508 5 20040308 99991231 19661230 Y 2 77 5000 3
  201512 1 20020125 99991231 19990821 Y 1 130 2000 4
  ;
run;

It seems your solution is setting the num1 as key value

however it might be problem because I have 3,000 rows more and there could be same amount ..

so I think it needs to be considered as it has 8 key values (kind date2 date3 date4 code1 code2 num),

but I'm not sure if it's possible.

 

Thank you for your reply!

 

jorheej
Obsidian | Level 7
I think I've done wrong with the input data statement.
column 'price' needs to be added between num and tmp_no,
and the tmp_no is what I'm trying to make.

Sorry to confuse.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 827 views
  • 0 likes
  • 4 in conversation