BookmarkSubscribeRSS Feed
msrenee1984
Obsidian | Level 7

Hello, I am trying to run a program to only pull student data from only certain (Codes) and certain (dates) and I'm unsure how-to code exactly.  I have about 150 (codes) to search through and I'm only interested in dates July 1, 2025 -December 31, 2025.  I attempted using substrate function however unsure if this is even correct.  The record is made up of the Date the letter P and the 6-digit course code.

PROC SQL;
CREATE TABLE R. comment_data AS SELECT
A.studentID,
B.COMMENT_TYPE /*type= T10 */
B.recordkey (The format is Date-"P"- 6digitcode;EXAMPLE ='20250709PA01001'...)
FROM R.list A left join DATA.BASE_12 B
ON A.STUDENTID=B.STUDENTID
WHERE B.CITYPCMT='T10'
AND SUBSTR (B.CIRCDKEY, 1, 1) IN ('A','B') AND
INPUT(SUBSTR(B.CIRCDKEY, 15,9), DATE9.) BETWEEN '07JUL2025 'D AND '31DEC2025'D ;
FORMAT Date_Part DATE9.
;QUIT;

the course codes I'm interested in:

('A01001',
'A01002',
'A03001',
'A03004',
'A05002',
'A05003',
'A05004',
'A05007',
'A05010',
'A05101',
'A05102',
'A05103',
'A05104',
'A05203',
'A07007',
'A51001',
'A51002',
'A53001',
'A55014',
'A55015',
'A55017',
'A55021',
'A57001',
'A58001',
'A58006',
'A58007',
'A58009',
'A58011',
'B01003',
'B01004',
'B01006',
'B01007',
'B01008',
'B01009',
'B01010',
'B01012',
'B01013',
'B01014',
'B01015',
'B02001',
'B02004',
'B02005',
'B02006',
'B03007',
'B04001',
'B04006',
'B04007',
'B04008',
'B05005',
'B05006',
'B05007',
'B05008',
'B05009',
'B05012',
'B05023',
'B05024',
'B05025',
'B05027',
'B05028',
'B05029',
'B05030',
'B05032',
'B06009',
'B09009',
'B09011',
'B09014',
'B09019',
'B09031',
'B10001',
'B10009',
'B10010',
'B10011',
'B10012',
'B10013',
'B10014',
'B10015',
'B10017',
'B10020',
'B12009',
'B12017',
'B12020',
'B13018',
'B16004',
'B19001',
'B21002',
'B23002',
'B26001',
'B26002',
'B26003',
'B26004',
'B51001',
'B51002',
'B51003',
'B51004',
'B51005',
'B51006',
'B51007',
'B52005',
'B54003',
'B54005',
'B55001',
'B55002',
'B55003',
'B55004',
'B55005',
'B55006',
'B55008',
'B55009',
'B55010',
'B55012',
'B55509',
'B56001',
'B56002',
'B56003',
'B56004',
'B59005',
'B59006',
'B59007',
'B59008',
'B59009',
'B59010',
'B59012',
'B60003',
'B60004',
'B60006',
'B60007',
'B60008',
'B62001',
'B62003',
'B62005',
'B62006',
'B62007',
'B62008',
'B62009',
'B62010',
'B62011',
'B62012',
'B62013',
'B62014',
'B62015',
'B62016',
'B62017',
'B62018',
'B62021',
'B62024',
'B62025',
'B62030',
'B62033',
'B62034',
'B62035',
'B62036',
'B62037',
'B62038',
'B62039',
'B62040',
'B62041',
'B62043',
'B62044',
'B62045',
'B62047',
'B62048',
'B62049',
'B62050',
'B63002',
'B63003',
'B64001')

8 REPLIES 8
Kathryn_SAS
SAS Employee

If recordkey is in this format as you show, 

B.recordkey /*(The format is Date-"P"- 6digitcode;EXAMPLE ='20250709PA01001'...)*/

 

then I would suggest the following:

PROC SQL;
CREATE TABLE r.comment_data AS SELECT
A.studentID,
B.COMMENT_TYPE, /*type= T10 */
B.recordkey /*(The format is Date-"P"- 6digitcode;EXAMPLE ='20250709PA01001'...)*/,
INPUT(SUBSTR(B.recordKEY, 1,8), yymmdd8.) as date_part format=date9.
FROM r.list A left join data.BASE_12 B
ON A.STUDENTID=B.STUDENTID
WHERE B.CITYPCMT='T10'
AND SUBSTR (B.CIRCDKEY, 1, 1) IN ('A','B') AND
calculated date_part BETWEEN '07JUL2025'D AND '31DEC2025'D 
and substr(b.recordkey,10,6) in ('A01001',
'A01002',
'A03001',
'A03004',
'A05002'/* rest of codes here */);
QUIT;

If this does not work for you, share some actual data and clarify what you are expecting.

msrenee1984
Obsidian | Level 7

Thanks Kathryn (and others), I tried using your logic however my system kept hanging and never returned any results I had to shut down and reopen the session multiple times.  I tried to remove the date9 and just use the dates as is and use record >='20250101' and recordkey <= '20251231' ...that still didn't work.   Eventually many hours later I tried to break the logic up running the SQL then 2 additional Data steps and joined back to my original "List" dataset.  Although this seemed to work, I'd really like to know how this could be accomplished all within the SQL statement.  Thank you everyone for your assistance.

What I did:

PROC SQL;
CREATE TABLE R.COMMENT AS SELECT

studentID,
Comment_type,
Recordkey
FROM data.base
WHERE comment_type='T10';
RUN;
DATA R.DATE; SET R.COMMENT;
YEAR=SUBSTR(recordkey,1,4);
MONTH=SUBSTR(recordkey,5,2);
DAY=SUBSTR(recordkey,7,2);
RUN;
DATA R.DATE1; SET R.DATE;
IF YEAR='2025' THEN OUTPUT;
IF MONTH GE '07' THEN OUTPUT;
IF DAY GE '01' THEN OUTPUT;
IF YEAR='2026' THEN DELETE;
RUN;

PROC SQL;

A.*,

B.*

from R.Date1 A left join R.list B

on A.studentID=B.studentID

;quit;

Tom
Super User Tom
Super User

How large are the two datasets?  Do they have indexes?  Trying to do full joins of large datasets could overwhelm the storage on your computer.

 

One of your new data steps looks very strange.

DATA R.DATE1; SET R.DATE;
IF YEAR='2025' THEN OUTPUT;
IF MONTH GE '07' THEN OUTPUT;
IF DAY GE '01' THEN OUTPUT;
IF YEAR='2026' THEN DELETE;
RUN;

First, are YEAR, MONTH and DAY really character variables?   If so make sure that the MONTH and DAY variables always have the leading zeros. Otherwise values like '2' will be treated as greater than '07'.

 

Second let's look at what the logic is doing.

ANY observation with YEAR='2025' are written.

Then ANY observation with MONTH larger than '07' are written no matter what value of YEAR they have.  So the observations that also have year='2025' are written twice.

Similarly with the DAY checking.

And the DELETE statement is doing nothing because it is the last statement in the step and the explicit OUTPUT statement will have suppressed the normal implied OUTPUT that occurs at the end of a data step iteration.

 

So basically any observations that has a valid DAY will be written out.  Those that also have MONTH between 07 an 12 will be written again.  And those that are from 2025 will be written at least once whatever value they have for MONTH or DAY.  And depending on the value of MONTH and DAY will be written another one or two times.

 

What did you actually want that step to do?

 

msrenee1984
Obsidian | Level 7

Examples of the recordkey(aka circdkey)

recordkey
20070312JA01034
20070711JA01034
20070711JA01034
20070716JA01034
20070716JA01034
20070719JA01002
20070719JA01002
20070808JA01002
20070808JA01002
20070917JA01034
20070917JA01034
20071019JA01033
20071019JA01033
20071126PE52003
20071126PE52003
20090430JD11010
20090430JD11010
20090430JD11010
20090706JA01034
20090706JA01034
20090706JA01034
20090706JA01034
20090811JD11028
20090811JD11028

thanks Kurt!

Kurt_Bremser
Super User

Then it is obvious why this

INPUT(SUBSTR(B.CIRCDKEY, 15,9), DATE9.)

cannot work. One, your date is stored in the first 8 characters (and not starting at position 15), and two in a YMD format. Your function call must therefore be

input(substr(b.circdkey,1,8),yymmdd8.)
Tom
Super User Tom
Super User

Since as your comments show

B.recordkey (The format is Date-"P"- 6digitcode;EXAMPLE ='20250709PA01001'...)

the values of RECORDKEY start with digits in the style YYYYMMDD you can impose a date range by using simple inequalities.

where B.recordkey >= '20250701'
  and B.recordkey <= '20251231'

That is one the advantages of using YMD order when storing dates as strings.  When sorted lexicographically they are also sorted chronlogically.

 

The other advantage is there is no confusion about which digits represent the month of the year and which digits represent the day of the month.

msrenee1984
Obsidian | Level 7

 I deleted 

INPUT(SUBSTR(B.CIRCDKEY, 1,8), yymmdd8.) as date_part format=date9. and used the date format as is...

PROC SQL;
CREATE TABLE R.comment AS SELECT
A.*,
B.Commenttype,/*type code T10 */
B.text,
B.CIRCDKEY,
FROM R.List A left join data.base B

ON A.studentID=B.studentID
WHERE B.Commentype='T10'
AND B.CIRCDKEY >= '20250701'
AND B.CIRCDKEY <= '20251231'
; QUIT;................is this what you mean?

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.

Explore Now →
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 870 views
  • 1 like
  • 4 in conversation