About HB

HB · ‎09-28-2017

Do not edit the text in the code block in the normal editor. Instead, place the cursor in the block, click the menu bar icon, and edit the code block in the pop-up window. If you edit it directly in the main editor window, it loses the special formatting.

Add the TRUNCOVER option to your INFILE statement and then you don't need those &'s and :'s in the INPUT statement.

Thanks!! Cool!!
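As a minimal sketch of the TRUNCOVER suggestion: the file path, the data set name `students`, and the two-line record layout below are assumptions made up for illustration, not the real file. With TRUNCOVER on the INFILE statement, plain formatted input stops at the end of a short line instead of moving on to the next one, so the `&` and `:` modifiers are not needed.

```sas
/* Sketch only: the path and the two-line record layout are assumptions.
   Each record is assumed to look like:
       Student: Eric
       Date of Birth: 7/1/1990                                          */
data students;
    /* TRUNCOVER: when a line is shorter than the informat width, take
       what is there instead of continuing onto the next line */
    infile 'path\to\your_file.txt' truncover;
    input    @10 student $20.        /* text after "Student: "       */
          #2 @16 dob     mmddyy10.;  /* text after "Date of Birth: " */
    format dob mmddyy10.;
run;
```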

HB · ‎09-27-2017

EDIT: I don't know why this answer formatted this way, with no breaks in the SAS code and a scroll on the last piece of text.

Given a text file like:

Math Exam
Student Report
Calculus I
Fall 2016
Student: Eric
ID: 1234567
Date of Birth: 7/1/1990
Grade: 95
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is an easy exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: Amy
ID: 1234519
Date of Birth: 12/18/1990
Grade: 77
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is a real tough exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: John
ID: 1234569
Date of Birth: 3/1/1991
Grade: 90
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is an easy exam
Math Exam
Student Report
Calculus I
Fall 2016
Student: Ava
ID: 1234523
Date of Birth: 9/13/1992
Grade: 89
Report Date: 12/20/2016
School: 123 Edison
District: 123 Edison
Description : This is a not a hard exam

SAS code like

data incoming_text;
    infile 'J:\some location where your file is\unfriendly_text_file.txt';
    input         exam_type   & $20.
          #2      report_type & $20.
          #3      subject     & $20.
          #4      mydate      & $20.
          #5  @10 student     & $20.
          #6  @5  id          & $20.
          #7  @16 dob         & $20.
          #8  @8  grade       & $20.
          #9  @14 reportdate  & $20.
          #10 @9  school      & $20.
          #11 @11 district    & $20.
          #12 @15 description & $45.;
run;

will give you

exam_type  report_type     subject     mydate     student  id       dob         grade  reportdate  school      district    description
Math Exam  Student Report  Calculus I  Fall 2016  Eric     1234567  7/1/1990    95     12/20/2016  123 Edison  123 Edison  This is an easy exam
Math Exam  Student Report  Calculus I  Fall 2016  Amy      1234519  12/18/1990  77     12/20/2016  123 Edison  123 Edison  This is a real tough exam
Math Exam  Student Report  Calculus I  Fall 2016  John     1234569  3/1/1991    90     12/20/2016  123 Edison  123 Edison  This is an easy exam
Math Exam  Student Report  Calculus I  Fall 2016  Ava      1234523  9/13/1992   89     12/20/2016  123 Edison  123 Edison  This is a not a hard exam

You can mess around with the read instructions and read only the lines you want, or just read everything and drop the variables you don't want. You can also read things properly as dates and numbers, rather than as generic character variables of length 20 as I have. Maybe like

#6 @5 id & 7.
#7 @16 dob:mmddyy10.
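A minimal sketch of that last idea, reading the ID and grade as numbers and the two dates as SAS dates. The data set name `incoming_typed`, the informat widths, and the choice of the `:` modifier for the typed fields are my assumptions; the file path is the same placeholder used in the post.

```sas
/* Sketch only: same 12-line record layout as in the post; the widths
   and the use of the : modifier are assumptions */
data incoming_typed;
    infile 'J:\some location where your file is\unfriendly_text_file.txt';
    input         exam_type   & $20.
          #2      report_type & $20.
          #3      subject     & $20.
          #4      mydate      & $20.
          #5  @10 student     & $20.
          #6  @5  id          : 7.          /* numeric ID          */
          #7  @16 dob         : mmddyy10.   /* stored as SAS date  */
          #8  @8  grade       : 3.          /* numeric grade       */
          #9  @14 reportdate  : mmddyy10.   /* stored as SAS date  */
          #10 @9  school      & $20.
          #11 @11 district    & $20.
          #12 @15 description & $45.;
    /* display the date values as dates rather than day counts */
    format dob reportdate mmddyy10.;
run;
```

With the dates stored as real SAS dates, you can sort, filter, and do arithmetic on them directly.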

HB · ‎09-27-2017

@Reeza That's an excellent reference. Saving that in case I need it!

@Wei2017 That text looks a lot like the structure of an XML document without the tags. If wherever you are getting that text dump from could give you that as XML instead, it might read easier.

<exam_record>
  <examtype>Math</examtype>
  <reporttype>Student</reporttype>
  <subject>Calculus I</subject>
  <date>Fall 2016</date>
  <studentname>Eric</studentname>
  <studentid>1234567</studentid>
  <dob>7/1/1990</dob>
  <grade>95</grade>
  <reportdate>12/20/2016</reportdate>
  <school>123 Edison</school>
  <district>123 Edison</district>
  <description>This is an easy exam</description>
</exam_record>

HB · ‎09-27-2017

Keep in mind: garbage in, garbage out. I think it blows up if, for example, project one has two completed status entries. (You might be able to look for max code and max date to try to fix that; I don't know.) It will also blow up if you are looking for Completed and the status is completed, because the match is case-sensitive.
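One way around the capitalization problem, as a minimal sketch: compare both sides of the join in upper case. The small tables and the output table name `matched` below are made up for illustration.

```sas
/* Sketch only: table contents are made up for illustration */
data project_status;
    input id :$1. status :$13.;
    datalines;
1 Completed
2 completed
3 InProgress
;
run;

data project_codes;
    input status_code project_status :$13.;
    datalines;
7 completed
4 inprogress
;
run;

proc sql;
    create table matched as
    select s.id, s.status, c.status_code
    from project_status s
    inner join project_codes c
        /* UPCASE on both sides makes the comparison case-insensitive */
        on upcase(s.status) = upcase(c.project_status)
    order by s.id;
quit;
```

All three status rows find a code here, whereas a plain `s.status = c.project_status` would match only project 2.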

HB · ‎09-27-2017

I agree with @LinusH that if you can do a query in Oracle and pass the result to SAS (or however that is done), that sounds like a better path. Having said that, I present this cheat of a workaround:

*edited because I messed up a capitalization*

* get in some data;
data project_status;
    input id:$1. status:$13. mydate:ddmmyy10.;
    datalines;
1 completed 090916
1 null 080916
1 Inprogress 070716
1 NotCompleted 070716
2 InProgress 090916
2 completed 090916
2 null 090916
2 n/a 050816
2 Failed 100616
3 inprogress 090616
3 started 080616
3 null 070616
4 failed 101017
;

*make a cheaty code table;
data project_codes;
    input status_code:$1. project_status:$13.;
    datalines;
7 completed
4 inprogress
5 NotCompleted
1 null
2 n/a
6 Failed
3 started
;

*figure out where projects are by selecting the max code;
proc sql;
    create table last_status_code as
    select c.id, max(d.status_code) as last_status
    from project_status c
    inner join project_codes d
        on c.status = d.project_status
    /* 090916 is the plain number 90916, not a date, so this keeps every
       row here; a date literal such as '09SEP2016'd would filter by date */
    where c.mydate < 090916
    group by c.id
    ;
quit;

*hook them back up;
proc sql;
    create table last_status_words as
    select a.id, b.project_status
    from last_status_code a
    inner join project_codes b
        on a.last_status = b.status_code
    ;
quit;

*lay it out;
proc sql;
    create table project_standing as
    select a.id, a.status, a.mydate
    from project_status a
    inner join last_status_words b
        on a.id = b.id and a.status = b.project_status
    ;
quit;

proc print noobs;
    format mydate ddmmyy10.;
run;

This yields

The SAS System

id  status      mydate
1   completed   09/09/2016
2   completed   09/09/2016
3   inprogress  09/06/2016

My apologies.

HB · ‎09-26-2017

Throw out a small sample data set, the results you want (even if just a description), and what you have tried.

HB · ‎09-26-2017

I probably don't understand what you want, but if you already have your observations in clusters (genotype group, matched control, etc.), isn't that the point where you would move first to basic descriptive statistics of the groups and then perhaps to more involved statistical procedures, depending on the measures you have? Again, I probably don't fully understand, but I would think at some point you would just grind on basic t-tests or analysis of variance.

HB · ‎09-20-2017

That is very slick.

HB · ‎09-20-2017

Use @Astounding's or @novinosrin's solution because they are better, but I had to go for a pure SQL solution because that is what I do.

data test_scores;
    input @1 SUBJECT $1. @3 TEST $7. @10 YN $3.;
    datalines;
1 Test1  Yes
1 Test2  Yes
1 Test2A Yes
1 Test3  Yes
2 Test1  Yes
2 Test2  Yes
2 Test2A No
2 Test3  Yes
2 Test4  Yes
3 Test1  Yes
3 Test2  Yes
3 Test2A Yes
3 Test3  Yes
;
run;

*we can identify the problem cases like this;
proc sql;
    create table problems as
    select a.*, 'Yes' as Discrepancy
    from (
        select subject, test, yn from test_scores
        where test = 'Test2' and YN = 'Yes'
        union
        select subject, test, yn from test_scores
        where test = 'Test2A' and YN = 'Yes'
    ) as a
    group by subject
    having count(subject) > 1;
quit;

*and mark them in the whole dataset like this;
proc sql;
    create table all_with_marked_cases as
    select a.subject, a.test, a.yn, b.discrepancy
    from test_scores a
    left join problems b
        on a.subject = b.subject and a.test = b.test and a.yn = b.yn;
quit;

Gives us

The SAS System

SUBJECT  TEST    YN   Discrepancy
1        Test1   Yes
1        Test2   Yes  Yes
1        Test2A  Yes  Yes
1        Test3   Yes
2        Test1   Yes
2        Test2   Yes
2        Test2A  No
2        Test3   Yes
2        Test4   Yes
3        Test1   Yes
3        Test2   Yes  Yes
3        Test2A  Yes  Yes
3        Test3   Yes

You may now return to your regularly scheduled programming.

HB · ‎09-19-2017

Is Gabriel José de la Concordia García Márquez NM in your dataset? Or Casey's General Store CO?

HB · ‎09-18-2017

I followed Reeza's suggestion and did it this way:

data pregnancy_data;
    input visit_id study_id delivery_visit pregnancynum_rank gestational_diabetes;
    datalines;
1 1 1 1 1
2 1 0 1 1
3 1 1 2 1
4 2 0 5 1
5 2 1 5 1
6 3 0 2 1
7 3 1 2 1
8 3 1 3 1
9 4 0 3 1
10 4 1 3 1
11 4 0 6 1
12 4 1 6 1
;
run;

proc sort data = pregnancy_data;
    by study_id pregnancynum_rank;
run;

*grab the first pregnancy with gestational diabetes for each mom in the dataset;
data first_with_gest_diab;
    set pregnancy_data;
    by study_id pregnancynum_rank;
    if first.study_id and first.pregnancynum_rank and gestational_diabetes = 1;
run;

*go back and get all the records associated with those pregnancies;
proc sql;
    create table first_with_gest_diab_all as
    select pregnancy_data.*
    from pregnancy_data
    inner join first_with_gest_diab
        on pregnancy_data.study_id = first_with_gest_diab.study_id
        and pregnancy_data.pregnancynum_rank = first_with_gest_diab.pregnancynum_rank
    order by visit_id, study_id, pregnancynum_rank;
quit;

That gave me

The SAS System

visit_id  study_id  delivery_visit  pregnancynum_rank  gestational_diabetes
1         1         1               1                  1
2         1         0               1                  1
4         2         0               5                  1
5         2         1               5                  1
6         3         0               2                  1
7         3         1               2                  1
9         4         0               3                  1
10        4         1               3                  1

Which is the desired result, I think.

HB · ‎09-15-2017

People are hesitant to open unknown attachments these days, and your attachment won't open for me in any event. Perhaps you could post some sample data and the desired result from that data?

HB · ‎09-15-2017

Just to be sure I understand this correctly: for each cow that has multiple calvingdates, you want the difference between the first and the latest calvingdate?

Good question. Or is it that you want to know that cow 4 went about 14 months between birth 1 and 2 (twins?), about 20 months between 2/3 and 4, and about 15 months between 4 and 5 (which is what Kurt's solution does)?

4 02/08/2008
4 13/10/2009
4 13/10/2009
4 21/06/2011
4 17/09/2012
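For the second reading (the gap between each calving and the one before it), a minimal sketch might look like the following. This is not Kurt's solution, which is not shown here; the data set names `calvings` and `intervals` and the variable names are made up for illustration, and the data are assumed to be sorted by cow and date.

```sas
/* Sketch only: names are made up; data assumed sorted by cow and date */
data calvings;
    input cow calvingdate :ddmmyy10.;
    format calvingdate ddmmyy10.;
    datalines;
4 02/08/2008
4 13/10/2009
4 13/10/2009
4 21/06/2011
4 17/09/2012
;
run;

data intervals;
    set calvings;
    by cow;
    /* LAG is called on every row so its queue stays in step */
    prev_date = lag(calvingdate);
    /* the first calving of each cow has no earlier calving */
    if first.cow then prev_date = .;
    /* count of month boundaries crossed since the previous calving */
    if prev_date ne . then
        months_since_prev = intck('month', prev_date, calvingdate);
    format prev_date ddmmyy10.;
run;
```

For the first reading, the difference between the minimum and maximum calvingdate per cow would do instead.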

HB · ‎09-15-2017

I agree about providing more data, but I will assume some data and proceed. I assume you have a terrible data structure that is making the task much more complicated than it needs to be. I did this:

data payments;
    input @1 parentid @5 account_name $12. @17 date MMDDYY10. @28 amount 1.;
    cards;
101 mcdonalds   01/01/2017 1
103 burger king 01/01/2017 1
105 dominos     01/01/2017 1
101 mcdonalds   02/01/2017 1
103 burger king 02/01/2017 1
105 dominos     02/01/2017 .
101 mcdonalds   08/01/2017 1
103 burger king 08/01/2017 1
105 dominos     08/01/2017 1
101 mcdonalds   11/01/2017 1
103 burger king 11/01/2017 1
105 dominos     11/01/2017 1
101 mcdonalds   12/01/2017 0
103 burger king 12/01/2017 0
105 dominos     12/01/2017 0
;
run;

data triggers;
    input parentid trigger MMDDYY10.;
    cards;
101 01/01/2017
103 02/01/2017
105 08/01/2017
;
run;

* we can see that the sum of AMOUNT for 101 from trigger date 1/1/2017 to 12/1/2017 is 4;
* we can see that the sum of AMOUNT for 103 from trigger date 2/1/2017 to 12/1/2017 is 3;
* we can see that the sum of AMOUNT for 105 from trigger date 8/1/2017 to 12/1/2017 is 2;

proc sql;
    create table payment_sums as
    select payments.parentid, sum(payments.amount) as payment_sum
    from payments
    inner join triggers
        on payments.parentid = triggers.parentid
    where payments.date between triggers.trigger and 21154
    group by payments.parentid
    order by payments.parentid;
quit;

Resulting in

The SAS System

parentid  payment_sum
101       4
103       3
105       2

Perhaps that isn't what you want. Perhaps it is. I'm not real good at SAS dates, and someone else will be able to tell you how to do it better than 21154 if in fact this is what you want.
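On the 21154: that number is the SAS date value for 1 December 2017 (days since 1 January 1960), and a date literal says the same thing more readably. A minimal sketch that checks the equivalence (the variable name `cutoff` is made up for illustration):

```sas
/* Sketch only: writes the literal's raw value and its formatted value to the log */
data _null_;
    cutoff = '01DEC2017'd;   /* date literal: day-month-year in quotes, then d */
    put cutoff=;             /* raw day count          */
    put cutoff= date9.;      /* same value, as a date  */
run;
```

The WHERE clause in the query could then read `where payments.date between triggers.trigger and '01DEC2017'd`.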

HB · ‎09-12-2017

To recap all of the above (because it is a good learning experience for me), this code:

data birthdates;
    input dob date9.;
    datalines;
01jan1962
26oct1973
17aug2000
05may1999
;
run;

data age_at_years;
    set birthdates;
    array age(2006:2015) age06-age15;
    do year = 2006 to 2015;
        age(year) = floor((intck('month',dob,mdy(7, 1, year))-(1<day(dob)))/12);
    end;
    drop year;
run;

produces a table that looks like:

dob    age06  age07  age08  age09  age10  age11  age12  age13  age14  age15
731    44     45     46     47     48     49     50     51     52     53
5047   32     33     34     35     36     37     38     39     40     41
14839  5      6      7      8      9      10     11     12     13     14
14369  7      8      9      10     11     12     13     14     15     16

Cool.

Edit:
1. Standard admonition about storing calculated values.
2. @ballardw I think I learn from your every post. +1

