Can I flip the order of text with a Windows program?

Wolf K

Usually better is 03/01/1966.

Or on the third of January, depending on what country you are in.
Which is the secondary reason that 1966-01-03 is the ISO 8601 recommended
format. I will "leave it to the interested student" to figure out the
main reason. ;-)

Best,
Wolf K.
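
One reason often given for year-first dates (possibly, though not certainly, the one Wolf K has in mind) is that zero-padded year-month-day strings come out in chronological order under a plain text sort. A minimal Python sketch; the sample dates are made up for illustration:

```python
# Invented sample dates: 3 Jan 1966, 22 Jun 1965, 1 Sep 1975.
us_style = ["01/03/1966", "06/22/1965", "09/01/1975"]   # month/day/year
iso_style = ["1966-01-03", "1965-06-22", "1975-09-01"]  # year-month-day

# A plain string sort compares character by character, left to right.
sorted_us = sorted(us_style)    # ordered by month first, not by time
sorted_iso = sorted(iso_style)  # ordered chronologically

print(sorted_us)
print(sorted_iso)
```

The month-first list keeps 1966 ahead of 1965 because only the leading "01" and "06" decide the order, while the year-first list needs no date parser at all.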
 
J. P. Gilliver (John)

Wolf K said:
This is actually very simple to do in any word-processor or text editor
that Sorts by paragraph or line. Wordpad and Notepad won't do this. Any
word-processor will do so, and many text editors will, too.

Note that the word-processor doesn't recognise dates as such. A date is
just another string of characters, made up of numerals and slashes
[]
The fact that the sample lines all start with a date has confused what
the OP originally asked for, which was to reverse the order of lines in
a text file. This interesting intellectual exercise has been distracted
by lots of attention to sorting (which wouldn't work with the variable
format dates in the examples given, without a fairly sophisticated date
parser).
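
For the plain "reverse the order of lines" reading of the question, no sorting or date parsing is needed. A minimal Python sketch, assuming the file is small enough to read into memory and is UTF-8 text (both are assumptions, as are the file names in the usage comment):

```python
def reverse_lines(text):
    """Return text with its lines in the opposite order."""
    lines = text.splitlines()
    # Re-join with newlines; keep a trailing newline unless the input was empty.
    return "\n".join(reversed(lines)) + ("\n" if lines else "")


def reverse_file(src_path, dst_path):
    """Write the lines of src_path to dst_path, last line first."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(reverse_lines(text))


# Hypothetical usage:
#   reverse_file("health.txt", "health_flipped.txt")
```

Because nothing is compared, the variable date formats do not matter here; the last line simply becomes the first.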
 
Metspitzer

If you haven't got Microsoft Office, I'm afraid you'll have to. Or else
buy Office. Wordpad and Notepad just won't do what you want. They don't
Sort anything. I just tested Wordpad, to make sure. And Wordpad is
supposedly an improvement over Notepad!

If you have MS Office on your computer, use Word. It's unfortunate, but
a computer doesn't come with much usable software.

See my other post for step-by-step recipe for how to do what you want.

HTH,
Wolf K.
Thanks
 
J. P. Gilliver (John)

Wolf K said:
The situation improves slightly, if you changed 3/1966 to 3/1/1966

Usually better is 03/01/1966. []
Or on the third of January, depending on what country you are in.
Which is the secondary reason that 1966-01-03 is the ISO 8601 recommended
format. I will "leave it to the interested student" to figure out the
main reason. ;-)
[]
The one _I_ usually give is that it is sufficiently unusual
([four-digit] year first) as to make people stop and take notice, for my
British persuadees. (For my American persuadees, it allows them to
have/keep the month/day order they so love, but in something that is
more logical.)
 
Steve Hayes

However: I don't think the sort you describe will do what the OP wanted
- basically reverse the order of lines in a file, not sort in any sort
of order.
I thought he wanted them in descending order instead of ascending, which Word
can do, though I doubt it would be able to do it with dates in different
formats. Inmagic might, but I don't know if it is still available. And by the
time you had prepared the text enough for import into Inmagic, you could just
as easily have modified the date formats by hand anyway.
 
Paul

Here is a gawk script, to convert your file. I haven't a clue what
I'm doing, but I finally got some output (I had lotsa trouble getting
the sort to work). Do not include the lines of asterisks when
copying into your file metzsort.txt.

************************ Begin file "metzsort.txt" **********************
#
# Dependencies: gawk version 3.1.2 or later from gnuwin32. (Currently 3.1.6)
#
# http://gnuwin32.sourceforge.net/packages/gawk.htm
#
# Syntax: gawk -f metzsort.txt ascending  input.txt > output.txt
#         gawk -f metzsort.txt descending input.txt > output.txt
#         ^                    ^          ^
#         |                    |          |
#         ARGV[0]              ARGV[1]    ARGV[2]
#
#         gawk -f metzsort.txt input.txt > output.txt   <--- will be ascending
#         ^                    ^
#         |                    |
#         ARGV[0]              ARGV[1]
#
# (Sorts based on just the date. Input sample comes next)
#
# 9/1975     Broken right hand
# 6/22/1965  Broken left Collarbone
# 3/1966     Broken jaw
#
# Convert internally into two arrays, one array "date", carrying only the sortable fields
#
# <- date ->  <--------------- line ---------------->
# 06_22_1965  6/22/1965 Broken left Collarbone
# 03_00_1966  3/1966 Broken jaw
# 09_00_1975  9/1975 Broken right hand
#
# Then, sort the date array, use the indices to print out the line array
#

BEGIN { # this clause runs, before the program eats any data...

    count = 1

    descending = 0

    if (ARGV[1] == "descending") {
        delete ARGV[1]
        descending = 1
    }

    if (ARGV[1] == "ascending") {
        delete ARGV[1]
    }

    # Your input file name cannot be "ascending" or "descending" !!!
}

{
    line[ count ] = $0
    numfields = split( $1, dateparts, "/" )
    switch (numfields) {
    case 2:
        tempdate = sprintf("%02d_00_%04d", dateparts[1]+0, dateparts[2]+0)
        date[ tempdate ] = count
        break
    case 3:
        tempdate = sprintf("%02d_%02d_%04d", dateparts[1]+0, dateparts[2]+0, dateparts[3]+0)
        date[ tempdate ] = count
        break
    default:
        print "Unexpected date field at line " NR > "/dev/stderr"
        exit
    }
    count++
}

END {
    j = 1
    # the count variable, is now "one past the end"
    asorti(date,datesort)
    # I had to copy the line data into another array, to support ascending/descending
    for ( i in datesort ) {
        line2[ j ] = line[ date[ datesort[ i ] ] ]
        j++
    }

    if ( descending == 0 ) {
        for( j=1; j<count; j++) { # then let's print ascending
            print line2[ j ]
        }
    } else { # descending is 1
        for( j=count-1; j>=1; j-- ) { # then let's print descending
            print line2[ j ]
        }
    }
}
************************ End file "metzsort.txt" **********************

This shows an example of using the script.

http://img851.imageshack.us/img851/4355/usage.gif

Redirect the output, by adding redirection to the end of the line, like this.
Then it won't "spill onto the screen".

gawk -f metzsort.txt descending metz.txt > output.txt

Doing it like this, is not recommended, for safety reasons.
Don't try to overwrite the source file, on the fly.

gawk -f metzsort.txt descending metz.txt > metz.txt

I suppose I could have added more safeguards to the program, but
this is a quick and dirty effort.

HTH,
Paul
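
On the warning about redirecting onto the source file: the shell typically opens and empties the output file before the program has read it, so the input is gone before it can be processed. A minimal Python sketch of the usual workaround, writing to a temporary file and swapping it in only once it is complete. The helper name `rewrite_in_place` and the UTF-8 encoding are assumptions for illustration, not part of the script above:

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Replace the file at path with transform(old_text), via a temporary file."""
    with open(path, encoding="utf-8") as src:
        old_text = src.read()              # read everything before writing anything

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(transform(old_text))
        os.replace(tmp_path, path)         # swap the finished file into place
    except BaseException:
        os.remove(tmp_path)                # leave the original untouched on failure
        raise
```

Redirecting to a separate output.txt, as recommended above, remains the simpler choice at the command line.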
 
Metspitzer

Paul said:
Here is a gawk script, to convert your file. []

I will give that a shot.......in the morning.
That looks like it must have taken a lot of time.
Thanks.

Doing something like that would probably take me a long time. ok
never.
 
Paul

Metspitzer said:
I will give that a shot.......in the morning.
That looks like it must have taken a lot of time.
Thanks.

Doing something like that would probably take me a long time. ok
never.
OK, one more time.

************************ Begin file "metzsort.txt" **********************
# Quelle malheur!
# I had to give up on my original concept, but this will still work for you.
#
# Dependencies: gawk version 3.1.2 or later from gnuwin32. (Currently 3.1.6)
#
# http://gnuwin32.sourceforge.net/packages/gawk.htm
#
# Syntax: gawk -f metzsort.txt ascending  input.txt > output.txt
#         gawk -f metzsort.txt descending input.txt > output.txt
#         ^                    ^          ^
#         |                    |          |
#         ARGV[0]              ARGV[1]    ARGV[2]
#
#         gawk -f metzsort.txt input.txt > output.txt   <--- will be ascending
#         ^                    ^
#         |                    |
#         ARGV[0]              ARGV[1]
#
# (Sorts based on date and the words. Input sample comes next. Some test data.)
#
# 9/1975     Broken right hand
# 6/22/1965  Broken left Collarbone
# 3/1966     Broken jaw
#
# Need to convert the date field into year_month_day, and append it to the left
# of the user input line.
#
# <- date ->  <--------------- line ---------------->
# 1975_09_00  Broken right hand
# 1965_06_22  Broken left Collarbone
# 1966_03_00  Broken jaw
#
# Then, sort the date array, use the indices to print out the result
#

BEGIN { # this clause runs, before the program eats any data...

    count = 1

    descending = 0

    if (ARGV[1] == "descending") {
        delete ARGV[1]
        descending = 1
    }

    if (ARGV[1] == "ascending") {
        delete ARGV[1]
    }

    # Note: Your input file name cannot be "ascending" or "descending" !!!
}

{
    numfields = split( $1, dateparts, "/" )
    switch (numfields) {
    case 2:
        tempdate = sprintf("%04d_%02d_00", dateparts[2]+0, dateparts[1]+0)
        break
    case 3:
        tempdate = sprintf("%04d_%02d_%02d", dateparts[3]+0, dateparts[1]+0, dateparts[2]+0)
        break
    default:
        print "Unexpected date field at line " NR > "/dev/stderr"
        exit
    }

    for (j=2; j<=NF; j++) { # 1975_09_00Brokenrighthand
        tempdate = tempdate $j
    }
    date[ tempdate ] = $0 # Associative array holds the original user lines
                          # Two identical lines, one will be a loser and disappear!
    count++
}

END {
    # the count variable, is now "one past the end"

    # This built-in function, sorts by the index field, which is "tempdate"
    asorti(date,datesort)

    if ( descending == 0 ) {
        for( j=1; j<count; j++) { # then let's print ascending
            print date[ datesort[ j ]]
        }
    } else { # descending is 1
        for( j=count-1; j>=1; j-- ) { # then let's print descending
            print date[ datesort[ j ]]
        }
    }
}
************************ End file "metzsort.txt" **********************

These things happen.

Here's another test run picture. I added a couple more lines to my test file.
It looks like it's sorting now.

http://img406.imageshack.us/img406/8938/usage2.gif

Paul
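
The year_month_day idea can also be sketched in Python, for comparison. This is an illustration under stated assumptions rather than a port of the script above: the function names are made up, only the m/yyyy and m/d/yyyy formats are handled, every line is assumed to be non-empty, and the sort key is the date alone (the script also appends the words to the key).

```python
def sort_key(line):
    """Build a year_month_day key from a leading m/yyyy or m/d/yyyy date."""
    date_field = line.split()[0]
    parts = [int(p) for p in date_field.split("/")]
    if len(parts) == 2:            # month/year, day unknown
        month, year = parts
        day = 0
    elif len(parts) == 3:          # month/day/year
        month, day, year = parts
    else:
        raise ValueError("Unexpected date field: " + date_field)
    return "%04d_%02d_%02d" % (year, month, day)


def sort_lines(lines, descending=False):
    """Return the lines ordered by their date key."""
    return sorted(lines, key=sort_key, reverse=descending)


sample = [
    "9/1975 Broken right hand",
    "6/22/1965 Broken left Collarbone",
    "3/1966 Broken jaw",
]
for line in sort_lines(sample):
    print(line)
```

Unlike the associative-array approach, `sorted()` keeps identical lines instead of dropping one of them, and lines with equal dates stay in their input order.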
 
Metspitzer

Paul said:
OK, one more time. []
Here is the original file. Give it a shot. I think it would have
been easier for me to just flip the lines one at a time than for you
to go through this. I wanted the easy way out. You have done more
work than I.

Thanks

http://www.filedropper.com/health
 
Seth

Metspitzer said:
Here is the original file. Give it a shot. I think it would have
been easier for me to just flip the lines one at a time than for you
to go through this. I wanted the easy way out. You have done more
work than I.

Thanks

http://www.filedropper.com/health
Took me about 30 seconds of work to do it with a spreadsheet...

- opened text file
- CTRL-A (to select all)
- CTRL-C (to copy to clipboard)
- Opened spreadsheet app (in my case, Excel 2010)
- CTRL-V (to paste)
- right clicked on "A" column header and selected "Insert" (now pasted text
is in column B)
- Put a numeric "1" in cell A1
- Put the formula "@sum(a1+1)" in cell A2
- CTRL-C on A2
- highlighted A3 all the way down to end of your text
- CTRL-V to paste formula (spreadsheet app automatically updated A1 to A2 to
A3, etc... as needed all the way down)
- Went to DATA options
- Selected sort, sorted on column A in descending order
- right clicked on column A header and deleted
- CTRL-A to select data
- pasted it below

(flipped file)
3/22/12 percutaneous cholangiogram and liver biopsy (no bile block)
3/2012 Several bouts with gout. Left foot and Right second toe
09/2011 Pneumonia Shot
09/2010 Pneumonia Shot
08/2010 Liver biopsy ruled out blocked bile duct
07/2010 Eyes checked
07/2010 MRI for bile duct
06/2010 Sugar doctor
05/14 Removed Biliary Catheter
05/07 Went into hospital for high sugar 517
04/09 Biliary Catheter Changed
04/01 Biliary Catheter leaked and infected
03/29 Biliary Catheter leaked
03/22 Biliary Catheter unable to flush fully
03/2010 Went in for bile procedure passed out, kept overnight and given EKG
03/2010 MRI for itching Blocked Bile duct
10/2009 Went in for labs and passed out
10/2009 MRI
09/2009 Pneumonia Shot
07/2009 Gout
05/2009 Liver Biopsy
03/2009 Bone Marrow Biopsy
03/2009 Colonoscopy
02/2009 Gout
01/2009 Blood disorder, low white blood count 0.7, low platelettes ITP
10/2007 Cured Bacteria growth
08/2007 Cystoscopsy
02/2007 Prostrate Biopsy
12/2006 MRI
11/2005 Cured Lupus
02/2006 Stopped Taking drug for Bacteria growth 1st time
06/2005 Chemo for Lupus
05/2005 Chemo for Lupus
05/2005 Biopsy for bacteria growth
05/2005 Kidney Damage due to Lupus
02/2005 Biopsy ? Bone
01/2005 Developed Drug induced Lupus
01/2005 Blood disorder thought to be ITP Low platelettes
12/2004 Biopsy ? Liver
10/2004 Ended Pegasus Cleared HepC
10/2003 Started Taking Pegasus for HepC
10/2003 Ended Intron A
11/2002 Started HepC treatment
06/2002 CMV
05/2002 Liver Transplant
05/2000 Disable
1995 Broken Jaw
1979 Broken Jaw
1975 Broken right hand
1970 Broken wrist
1966 Broken wrist
1965 Broken Collarbone
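
The spreadsheet steps above amount to numbering the rows, sorting on that number in descending order, and then discarding the number. A minimal Python sketch of the same trick; the function name is made up for illustration:

```python
def flip_with_index(lines):
    """Reverse lines the spreadsheet way: number, sort descending, drop the number."""
    numbered = list(enumerate(lines, start=1))              # column A: 1, 2, 3, ...
    numbered.sort(key=lambda pair: pair[0], reverse=True)   # sort on column A, descending
    return [text for _, text in numbered]                   # delete column A


sample = [
    "1965 Broken Collarbone",
    "1966 Broken wrist",
    "1970 Broken wrist",
]
for line in flip_with_index(sample):
    print(line)
```

Since the sort key is the row number rather than the date, the mixed date formats never have to be interpreted.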
 
Metspitzer

Seth said:
Took me about 30 seconds of work to do it with a spreadsheet... []
Thanks
 
Anthony Buckland

...[subsequently to various spreadsheet postings] ...
Your problem is that you're limiting your sorting to alphabetic. If
you tell your spreadsheet program to treat those values as dates
you'll be able to sort them chronologically.
Dragging myself back to the original posting, I note that
Windows, unadorned, does not offer a spreadsheet program.
There are many spreadsheet - and text-processing - programs
that offer sorting, but these are programs running under
Windows, not parts of Windows itself.

(No intention to bad-mouth any of the work done by various
responders)
 
Gene E. Bloch

Took me about 30 seconds of work to do it with a spreadsheet...
- opened text file
- CTRL-A (to select all)
- CTRL-C (to copy to clipboard)
- Opened spreadsheet app (in my case, Excel 2010)
- CTRL-V (to paste)
- right clicked on "A" column header and selected "Insert" (now pasted text
is in column B)
- Put a numeric "1" in cell A1
For these five statements:
- Put the formula "@sum(a1+1)" in cell A2
- CTRL-C on A2
- highlighted A3 all the way down to end of your text
- CTRL-V to paste formula (spreadsheet app automatically updated A1 to A2 to
A3, etc... as needed all the way down)
....it's even easier in Excel 2003.
Click on the little square in the lower right corner of cell A1
Control drag that square to the last row with data
 
Paul

Metspitzer said:
Here is the original file. Give it a shot. I think it would have
been easier for me to just flip the lines one at a time than for you
to go through this. I wanted the easy way out. You have done more
work than I.

Thanks

http://www.filedropper.com/health
Program modified, to handle new input formats. There are at least five
variants on the date field format.

Part of my purpose in writing a script, is to show how much effort
it takes, to fix up the date field format and make the computer understand
it. Procedural languages such as the one I'm using, are "brittle".

Note that, when you take the shorthand like this...

03/2010 Major event
03/22 Minor event at day 22 of month

and invert it, it doesn't make quite as much sense, and I don't know how
to make it look better in that case. You may need to do some more edits
to the inverted file, like maybe "03/22/2010" for the minor.

03/22 Minor event at day 22 of month
03/2010 Major event

************************ Begin file "metzsort.txt" **********************
# Dependencies: gawk version 3.1.2 or later from gnuwin32. (Currently 3.1.6)
#
# http://gnuwin32.sourceforge.net/packages/gawk.htm
#
# Syntax: gawk -f metzsort.txt ascending  input.txt > output.txt
#         gawk -f metzsort.txt descending input.txt > output.txt
#         ^                    ^          ^
#         |                    |          |
#         ARGV[0]              ARGV[1]    ARGV[2]
#
#         gawk -f metzsort.txt input.txt > output.txt   <--- will be ascending
#         ^                    ^
#         |                    |
#         ARGV[0]              ARGV[1]
#
# (Sorts based on date and text. Input sample comes next)
#
# 9/1975     Broken right hand                                <--- Format 1
# 6/22/1965  Broken left Collarbone                           <--- Format 2
# 3/1966     Broken jaw
# 1965       Broken Collarbone                                <--- Format 3
# 03/2010    Need to keep "year" in a history buffer...
# 03/22      As the next entry to it assumes the same year    <--- Format 4
# 3/22/12    percutaneous                                     <--- Format 5
#
# Need to convert the date field into year_month_day, and append it to the left
# of the user input line.
#
# <- date ->  <--------------- line ---------------->
# 1975_09_00  Broken right hand
# 1965_06_22  Broken left Collarbone
# 1966_03_00  Broken jaw
#
# Then, sort the date array, use the indices to print out the result
#

BEGIN { # this clause runs, before the program eats any data...
    count = 1

    descending = 0
    if (ARGV[1] == "descending") {
        delete ARGV[1]
        descending = 1
    }
    if (ARGV[1] == "ascending") {
        delete ARGV[1]
    }
    # Note: Your input file name cannot be "ascending" or "descending" !!!
}

{
    numfields = split( $1, dateparts, "/" )
    switch (numfields) {
    case 1: # this is a Format 3
        tempdate = sprintf("%04d_00_00", dateparts[1]+0)
        break
    case 2:
        if ( dateparts[2] < 1000 ) { # this is a Format 4
            tempdate = sprintf("%04d_%02d_%02d", oldyear, dateparts[1]+0, dateparts[2]+0)
        } else { # this is a Format 1
            tempdate = sprintf("%04d_%02d_00", dateparts[2]+0, dateparts[1]+0)
            oldyear = dateparts[2]+0 # for Format 4
        }
        break
    case 3: # this is a Format 2
        # Add fixups for format 5, a short year field. Not for centenarians... Crappy code follows.
        add = 0 # reset first, so a value left over from an earlier line is never reused
        if (dateparts[3] < 20) { add = 2000 } # 2000 up to 2020
        if (dateparts[3] > 50) { add = 1900 } # 1950 to 1999
        if (dateparts[3] > 1000) { add = 0 } # Proper four digit date ?
        tempdate = sprintf("%04d_%02d_%02d", dateparts[3]+add, dateparts[1]+0, dateparts[2]+0)
        oldyear = dateparts[3]+add # for Format 4
        break
    default:
        print "Unexpected date field at line " NR > "/dev/stderr"
        exit
    }

    for (j=2; j<=NF; j++) { # 1975_09_00Brokenrighthand
        tempdate = tempdate $j
    }
    if ( tempdate in date ) {
        print "Warn: Identical line detected at line " NR > "/dev/stderr"
        count--
    }
    # print tempdate # Debug: Uncomment this line, to check the thing we're sorting on...
    date[ tempdate ] = $0 # Associative array holds the original user lines
    count++
}

END {
    # the count variable, is now "one past the end"

    # This built-in function, sorts by the index field, which is "tempdate"
    asorti(date,datesort)

    if ( descending == 0 ) {
        for( j=1; j<count; j++) { # then let's print ascending
            print date[ datesort[ j ] ]
        }
    } else { # descending is 1
        for( j=count-1; j>=1; j-- ) { # then let's print descending
            print date[ datesort[ j ] ]
        }
    }
}
************************ End file "metzsort.txt" **********************

Output at pastebin, in descending order. (As per Seth's example)

http://pastebin.com/nTx3zwUP

Paul
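
For comparison, the five date formats and the carried-over year can be sketched in Python. This is an illustration under stated assumptions, not a port: the function names are made up, every line is assumed to be non-empty, the century cut-offs copy the ones in the script, and the key is the date alone, so lines with equal dates keep their input order and duplicates are kept.

```python
def make_key(date_field, last_year):
    """Return (sort key, year to remember) for one date field."""
    parts = [int(p) for p in date_field.split("/")]
    if len(parts) == 1:                          # Format 3: year only
        return "%04d_00_00" % parts[0], last_year
    if len(parts) == 2 and parts[1] < 1000:      # Format 4: month/day, year carried over
        month, day = parts
        return "%04d_%02d_%02d" % (last_year, month, day), last_year
    if len(parts) == 2:                          # Format 1: month/year
        month, year = parts
        return "%04d_%02d_00" % (year, month), year
    if len(parts) == 3:                          # Formats 2 and 5: month/day/year
        month, day, year = parts
        if year < 20:                            # short year, 2000 up to 2019
            year += 2000
        elif 50 < year <= 1000:                  # short year, 1951 to 1999
            year += 1900
        return "%04d_%02d_%02d" % (year, month, day), year
    raise ValueError("Unexpected date field: " + date_field)


def sort_by_date(lines, descending=False):
    """Sort lines by their leading date, remembering the year for month/day entries."""
    keyed = []
    last_year = 0
    for line in lines:
        key, last_year = make_key(line.split()[0], last_year)
        keyed.append((key, line))
    keyed.sort(key=lambda pair: pair[0], reverse=descending)
    return [line for _, line in keyed]


sample = [
    "1965 Broken Collarbone",
    "6/22/1965 Broken left Collarbone",
    "9/1975 Broken right hand",
    "03/2010 MRI for itching Blocked Bile duct",
    "03/22 Biliary Catheter unable to flush fully",
    "3/22/12 percutaneous cholangiogram and liver biopsy (no bile block)",
]
for line in sort_by_date(sample, descending=True):
    print(line)
```

The carried-over year only works when the input is read in its original order, which is the same restriction the "history buffer" in the script has.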
 
Metspitzer

Paul said:
Program modified, to handle new input formats. []
The code is more complicated than my list. :)
Thanks
 
Paul

Metspitzer said:
The code is more complicated than my list. :)
Thanks
True :)

The last script I wrote (before the example one), was
processing a 100MB text file.

And you wouldn't want to edit a file like that by hand.

Generally, you size up the job. If the source file is
small, hand editing might be the answer. If the source
file is huge, or you have thousands of small files to
edit, that's when the script begins to pay off.

Many times I've written scripts, only to discover it
wasn't a good use of time. So it's not like I'm
a good "estimator" or anything.

Paul
 
ray

Metspitzer wrote:
The code is more complicated than my list. :) Thanks
Here is your file:
3/22/12 percutaneous cholangiogram and liver biopsy (no bile block)
3/2012
Several bouts with gout. Left foot and Right second toe
09/2011 Pneumonia Shot
09/2010 Pneumonia Shot
08/2010 Liver biopsy ruled out blocked bile duct
07/2010 Eyes checked
07/2010 MRI for bile duct
06/2010 Sugar doctor
05/14 Removed Biliary Catheter
05/07 Went into hospital for high sugar 517
04/09 Biliary Catheter Changed
04/01 Biliary Catheter leaked and infected
03/29 Biliary Catheter leaked
03/22 Biliary Catheter unable to flush fully
03/2010 Went in for bile procedure passed out , kept overnight and given EKG
03/2010 MRI for itching Blocked Bile duct
10/2009 Went in for labs and passed out
10/2009 MRI
09/2009 Pneumonia Shot
07/2009 Gout
05/2009 Liver Biopsy
03/2009 Bone Marrow Biopsy
03/2009 Colonoscopy
02/2009 Gout
01/2009 Blood disorder, low white blood count 0.7, low platelettes ITP
10/2007 Cured Bacteria growth
08/2007 Cystoscopsy
02/2007 Prostrate Biopsy
12/2006 MRI
11/2005 Cured Lupus
02/2006 Stopped Taking drug for Bacteria growth 1st time
06/2005 Chemo for Lupus
05/2005 Chemo for Lupus
05/2005 Biopsy for bacteria growth
05/2005 Kidney Damage due to Lupus
02/2005 Biopsy ? Bone
01/2005 Developed Drug induced Lupus
01/2005 Blood disorder thought to be ITP Low platelettes
12/2004 Biopsy ? Liver
10/2004 Ended Pegasus Cleared HepC
10/2003 Started Taking Pegasus for HepC
10/2003 Ended Intron A
11/2002 Started HepC treatment
06/2002 CMV
05/2002 Liver Transplant
05/2000 Disable
1995 Broken Jaw
1979 Broken Jaw
1975 Broken right hand
1970 Broken wrist
1966 Broken wrist
1965 Broken Collarbone

The complex code used to generate it on my Linux system was:
'tac Health.txt >health.txt'
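
For readers without tac (it is part of GNU coreutils and is not a standard Windows command), the same line reversal can be sketched in a few lines of Python. This is only an illustration, not what was run above; the function name reverse_lines and the three sample lines are chosen for the example.

```python
def reverse_lines(text):
    """Return the text with its lines in reverse order (what tac does)."""
    lines = text.splitlines()
    if not lines:
        return ""
    # Rejoin in reverse order and end the result with a single newline.
    return "\n".join(reversed(lines)) + "\n"


# Three lines taken from the list above, in the order the flipped file shows them.
sample = "05/2002 Liver Transplant\n05/2000 Disable\n1995 Broken Jaw\n"
print(reverse_lines(sample), end="")
```

To use it on a real file, read the file's contents into a string, pass it to reverse_lines, and write the result to a different file. On Windows, file names are normally case-insensitive, so Health.txt and health.txt would refer to the same file; pick a clearly different output name there.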
 
M

Metspitzer

Metspitzer wrote:


Here is the original file. Give it a shot. I think it would have
been easier for me to just flip the lines one at a time than for you
to go through this. I wanted the easy way out. You have done more
work than I.

Thanks

http://www.filedropper.com/health

Program modified, to handle new input formats. There are at least five
variants on the date field format.

Part of my purpose in writing a script is to show how much effort it
takes to fix up the date field format and make the computer understand
it. Procedural languages, such as the one I'm using, are "brittle".

Note that, when you take the shorthand like this...

03/2010 Major event
03/22 Minor event at day 22 of month

and invert it, it doesn't make quite as much sense, and I don't know how
to make it look better in that case. You may need to do some more edits
to the inverted file, like maybe "03/22/2010" for the minor.

03/22 Minor event at day 22 of month
03/2010 Major event
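
One way to avoid that ambiguity is to expand the month/day shorthand into a full date before flipping the file. The following is a rough Python sketch of that idea, not part of the gawk script; expand_short_dates is a hypothetical name, and it assumes each shorthand line follows (in the original file order) a line that carries a four-digit year. Lines with a two-digit year, such as 3/22/12, are left untouched.

```python
def expand_short_dates(lines):
    """Rewrite MM/DD entries as MM/DD/YYYY using the last four-digit year seen."""
    year = None
    result = []
    for line in lines:
        parts = line.split(None, 1)            # date field, rest of the line
        fields = parts[0].split("/") if parts else []
        if fields and all(f.isdigit() for f in fields):
            if int(fields[-1]) >= 1000:
                year = int(fields[-1])         # remember the latest full year
            elif len(fields) == 2 and year is not None:
                rest = parts[1] if len(parts) > 1 else ""
                line = f"{parts[0]}/{year} {rest}".rstrip()
        result.append(line)
    return result


original = [
    "03/2010 Major event",
    "03/22 Minor event at day 22 of month",
]
for entry in reversed(expand_short_dates(original)):
    print(entry)
```

With the year filled in first, the minor entry still reads sensibly once it ends up above the major one.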

************************ Begin file "metzsort.txt" **********************
# Dependencies: gawk version 3.1.2 or later from gnuwin32. (Currently 3.1.6)
#
# http://gnuwin32.sourceforge.net/packages/gawk.htm
#
# Syntax: gawk -f metzsort.txt ascending  input.txt > output.txt
#         gawk -f metzsort.txt descending input.txt > output.txt
#          ^                       ^          ^
#          |                       |          |
#       ARGV[0]                 ARGV[1]    ARGV[2]
#
#         gawk -f metzsort.txt input.txt > output.txt    <--- will be ascending
#          ^                       ^
#          |                       |
#       ARGV[0]                 ARGV[1]
#
# (Sorts based on date and text. Input sample comes next)
#
# 9/1975    Broken right hand                               <--- Format 1
# 6/22/1965 Broken left Collarbone                          <--- Format 2
# 3/1966    Broken jaw
# 1965      Broken Collarbone                               <--- Format 3
# 03/2010   Need to keep "year" in a history buffer...
# 03/22     As the next entry to it assumes the same year   <--- Format 4
# 3/22/12   percutaneous                                    <--- Format 5
#
# Need to convert the date field into year_month_day, and append it to the left
# of the user input line.
#
# <- date -> <--------------- line ---------------->
# 1975_09_00 Broken right hand
# 1965_06_22 Broken left Collarbone
# 1966_03_00 Broken jaw
#
# Then, sort the date array, use the indices to print out the result
#

BEGIN {   # this clause runs, before the program eats any data...
  count = 1

  descending = 0
  if (ARGV[1] == "descending") {
    delete ARGV[1]
    descending = 1
  }
  if (ARGV[1] == "ascending") {
    delete ARGV[1]
  }
  # Note: Your input file name cannot be "ascending" or "descending" !!!
}

{
  numfields = split( $1, dateparts, "/" )
  switch (numfields) {
    case 1:   # this is a Format 3
      tempdate = sprintf("%04d_00_00", dateparts[1]+0)
      break
    case 2:
      if ( dateparts[2] < 1000 ) {   # this is a Format 4
        tempdate = sprintf("%04d_%02d_%02d", oldyear, dateparts[1]+0, dateparts[2]+0)
      } else {   # this is a Format 1
        tempdate = sprintf("%04d_%02d_00", dateparts[2]+0, dateparts[1]+0)
        oldyear = dateparts[2]+0   # for Format 4
      }
      break
    case 3:   # this is a Format 2
      # Add fixups for format 5, a short year field. Not for centenarians...
      # Crappy code follows.
      if (dateparts[3] < 20)   { add = 2000 }   # 2000 up to 2020
      if (dateparts[3] > 50)   { add = 1900 }   # 1950 to 1999
      if (dateparts[3] > 1000) { add = 0 }      # Proper four digit date ?
      tempdate = sprintf("%04d_%02d_%02d", dateparts[3]+add, dateparts[1]+0, dateparts[2]+0)
      oldyear = dateparts[3]+add   # for Format 4
      break
    default:
      print "Unexpected date field at line " NR > "/dev/stderr"
      exit
  }

  for (j=2; j<=NF; j++) {   # 1975_09_00Brokenrighthand
    tempdate = tempdate $j
  }
  if ( tempdate in date ) {
    print "Warn: Identical line detected at line " NR > "/dev/stderr"
    count--
  }
  # print tempdate   # Debug: Uncomment this line, to check the thing we're sorting on...
  date[ tempdate ] = $0   # Associative array holds the original user lines
  count++
}

END {
  # the count variable, is now "one past the end"

  # This built-in function, sorts by the index field, which is "tempdate"
  asorti(date,datesort)

  if ( descending == 0 ) {
    for( j=1; j<count; j++) {   # then let's print ascending
      print date[ datesort[ j ] ]
    }
  } else {   # descending is 1
    for( j=count-1; j>=1; j-- ) {   # then let's print descending
      print date[ datesort[ j ]]
    }
  }
}
************************ End file "metzsort.txt" **********************

Output at pastebin, in descending order. (As per Seth's example)

http://pastebin.com/nTx3zwUP

Paul
The code is more complicated than my list. :) Thanks

The complex code used to generate it on my Linux system was:
'tac Health.txt >health.txt'
Thanks
 
P

Paul

ray said:
The complex code used to generate it on my Linux system was:
'tac Health.txt >health.txt'
Your code didn't fix this. These entries are not in date order.

12/2006 MRI
11/2005 Cured Lupus
02/2006 Stopped Taking drug for Bacteria growth 1st time

This is the same section from my output. The script sorts in
chronological order, ascending or descending; descending mode was
selected here.

02/2007 Prostrate Biopsy
12/2006 MRI
02/2006 Stopped Taking drug for Bacteria growth 1st time
11/2005 Cured Lupus
06/2005 Chemo for Lupus
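
The difference between flipping and sorting can be shown on the three entries that a plain reversal leaves out of order. Below is a minimal Python sketch of a date-keyed sort, separate from the gawk script and much less thorough: sort_key is a hypothetical helper that only understands MM/YYYY and bare YYYY date fields, not the other formats the script handles.

```python
def sort_key(line):
    """Build a (year, month) key from a leading MM/YYYY or YYYY date field."""
    fields = line.split(None, 1)[0].split("/")
    year = int(fields[-1])
    # A bare year has no month; use 0 so it sorts before any month of that year.
    month = int(fields[0]) if len(fields) == 2 else 0
    return (year, month)


sample = [
    "12/2006 MRI",
    "11/2005 Cured Lupus",
    "02/2006 Stopped Taking drug for Bacteria growth 1st time",
]
# reverse=True gives newest first, matching the descending mode shown above.
for entry in sorted(sample, key=sort_key, reverse=True):
    print(entry)
```

Sorting on the year first and the month second is what moves 02/2006 above 11/2005; reversing the file alone cannot do that, because it never looks at the dates.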

All in the spirit of pointless exercises :) Which keeps me busy.

Paul
 