Since Don is improving caps handling, I thought I'd post this.

Post by Dustydog

This takes a filename and corrects the caps - mostly, and conservatively - if the file is in Title Case. It doesn't handle Roman numerals, for example; I use another script for that if I need it, or to convert to and from Roman numerals. I mostly use this for media files. The beginning has a refresher grammar guide for capitalization in the comments. I didn't take the time to make sure it was pretty before posting. Since I correct (i.e. re-capitalize) quite a few things, I left my raw materials list in a comment in case I want to add something later. This little utility works well for me and the way I punctuate my personal media files.

I left it in the form I use for sticking in a multiscript that's a hodgepodge of little utilities.

My apologies to whomever I copied the style guide from, or grabbed pieces from, for the lack of attribution. It's been a long time. I wasn't very selective about cutting my selection(s?) down.

I've been meaning to turn this into a function, with the ability to select a style guide, for a long time... but oh well. I'll post it if I get around to it, but I think I'll wait to see what Don's working on in the current beta first.

Again, this is designed for my personal media files. Most people don't need to make sure there's a cap after a semicolon, but it's useful with how I handle my own files. Chop out the parts you don't want, or add your own re-capitalization corrections that aren't here (because that's how this works - it lowercases too many words, then changes back the ones I wanted capitalized).
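
For anyone who wants to see the "do too many, then change some back" idea outside of XYplorer, here is a rough Python sketch. It is only a model, not a port of the script: the function names are made up for illustration, the word list is a small subset of the script's, and it uses regular expressions rather than the script's list renames.

```python
import re

# A small subset of the script's word list, just for illustration.
SMALL_WORDS = ("A", "An", "The", "And", "Of", "To", "In", "For")

def lowercase_small_words(name):
    """Step 1: lowercase every small word that has a space on both sides."""
    pattern = r"(?<= )(?:" + "|".join(SMALL_WORDS) + r")(?= )"
    return re.sub(pattern, lambda m: m.group(0).lower(), name)

def restore_after_separators(name):
    """Step 2: re-capitalize a small word right after ' - ', '; ' or '. '."""
    lowered = "|".join(word.lower() for word in SMALL_WORDS)
    pattern = r"( - |; |\. )(" + lowered + r")\b"
    return re.sub(pattern, lambda m: m.group(1) + m.group(2).capitalize(), name)

def correct_caps(name):
    """Over-correct first, then undo the cases that should stay capitalized."""
    return restore_after_separators(lowercase_small_words(name))

print(correct_caps("The Lord Of The Rings - The Return Of The King"))
# The Lord of the Rings - The Return of the King
```

The first word is left alone because step 1 only matches words with a space in front of them, which is the same reason the script never touches the start of a name.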

The rename with >> is at least a powerful feature, and this is an example of how it could be used.
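
As a rough illustration of how a `>>` list rename seems to behave, including the overlapping-space quirk described in the script's comments, here is a small Python model. The single-pass, literal, case-sensitive, non-overlapping matching is an assumption taken from those comments, not from XYplorer's documentation, and `list_rename` is a hypothetical name.

```python
import re

def list_rename(name, spec):
    """Model of a 'search1|search2>>repl1|repl2' rename, applied in one pass.

    Assumption: matches are literal, case-sensitive and non-overlapping.
    """
    searches, replacements = (side.split("|") for side in spec.split(">>"))
    mapping = dict(zip(searches, replacements))
    pattern = "|".join(re.escape(item) for item in searches)
    return re.sub(pattern, lambda m: mapping[m.group(0)], name)

spec = " Of | The >> of | the "
once = list_rename("War Of The Worlds", spec)
twice = list_rename(once, spec)
print(once)   # War of The Worlds
print(twice)  # War of the Worlds
```

In the first pass the match for " Of " uses up the space that " The " would need in front of it, so "The" is only caught on the second pass.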

This is an old script, but one that I still find useful and good enough for now.

Please do not flame the poster. I thought someone might find it interesting, especially with the changes Don's making currently.


"Correct Caps In Title Case Folder or File Directly||| : CorrectCaps"

/*
Style Guides
The rules for capitalization in titles of articles (and also books, papers, speeches, etc) can vary according to a particular style guide, such as The Associated Press Stylebook, The Chicago Manual of Style, and MLA Handbook. Generally, you will use title case, although as you will see below sentence case is an option. While you will find similarities between each guide, it's important to pay attention to their differences.

Style guide similarities:

In all three styles, always capitalize the first and last word of any title.
  How to Land Your Dream Job
In all three styles, you must capitalize nouns, pronouns, verbs, adjectives, and adverbs.
  Visiting Beautiful Ruins (noun)
  As She Ran Away (pronoun)
  The Importance of Learning Fast (verb)
  The Poky Little Puppy (adjective)
  She Quietly Waits (adverb)
In all three styles, do not capitalize articles, prepositions, or coordinating conjunctions.
  To Catch a Thief (article)
  One Year in Paris (preposition)
  Magic and Daybreak (coordinating conjunction)

Style guide differences:

In the AP Stylebook, all words with three letters or less are lowercased. However, if any of those short words are verbs (is, are, was, be), they are to be capitalized.
In Chicago style, all prepositions are lowercased, even the lengthier ones, such as between, among, throughout.
In MLA style, words with three letters or less are always lowercased.

The General Rules for Title Case
As we can see, there are some exceptions to the general rules for title case set forth by each style guide, but they mostly follow a similar pattern.
  We know to capitalize the first, last, and important words in a title.
  Important words include nouns, pronouns, verbs, adverbs, and more. So, generally, these parts of speech are capitalized in titles:

  Nouns - man, bus, book
  Adjectives - angry, lovely, small
  Verbs - run, eat, sleep
  Adverbs - slowly, quickly, quietly
  Pronouns - he, she, it
  Subordinating conjunctions - as, because, that
"Short" words (those with fewer than five letters) are lowercase in titles unless they are the first or last words. Generally, we do not capitalize:

  Articles - a, an, the
  Coordinating Conjunctions (fewer than five letters) - and, but, or, for, nor, etc.
  Prepositions (fewer than five letters) - on, at, to, from, by, etc.

When in doubt and you do not have a reference guide in front of you, here is one general rule recommended by The U.S. Government Printing Office Style Manual:
  "Capitalize all words in titles of publications and documents, except a, an, the, at, by, for, in, of, on, to, up, and, as, but, or, and nor."

Advanced Rules to Note:

Hyphenated Titles
Let's take a look at The Chicago Manual of Style's guidelines for hyphenated words in titles:

Capitalize the first element of the hyphenated word.
Capitalize subsequent elements unless they are articles, prepositions, or coordinating conjunctions (and, but, for, or, nor):
  High-Quality Web Services
  First-Rate U.S. Lawyers
  Bed-and-Breakfast Options in Savannah
Capitalize the second element in a hyphenated spelled-out number.
  Forty-Ninth Street Blues
Do not capitalize the second element if the first element is a prefix that could not stand alone (such as anti- or pre-).
  Anti-inflammatory Dieting

Open Compounds
An open compound comes to life when a modifying adjective is used in conjunction with a noun. This creates a new noun. Hopefully, warning bells will signal in your mind, as nouns are almost always capitalized.
  Salad Dressing Recipes
  The Best Science Fiction and Fantasy of the Year

The First Word Following a Colon
Both Chicago and AP Stylebook guidelines say that, in title case, you should capitalize the first word after a colon.
  Feminine Poetry: Ten Women Writers from Around the World

Prepositions That Belong to a Phrasal Verb
  Prepositions often find themselves on the 'do not capitalize' list. However, when a preposition becomes an important part of a phrasal verb, it does need to be capitalized.

*/

/*This is actually cleaner code than it looks (i.e. it makes sense and is maintainable), and it's fast; each line is simply a |-separated list, then >>, then the same list with the caps changed. Each word has surrounding spaces so it catches only interior, discrete words.
  It catches everything if done twice - which is why it's done twice. I could avoid this by not putting words that could follow one another in a sentence in the same row. This is because overlapping spaces aren't caught by the 'rename' construct (i.e. rename with >>; if it uses a following space in a match, it won't use it again as the leading space of the next match). It's not extremely common, but it happens.
  Separated in rows simply for length, readability, and logical sort order at the moment. Sort of.
*/

 $doTwice=2;
 while ($doTwice--){
  rename s, " A | An | The >> a | an | the ";
  rename s, " And | Nor | But | Or | Yet | So | At | By | For | From | Of | On | To | With >> and | nor | but | or | yet | so | at | by | for | from | of | on | to | with ";
  rename s, " Amid | Mid | As | At | Atop >> amid | mid | as | at | atop ";
  rename s, " But | By | Come | For | From | In | Into | Less | Like >> but | by | come | for | from | in | into | less | like ";
  rename s, " Near | Of | Off | On | Onto | Out | Over | Per | Than >> near | of | off | on | onto | out | over | per | than ";
  rename s, " To | Till | Upon | Via | With >> to | till | upon | via | with ";
 };
 
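//Corrections: re-capitalize the same words after " - ", ";" and ". ". A word at the very beginning is never touched above (the patterns need a leading space), so that case is done already.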

//After \s-\s, i.e. space dash space.
  rename s, " - a| - an| - the>> - A| - An| - The";
  rename s, " - for| - and| - nor| - but| - or| - yet| - so| - at| - by| - for| - from| - of| - on| - to| - with>> - For| - And| - Nor| - But| - Or| - Yet| - So| - At| - By| - For| - From| - Of| - On| - To| - With";
  rename s, " - amid| - mid| - as| - at| - atop>> - Amid| - Mid| - As| - At| - Atop";
  rename s, " - but| - by| - come| - for| - from| - in| - into| - less| - like>> - But| - By| - Come| - For| - From| - In| - Into| - Less| - Like";
  rename s, " - near| - of| - off| - on| - onto| - out| - over| - per| - than>> - Near| - Of| - Off| - On| - Onto| - Out| - Over| - Per| - Than";
  rename s, " - to| - till| - upon| - via| - with>> - To| - Till| - Upon| - Via| - With";

//After ;
  rename s, "; a|; an|; the|; some|; few>>; A|; An|; The|; Some|; Few";
  rename s, "; for|; and|; nor|; but|; or|; yet|; so|; at|; by|; for|; from|; of|; on|; to|; with>>; For|; And|; Nor|; But|; Or|; Yet|; So|; At|; By|; For|; From|; Of|; On|; To|; With";
  rename s, "; amid|; mid|; as|; at|; atop>>; Amid|; Mid|; As|; At|; Atop";
  rename s, "; but|; by|; come|; for|; from|; in|; into|; less|; like>>; But|; By|; Come|; For|; From|; In|; Into|; Less|; Like";
  rename s, "; near|; of|; off|; on|; onto|; out|; over|; per|; than>>; Near|; Of|; Off|; On|; Onto|; Out|; Over|; Per|; Than";
  rename s, "; to|; till|; upon|; via|; with>>; To|; Till|; Upon|; Via|; With";

//After \.\s, i.e. dot space
  rename s, ". a|. an|. the|. some|. few>>. A|. An|. The|. Some|. Few";
  rename s, ". for|. and|. nor|. but|. or|. yet|. so|. at|. by|. for|. from|. of|. on|. to|. with>>. For|. And|. Nor|. But|. Or|. Yet|. So|. At|. By|. For|. From|. Of|. On|. To|. With";
  rename s, ". amid|. mid|. as|. at|. atop>>. Amid|. Mid|. As|. At|. Atop";
  rename s, ". but|. by|. come|. for|. from|. in|. into|. less|. like>>. But|. By|. Come|. For|. From|. In|. Into|. Less|. Like";
  rename s, ". near|. of|. off|. on|. onto|. out|. over|. per|. than>>. Near|. Of|. Off|. On|. Onto|. Out|. Over|. Per|. Than";
  rename s, ". to|. till|. upon|. via|. with>>. To|. Till|. Upon|. Via|. With";

 //A few special cases of modified CamelBack and acronyms, or whatever personal quirks I like.
  rename s, "Dvd|Cd|Ebook|Audiobook|Graphicaudio|Hd|Flac|Cbr|Sd|Vbr|]S|Aka|Vs>>DVD|CD|eBook|AudioBook|GraphicAudio|HD|FLAC|CBR|SD|VBR|]s|aka|vs";
  rename r, "^(.+)s(\d\d)e(\d\d)(.+)$ > $1S$2E$3$4\"; //For TV Seasons
  rename r, "(S\d\d)(e)(\d\d) > $1E$3";

 //"A" is the only possible middle initial from the list.
 rename r, "^(.+?)\b(a)(?=[\.\s]{1,2}\w+? - )(.+) > $1A$3"; //Boundary before, either a space or a dot or both after (no change with the dot), and immediately a word followed by \s-\s. This is overkill for most users.

/*In case I want to edit. I was more conservative than this complete list in what I changed above.
Useful regex to delete unwanted: ((?:\|[^\|]+?(?:of))\b\s?) //This shows replace "of". Do words singly.
 rename s, "a|an|the|one|some|few";
 rename s, "for|and|nor|but|or|yet|so|at|around|by|after|along|for|from|of|on|to|with|without";
 rename s, "aboard|about|above|across|after|against|again|along|alongside|amid|amidst|mid|among|around|as|astride|at|atop|before|behind";
 rename s, "below|beneath|beside|besides|between|beyond|but|by|circa|come|despite|down|during|except|for|from|in|inside|into|less|like";
 rename s, "minus|near|nearer|nearest|notwithstanding|of|off|on|onto|opposite|out|outside|over|per|since|than|through";
 rename s, "throughout|to|toward|towards|under|underneath|unlike|until|till|upon|upside|versus|via|with|within|without";
*/
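
If the regex rename lines are hard to read, the two season/episode rules can be approximated in Python like this. It is a simplified sketch, not an exact equivalent: the script's first rule needs at least one character before and after the tag and is greedy, while this version simply fixes every tag it finds. The function name is made up for illustration.

```python
import re

def fix_episode_tag(name):
    """Uppercase the S and E in tags such as s01e02 or S01e02."""
    return re.sub(r"[sS](\d\d)[eE](\d\d)", r"S\1E\2", name)

print(fix_episode_tag("Some Show - s01e02 - Pilot"))
# Some Show - S01E02 - Pilot
```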
