search by whole words
Posted: 21 Mar 2005 12:23
by Leopoldus
I needed to find all files named
A & A***.doc, and I found that TrackerV lacks the common option
"search by whole words only". You can imagine what results we get back from the search expression:
a && a
I don't see a solution at the moment. Is there one?
Re: search by whole words
Posted: 21 Mar 2005 12:50
by admin
I don't see your problem. Try these:
a \& a*.doc
a \& a???.doc
Re: search by whole words
Posted: 21 Mar 2005 19:57
by Leopoldus
admin wrote:I don't see your problem. Try these:
a \& a*.doc
a \& a???.doc
I don't understand how that could accomplish my task. And it does not: I tried it, to avoid any misunderstanding. Have
you tried these expressions yourself to find a file such as
A & A.doc or
A & ABC.doc or
A & A.xls etc?
Re: search by whole words
Posted: 22 Mar 2005 09:03
by admin
I tried it and it works. There must be some misunderstanding. Try to explain again what you want to do.
Re: search by whole words
Posted: 22 Mar 2005 10:34
by Leopoldus
admin wrote:I tried it and it works. There must be some misunderstanding. Try to explain again what you want to do.
Let's create four files somewhere, named
"Get.doc", "Get 123.doc", "Gettysburg.doc" and
"target.doc".
Now let us try to find all those (and only those) files whose names include the whole word
Get as a unit, i.e.
"Get.doc" and
"Get 123.doc", but not
"Gettysburg.doc" and
"target.doc". You cannot do it!
1) if you enter the search expression
get or
*get*, it returns every file that includes
"get" in its name (
get.doc,
get 123.doc,
Gettysburg.doc and
target.doc);
2) if you enter the search expression
get*, it returns
get.doc,
get 123.doc and
Gettysburg.doc.
3) if you enter the search expression
*get, it returns
nothing.
For such cases, most search utilities, text viewers, editors, etc. have an option
"search whole words only". But Tracker does not.
There is a similar problem with the file
target.doc. You cannot find it separately from
get.doc,
get 123.doc and
Gettysburg.doc (see p. 3 above).
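The three cases above can be reproduced outside TrackerV. What follows is only a minimal sketch, assuming Python's fnmatch globbing as a stand-in for Tracker's wildcard matching (it is not Tracker's actual matcher). Names are lowercased to mimic a case-insensitive search, and glob_search is a hypothetical helper. The bare expression get from case 1 is left out, since fnmatch would treat it as an exact name rather than a substring.

```python
from fnmatch import fnmatchcase

FILES = ["Get.doc", "Get 123.doc", "Gettysburg.doc", "target.doc"]

def glob_search(pattern, names=FILES):
    """Return the names whose full (lowercased) file name matches the wildcard pattern."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]

# Case 1: substring-style pattern, matches every name containing "get"
print(glob_search("*get*"))
# Case 2: prefix pattern, still catches Gettysburg.doc
print(glob_search("get*"))
# Case 3: suffix pattern, matches nothing because every name ends in ".doc"
print(glob_search("*get"))
```

None of the three patterns isolates Get.doc and Get 123.doc, which is the gap a whole-word option would close.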
Re: search by whole words
Posted: 22 Mar 2005 10:53
by admin
OK, I see.
Now, what's a word boundary? Many characters come to mind: .,-() #[]_~=!, start of string, end of string, etc... To set up a general "search the whole word only" algorithm seems to be tedious.
RegExp can do it however! Can some RegExp expert provide a solution for Leopoldus?
Re: search by whole words
Posted: 22 Mar 2005 14:06
by Leopoldus
admin wrote:Now, what's a word boundary? Many characters come to mind: .,-() #[]_~=!, start of string, end of string, etc...
Not
SO many, if you consider that Windows does not allow every symbol in filenames. Besides, I hope there are some standard code templates for this common task (for example, I use two plain-text editors, one 25 KB and the other 60 KB of code, and both have this option in their find dialogs).
Re: search by whole words
Posted: 23 Mar 2005 11:19
by admin
I found a pretty fast way to do it! However, you can't use wildcards in a Whole Words search, so if the search pattern contains the wildcards * or ?, Whole Words is internally set to False. And, of course, you can't combine it with a fuzzy search.
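How such a switch might behave can be sketched roughly as follows. This is only a guess written in Python, not Tracker's actual code: the names is_delimiter, contains_whole_word and whole_word_search are hypothetical, and the delimiter rule (every character that is not a letter or digit, plus start and end of the name, so "_" counts as a boundary) is an assumption. The wildcard branch uses fnmatch purely as a stand-in.

```python
from fnmatch import fnmatchcase

def is_delimiter(ch):
    """Assumed rule: anything that is not a letter or digit separates words."""
    return not ch.isalnum()

def contains_whole_word(name, word):
    """True if `word` occurs in `name` bounded by delimiters or the ends of the name."""
    name, word = name.lower(), word.lower()
    start = name.find(word)
    while start != -1:
        end = start + len(word)
        left_ok = start == 0 or is_delimiter(name[start - 1])
        right_ok = end == len(name) or is_delimiter(name[end])
        if left_ok and right_ok:
            return True
        start = name.find(word, start + 1)  # try the next occurrence
    return False

def whole_word_search(names, pattern, whole_words=True):
    # As described in the post: wildcards switch Whole Words off internally.
    if "*" in pattern or "?" in pattern:
        return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]
    if whole_words:
        return [n for n in names if contains_whole_word(n, pattern)]
    # Without Whole Words: plain case-insensitive substring search.
    return [n for n in names if pattern.lower() in n.lower()]

files = ["Get.doc", "Get 123.doc", "Gettysburg.doc", "target.doc"]
print(whole_word_search(files, "get"))
```

With the four test files from earlier in the thread, the whole-word search for get keeps Get.doc and Get 123.doc and drops the other two.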
Re: search by whole words
Posted: 23 Mar 2005 17:06
by RalphM
Leopoldus wrote: Let's create four files somewhere, named "Get.doc", "Get 123.doc", "Gettysburg.doc" and "target.doc".
Now let us try to find all those (and only those) files whose names include the whole word Get as a unit, i.e. "Get.doc" and "Get 123.doc", but not "Gettysburg.doc" and "target.doc". You cannot do it!
1) if you enter the search expression get or *get*, it returns every file that includes "get" in its name (get.doc, get 123.doc, Gettysburg.doc and target.doc);
2) if you enter the search expression get*, it returns get.doc, get 123.doc and Gettysburg.doc.
3) if you enter the search expression *get, it returns nothing.
For such cases, most search utilities, text viewers, editors, etc. have an option "search whole words only". But Tracker does not.
There is a similar problem with the file target.doc. You cannot find it separately from get.doc, get 123.doc and Gettysburg.doc (see p. 3 above).
OK, I'm no RegExp specialist, but it's quite easy to do the job and find:
1) those "whole" word file names by using the find string:
>\bget\b.? and
2) the
target.doc by modifying the find string to:
>\Bget\b.?
I didn't check whether all the other find options still work, but I guess they will, except for "Exact", "Fuzzy" and "Invert", which are not valid when using RegExp.
(For an explanation check out the link for RegExp in the Find files section)
Check it out
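For anyone who wants to check these two expressions outside TV3, here is a small sketch using Python's re module, which uses the same \b (word boundary) and \B (not a word boundary) assertions. Some assumptions: the leading > is taken to be Tracker's marker for a RegExp search and is dropped, the match is made case-insensitive, and regex_search is a hypothetical helper.

```python
import re

FILES = ["Get.doc", "Get 123.doc", "Gettysburg.doc", "target.doc"]

def regex_search(pattern, names=FILES):
    """Return the names in which the pattern matches anywhere (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

# "get" as a whole word: boundary on both sides
print(regex_search(r"\bget\b.?"))
# "get" at the end of a longer word: no boundary before it, boundary after it
print(regex_search(r"\Bget\b.?"))
```

The first expression keeps Get.doc and Get 123.doc; the second keeps only target.doc, because \B requires a word character directly before the "g".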
Re: search by whole words
Posted: 23 Mar 2005 20:10
by admin
Interesting: there are slight differences between "RegExp words" and "Whole Words words" when comparing for example ">\ba\b" and "b (+Whole Words)", at least I saw one difference: RegExp does not count "_" (underscore) as a word delimiter.
Re: search by whole words
Posted: 24 Mar 2005 09:54
by RalphM
admin wrote:Interesting: ...RegExp does not count "_" (underscore) as a word delimiter.
This seems to be by design rather than a bug, since "_" is treated as a word character rather than as whitespace throughout the RegExp syntax description.
But if you would like "_" to act as a word boundary, try this one:
>(_|\b)get(_|\b).?
(the vertical bars are created with AltGr+7)
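The difference shows up with a file name that uses underscores. A quick sketch in Python's re module, whose \b likewise treats "_" as a word character; the file names here are made up for illustration, the > prefix is dropped, and case-insensitive matching is assumed.

```python
import re

NAMES = ["my_get_file.doc", "get.doc", "gettysburg.doc"]

# Plain word boundaries: "_" is a word character, so "_get_" has no boundary
plain = re.compile(r"\bget\b.?", re.IGNORECASE)
# Alternative that accepts either an underscore or a boundary on each side
with_underscore = re.compile(r"(_|\b)get(_|\b).?", re.IGNORECASE)

plain_hits = [n for n in NAMES if plain.search(n)]
underscore_hits = [n for n in NAMES if with_underscore.search(n)]

print(plain_hits)
print(underscore_hits)
```

Only the second expression finds my_get_file.doc; both still reject gettysburg.doc.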
I tried the pipe character first (AltGr+1), which didn't work for the RegExp, but seems to be an undocumented shortcut within TV3 -> Focus on current folder in tree pane and display General file info tab?!? The even stranger thing about it is that the "¦" is nevertheless inserted in the current input field, but at the start of the string rather than at the cursor position. (kind of a background process, since the display changed meanwhile to the General file info...
Donald, could you please comment on that behaviour?
I really start to like that RegExp implementation of TV3.
Re: search by whole words
Posted: 24 Mar 2005 10:32
by admin
RalphM wrote:This seems to be not a bug but on purpose, since "_" is regarded as a word character rather than a whitespace throughout the RegExp syntax description.
Aha, I did not know that. Should I follow this policy? (I mean in TV3's new Whole Words switch -- BTW, why doesn't anybody comment on this beautiful and incredibly fast new feature??) For me a "_" feels like a word boundary...
RalphM wrote:(the vertical bars are created with AltGr+7)
Not on a German keyboard, where this will produce a {.
RalphM wrote:I tried the pipe character first (AltGr+1) which didn't work for the RegExp, but seems to be an undocumented shortcut within TV3 -> Focus on current folder in tree pane and display General file info tab?!? The even stranger thing about it is, that the "¦" is nevertheless inserted in the current input field, but at the start of the string, rather than the cursor position. (kind of a background process, since the display changed meanwhile to the General file info...
Donald, could you please comment on that behaviour?
AltGr is the same as pressing CTRL+ALT, and CTRL+1 selects the General file info tab. I forgot to check for the ALT-bit, so CTRL+ALT+1 does the same. It's a minor bug and I'll fix it. This will also fix the insertion-in-current-input-field-issue you mentioned.
Re: search by whole words
Posted: 24 Mar 2005 16:01
by RalphM
admin wrote:Should I follow this policy? (I mean in TV3's new Whole Words switch) For me a "_" feels like a word boundary...
I don't know whether this is a general rule. I just inferred it from the RegExp description.
To me "_" looks rather like a word boundary too.
I think RegExp offers more flexibility, since you can still use its own wildcards, but for general use the "whole word" option might be easier (less cryptic) to set up.
admin wrote:Not on a German keyboard, where this will produce a {.
Well, it just has to be one of the "pipe" chars. (on the German keyboard)
I wonder why Leopoldus doesn't comment on that post anymore, but maybe my patience needs a bit more exercise...
Re: search by whole words
Posted: 24 Mar 2005 20:36
by Leopoldus
RalphM wrote:I wonder why Leopoldus doesn't comment on that post anymore, but maybe my patience needs a bit more exercise...
I think the likely reason is that he has had other troubles these days (and nights).
And I don't know enough of the regexp language to join your discussion.
But I now use both the "whole word" option of the new Tracker version and the regexp suggested by RalphM. Both work fine for my tasks. So thanks once again to Donald for his excellent TrackerV and to RalphM for his wise advice.