This seems to work pretty well, but some notes...
1) This will parse the clipboard line by line, so if your data does not contain line breaks it will slow down quite a bit, and you should probably insert some before calling this.
2) Related to 1: if your URLs span multiple lines, tough luck. Each line is matched on its own, so a URL broken across lines will not be picked up whole.
3) This method really does not scale well. I don't think the regex engine XY uses is particularly up to snuff for this kind of task, but for what is meant to be accomplished in XY it works. Remember, XY's scripting is not meant to replace fully functional programming languages. (Sorry if you disagree Don, I may be wrong, but that's what I'm observing with the regex engine.)
4) The regex pattern I'm using is from John Gruber. It is meant to catch all sorts of URL-looking strings and is well documented in his post (the daringfireball.net link in the script comments). He also has a slightly reduced version that only works for web URLs. Also note that I had to escape ' (by doubling it to '') in both of these.
I've also included, in the comments, one that is part of JGSoft's RegexBuddy library. It is not nearly as inclusive as either of Gruber's, but it may be better for you.
You should probably pick the simplest one that works for you; from simplest to most complex, that would be RegexBuddy, Gruber web-only, Gruber all.
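To get a feel for how much "less inclusive" the RegexBuddy pattern is, here is a rough comparison sketched in Python rather than XY script (so the doubled '' goes back to a single '). The sample text and the `find_urls` helper are made up for illustration, and Python's regex engine is not the one XY uses, so treat the results as indicative only:

```python
import re

# RegexBuddy "URL: Find in Full Text" (simplest of the three).
REGEXBUDDY = r"\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|])"

# Gruber's web-only pattern, with the XY-escaped '' restored to a single '.
GRUBER_WEB = r"""\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))"""


def find_urls(pattern, text):
    """Hypothetical helper: return group 1 (the whole URL) of every match."""
    return [m.group(1) for m in re.finditer(pattern, text, re.IGNORECASE)]


# Made-up sample: one scheme-less URL and one URL with balanced parentheses.
sample = "See www.example.com/page and http://example.org/a_(b)."

print(find_urls(REGEXBUDDY, sample))
# ['http://example.org/a_']
print(find_urls(GRUBER_WEB, sample))
# ['www.example.com/page', 'http://example.org/a_(b)']
```

RegexBuddy needs an explicit scheme and stops at the parenthesis, while Gruber's pattern picks up the bare www address and keeps the balanced (b). If your data never contains either case, the simpler pattern is enough.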
Code:
"Extract URLs from Clipboard"
  $hay = "<clipboard>";
  // http://daringfireball.net/2010/07/improved_regex_for_matching_urls - All URLs
  $pattern = '\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:''".,<>?«»“”‘’]))';
  // http://daringfireball.net/2010/07/improved_regex_for_matching_urls - Only web URLs
  //$pattern = '\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:''".,<>?«»“”‘’]))';
  // http://www.regexbuddy.com/ - JGSoft - RegexBuddy - URL: Find in Full Text
  //$pattern = '\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|])';
  $results = '';
  $i = 0;
  $lines = GetToken("$hay", 'count', "<crlf>");
  while ($i < $lines) {
    $i++;
    $line = GetToken("$hay", "$i", "<crlf>");
    $match = RegexReplace("$line", ".*?$pattern", "$1<crlf>"); //Put a CRLF after each URL (and strip leading chars)
    if (Compare("$line", "$match")) { //If line was changed...
      $j = 0;
      $matches = GetToken("$match", 'count', "<crlf>");
      while ($j < $matches) {
        $j++;
        $group = GetToken("$match", "$j", "<crlf>");
        if (Compare("$group", "<crlf>") && Compare("$group", "")) { //If not CRLF/empty...
          $groupMatch = RegexReplace("$group", "^$pattern$|^.*$", "$1"); //If line matches URL use it, else replace with nothing.
          if (Compare("$group", "$groupMatch") == 0) { //If line was not changed....
            $results = "$results$groupMatch<crlf>";
          }
        }
      }
    }
  }
  Text "$results";
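If note 3 bites you (lots of data, or one huge line as in note 1), the same extraction can be done in a single pass outside XY. This is only a rough sketch in Python, not XY script, using the RegexBuddy pattern from the comments. `extract_urls` and the sample text are names and data I made up for illustration:

```python
import re

# RegexBuddy "URL: Find in Full Text" pattern, copied from the script comments.
URL_RE = re.compile(
    r"\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[A-Z0-9+&@#/%=~_|])",
    re.IGNORECASE,
)


def extract_urls(text):
    """Return every URL-looking match in text, in order of appearance."""
    # finditer scans the whole string once, so no line splitting is needed.
    return [m.group(1) for m in URL_RE.finditer(text)]


# Made-up input without any line breaks.
sample = "no breaks here: http://a.example/x, then ftp://b.example/y; done"

# One URL per line, like the Text dialog in the XY script.
print("\n".join(extract_urls(sample)))
# http://a.example/x
# ftp://b.example/y
```

Since the whole text is scanned in one go, missing line breaks do not matter here. A URL that is itself broken across lines will still be cut at the break, just as in note 2.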