List comparison using script.

Gabrielle
Posts: 57
Joined: 28 Apr 2012 00:57

List comparison using script.

Post by Gabrielle »

My problem is that I have to compare some lists to find missing terms, or to count the repeated terms present in each.
A friend told me she doesn't know of any such software, but that XYplorer scripting could do it.

list 1 = "alpha,beta,gama,delta,alpha,gama,gama";
list 2 = "gama,,gama,alpha,";

Resulting list:

item (number of occurrences in the opposite list) = item (number of occurrences in the opposite list)
alpha (1) = alpha (2)
beta = (not found)
gama (2) = gama (3)
delta = (not found)
alpha (1) = alpha (2) (x)   (x) = an extra entry on this side, only there to mark its equivalent in list 1
gama (2) = gama (3)
gama (2) = gama (3) (x)   (x) = an extra entry on this side, only there to mark its equivalent in list 1
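The numbers in parentheses are plain per-list occurrence counts. A minimal sketch of computing them, written in Python rather than XYplorer script and assuming comma separators with empty entries (as in "gama,,gama") ignored:

```python
from collections import Counter


def occurrences(text: str) -> Counter:
    """Count each non-empty, comma-separated item in text."""
    return Counter(item.strip() for item in text.split(",") if item.strip())


list1 = "alpha,beta,gama,delta,alpha,gama,gama"
list2 = "gama,,gama,alpha,"

count1, count2 = occurrences(list1), occurrences(list2)
print(count1)  # occurrences in list 1
print(count2)  # occurrences in list 2
print(sorted(set(count1) - set(count2)))  # items of list 1 missing from list 2: ['beta', 'delta']
```

Looking up an item that is absent from a `Counter` returns 0, which is what the "(not found)" case needs.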

But the lists can also come the other way around:

list 1 = "gama,,gama,alpha,";
list 2 = "alpha,beta,gama,delta,alpha,gama,gama";

so the resulting list would be the mirror image of the one above, i.e.,

alpha (1) = alpha (1)
(not found) = beta
and so on.

Could you be so kind as to help me with this?
Thanks.

highend
Posts: 14946
Joined: 06 Feb 2011 00:33
Location: Win Server 2022 @100%

Re: List comparison using script.

Post by highend »

I'm not at all sure I understood you correctly...
alpha (1) = alpha (1)
(not found) = beta
Does the first "alpha (1)" mean that the word alpha was found at the first position among the tokens?
Does the second part "= alpha (1)" mean that alpha was found only once in list 2?

However...

One way:

Code:

	$list1 = "alpha,beta,gama,delta,alpha,gama,gama";
	$list2 = "gama,,gama,alpha,";

	// Walk the list with more tokens (list 1 on a tie) and search the other one.
	$count1 = gettoken($list1, "count", ",");
	$count2 = gettoken($list2, "count", ",");
	if ($count1 >= $count2) {
		$long = $list1; $short = $list2; $count = $count1;
	} else {
		$long = $list2; $short = $list1; $count = $count2;
	}

	// For each item of the longer list, count its matches in the shorter list.
	$output = "";
	while ($i++ <= $count) {
		$item = gettoken($long, $i, ",");
		$found = gettoken(regexmatches($short, $item, "|"), "count", "|");
		if ($found >= 1) {
			$output = $output . "$item (1) = $item ($found)<crlf>";
		} else {
			$output = $output . "$item (1) = not found<crlf>";
		}
	}
	text formatlist($output, "cde", "<crlf>");
Does that produce the correct output?
If not, describe exactly what the output should look like for all entries!
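For anyone who wants to trace the script above outside XYplorer, here is a rough Python rendering of its idea: pick the list with more tokens, then count how often each of its items occurs in the other one. It is a sketch, not a line-by-line port: it compares whole items instead of running regexmatches() (which uses the item as a regular-expression pattern, so an item contained in a longer one might be counted too), and it leaves out the final formatlist() call.

```python
def compare_longer_with_shorter(list1: str, list2: str) -> list[str]:
    tokens1, tokens2 = list1.split(","), list2.split(",")
    # As in the script: walk the list with more tokens (list 1 on a tie).
    if len(tokens1) >= len(tokens2):
        longer, shorter = tokens1, tokens2
    else:
        longer, shorter = tokens2, tokens1
    rows = []
    for item in longer:
        if not item:  # skip empty entries such as the one in "gama,,gama"
            continue
        found = shorter.count(item)  # whole-item matches only
        rows.append(f"{item} (1) = {item} ({found})" if found else f"{item} (1) = not found")
    return rows


print("\n".join(compare_longer_with_shorter("alpha,beta,gama,delta,alpha,gama,gama", "gama,,gama,alpha,")))
```

Because the longer list is chosen first, swapping the two arguments gives the same rows.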
One of my scripts helped you out? Please donate via Paypal

Gabrielle

Re: List comparison using script.

Post by Gabrielle »

Hello, Mr. highend!

list 1 = "alpha,beta,gama,delta,alpha,gama,gama";
list 2 = "gama,,gama,alpha,";

The final list is a cross-reference table:

alpha (1) = alpha (2) > it means that ALPHA from list 1 was found (1) time in list 2, and it occurs (2) times in list 1
beta = (not found) > it means that BETA from list 1 was NOT found in list 2
gama (2) = gama (3) > it means that GAMA from list 1 was found (2) times in list 2, and it occurs (3) times in list 1
delta = (not found) > it means that DELTA from list 1 was NOT found in list 2
alpha (1) = alpha (2) (x) > it means that ALPHA from list 1 was found (1) time in list 2, and it occurs (2) times in list 1; the (x) marks this ALPHA as an extra, there only to mark its equivalent in list 1, because ALPHA occurs just once in list 2 (it is an extra 2nd ALPHA, just for visualization)
gama (2) = gama (3) > it means that GAMA from list 1 was found (2) times in list 2, and it occurs (3) times in list 1
gama (2) = gama (3) (x) > it means that GAMA from list 1 was found (2) times in list 2, and it occurs (3) times in list 1; the (x) marks this GAMA as an extra, there only to mark its equivalent in list 1, because GAMA occurs just 2 times in list 2 (it is an extra 3rd GAMA, just for visualization)

(I made some corrections.)
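The rule described in these annotations can be written down as a short Python sketch (an illustration only, not XYplorer script). Assumptions: list 1 drives the rows, the left number is the count in list 2, the right number is the count in list 1, and an occurrence gets (x) once its running number in list 1 exceeds its count in list 2.

```python
from collections import Counter


def annotated_rows(list1: str, list2: str) -> list[str]:
    items1 = [t.strip() for t in list1.split(",") if t.strip()]
    items2 = [t.strip() for t in list2.split(",") if t.strip()]
    count1, count2 = Counter(items1), Counter(items2)
    seen = Counter()  # running occurrence number of each item in list 1
    rows = []
    for item in items1:
        seen[item] += 1
        if count2[item] == 0:
            rows.append(f"{item} = (not found)")
            continue
        row = f"{item} ({count2[item]}) = {item} ({count1[item]})"
        if seen[item] > count2[item]:
            row += " (x)"  # list 2 has no real counterpart for this occurrence
        rows.append(row)
    return rows


for row in annotated_rows("alpha,beta,gama,delta,alpha,gama,gama", "gama,,gama,alpha,"):
    print(row)
```

With the two example lists this prints the seven annotated lines above, with (x) only on the 2nd ALPHA and the 3rd GAMA.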

highend

Re: List comparison using script.

Post by highend »

Code:

	$list1 = "alpha,beta,gama,delta,alpha,gama,gama";
	$list2 = "gama,,gama,alpha,";

	$count1 = gettoken($list1, "count", ",");
	$count2 = gettoken($list2, "count", ",");
	if ($count1 >= $count2) {
		$long = $list1; $short = $list2; $count = $count1;
	} else {
		$long = $list2; $short = $list1; $count = $count2;
	}

	$output = "";
	$searchedTokens = "";
	while ($i++ <= $count) {
		// step;
		$item = gettoken($long, $i, ",");
		$foundSelfCount = gettoken(regexmatches($long, $item, "|"), "count", "|");
		$foundOtherCount = gettoken(regexmatches($short, $item, "|"), "count", "|");
		if ($foundOtherCount >= 1) {
			if (strpos($searchedTokens, $item) != -1) {
				$output = $output . "$item ($foundOtherCount) = $item ($foundSelfCount) (x)<crlf>";
			} else {
				$output = $output . "$item ($foundOtherCount) = $item ($foundSelfCount)<crlf>";
				$searchedTokens = formatlist($searchedTokens . $item . "|", "d", "|");
			}
		} else {
			$output = ($foundSelfCount > 1) ? ($output . "$item ($foundSelfCount) = not found<crlf>") : ($output . "$item = not found<crlf>");
		}
	}
	text $output;
It's not a true cross-reference (it only compares the longer list against the shorter one).
That means: if the shorter list contains entries that are not in the longer one, you won't get any output for them.

Output:
alpha (1) = alpha (2)
beta = not found
gama (2) = gama (3)
delta = not found
alpha (1) = alpha (2) (x)
gama (2) = gama (3) (x)
gama (2) = gama (3) (x)
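The marking rule behind this output can be rendered in Python to make it easy to compare with the annotated table (an illustration only, matching whole items rather than regex matches, and, like the script, walking only the longer list). Every repeat of an item gets (x), no matter how many times it occurs in the other list.

```python
from collections import Counter


def script_rows(list1: str, list2: str) -> list[str]:
    tokens1, tokens2 = list1.split(","), list2.split(",")
    longer, shorter = (tokens1, tokens2) if len(tokens1) >= len(tokens2) else (tokens2, tokens1)
    self_count = Counter(t for t in longer if t)
    other_count = Counter(t for t in shorter if t)
    searched = set()  # items already output once (the script's $searchedTokens)
    rows = []
    for item in longer:
        if not item:
            continue
        if other_count[item] >= 1:
            row = f"{item} ({other_count[item]}) = {item} ({self_count[item]})"
            if item in searched:
                row += " (x)"  # any repeat is marked, whatever the other list contains
            else:
                searched.add(item)
            rows.append(row)
        elif self_count[item] > 1:
            rows.append(f"{item} ({self_count[item]}) = not found")
        else:
            rows.append(f"{item} = not found")
    return rows


print("\n".join(script_rows("alpha,beta,gama,delta,alpha,gama,gama", "gama,,gama,alpha,")))
```

Replacing the `item in searched` test with a running count compared against `other_count[item]` would mark only the occurrences the other list cannot match.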
Look at line 6:
gama (2) = gama (3) (x)

You've said for that line:
gama (2) = gama (3) > it tells that GAMA from list 1 was found (2) times on list 2, and it occurred (3) times on list 1
But IMHO it must be marked with (x), because it's the second time it's found.

Compare it with your annotation for line 7.
the (X) = this GAMA is extra on this list just to mark its equivalent on list 1 as it occurred just 2 times on list 2 (being now an extra 3rd just for visualization)
For line 6 it should be:
the (X) = this GAMA is extra on this list just to mark its equivalent on list 1 as it occurred just 2 times on list 2 (being now an extra 2nd just for visualization)

Correct?

Gabrielle

Re: List comparison using script.

Post by Gabrielle »

List 2 has 2 GAMAs and list 1 has 3, so the 3rd GAMA (on the list TWO side) must be marked, because it doesn't exist there
originally, got it? It's there only so that it can be seen as a placeholder counterpart for something
that was listed earlier as existing. That's different from "item (simply) not found", see?
Perhaps this can make it clear:

A,B,C,D,D,E,F,A,A > x for the missing D, as D is here just two times; the 3rd D is marked x as it's not originally here
A,D,F,A,D,D > x for the missing A, as A is here just two times; the 3rd A is marked x as it's not originally here

A = A
B = (not found)
C = (not found)
D = D
D = D
E = (not found)
F = F
A = A
A = A (x)
D (x) = D
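A minimal Python sketch of the pairing in this example (an illustration only, not XYplorer script). Assumptions: list 1 drives the row order; an occurrence that list 2 cannot match gets (x) on the list-2 side; occurrences in list 2 beyond what list 1 contains are appended at the end with (x) on the list-1 side; items are compared as whole, case-sensitive strings.

```python
from collections import Counter


def pair_rows(list1: str, list2: str) -> list[str]:
    items1 = [t.strip() for t in list1.split(",") if t.strip()]
    items2 = [t.strip() for t in list2.split(",") if t.strip()]
    count1, count2 = Counter(items1), Counter(items2)
    rows = []
    seen1 = Counter()  # running occurrence number in list 1
    for item in items1:
        seen1[item] += 1
        if count2[item] == 0:
            rows.append(f"{item} = (not found)")
        elif seen1[item] <= count2[item]:
            rows.append(f"{item} = {item}")
        else:
            rows.append(f"{item} = {item} (x)")  # placeholder on the list-2 side
    seen2 = Counter()  # running occurrence number in list 2
    for item in items2:
        seen2[item] += 1
        if count1[item] == 0:
            rows.append(f"(not found) = {item}")
        elif seen2[item] > count1[item]:
            rows.append(f"{item} (x) = {item}")  # placeholder on the list-1 side
    return rows


print("\n".join(pair_rows("A,B,C,D,D,E,F,A,A", "A,D,F,A,D,D")))
```

For these two lists the sketch yields the ten rows above, including the trailing "D (x) = D".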

But please note that sometimes the lists can be reversed, i.e., the first list is shorter
than the second:
A,D,F,A,D,D
A,B,C,D,D,E,F,A,A

so the resulting table is reversed too:

A = A
(not found) = B
(not found) = C
D = D
D = D
(not found) = E
F = F
A = A
A (x) = A
D = D (x)

Sometimes items can simply be missing, but this shouldn't affect the final results:

,A,D,F,A,,,D,D
,,,A,B,,,,C,D,,D,E,F,,,A,A,
(she asked me to mention that the separators in both lists are often "<crlf>", ", " or "; " rather than ",", whichever it is)

The order in which the items appear may not be regular either. What's important here is that each item
from the longer list finds its counterpart in the shorter list in the order they appear, and that any
missing items are marked as such: either simply "not found" (i.e., they have no equal counterpart
in the opposite list) or added for the sake of compensation (i.e., the opposite list has more
equal counterparts than the list being compared).
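The pairing from the A,B,C example, generalized to the reversed order and the looser input described above, can be sketched in Python (an illustration only, not XYplorer script). Assumptions: items may be separated by commas, semicolons or line breaks; surrounding spaces and empty entries are ignored; the longer list (list 1 on a tie) decides the row order; list 1 always stays in the left column.

```python
import re
from collections import Counter


def split_items(text: str) -> list[str]:
    # Split on ",", ";" or line breaks; drop surrounding spaces and empty entries.
    return [t.strip() for t in re.split(r"[,;\r\n]+", text) if t.strip()]


def cross_reference(list1: str, list2: str) -> list[str]:
    items1, items2 = split_items(list1), split_items(list2)
    list1_leads = len(items1) >= len(items2)
    longer, shorter = (items1, items2) if list1_leads else (items2, items1)
    c_long, c_short = Counter(longer), Counter(shorter)

    def row(long_cell: str, short_cell: str) -> str:
        # Keep list 1 on the left and list 2 on the right, whichever list leads.
        return f"{long_cell} = {short_cell}" if list1_leads else f"{short_cell} = {long_cell}"

    rows = []
    seen = Counter()
    for item in longer:  # rows follow the order of the longer list
        seen[item] += 1
        if c_short[item] == 0:
            rows.append(row(item, "(not found)"))
        elif seen[item] <= c_short[item]:
            rows.append(row(item, item))
        else:
            rows.append(row(item, f"{item} (x)"))  # placeholder on the shorter side
    seen = Counter()
    for item in shorter:  # leftovers of the shorter list go at the end
        seen[item] += 1
        if c_long[item] == 0:
            rows.append(row("(not found)", item))
        elif seen[item] > c_long[item]:
            rows.append(row(f"{item} (x)", item))  # placeholder on the longer side
    return rows


print("\n".join(cross_reference(",A,D,F,A,,,D,D", ",,,A,B,,,,C,D,,D,E,F,,,A,A,")))
```

With the messy example lists this prints the same ten rows as the reversed table, ending in "A (x) = A" and "D = D (x)".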

Gabrielle

Re: List comparison using script.

Post by Gabrielle »

highend, what happened? Can't it be done?
Can anyone else help, please?
