How to read UNICODE file content with readfile()?

DmFedorov
Posts: 680
Joined: 04 Jan 2011 16:36
Location: Germany

Post by DmFedorov »

I have the script below, but it does not work with files such as DLL and EXE that contain Unicode strings.
In XY itself I can easily find such strings with the "Match unicode" checkbox on the Contents tab of the Info Panel.

The script is meant to find identifier strings in the original files.
These identifiers are then used to translate a program.
There are about 2000 strings and about 100 files.

How it works: the necessary files (htm, js, css, exe, dll) are listed in the file pane, and you enter a list of strings in the input window, one per line. Some of these strings occur in the files and some do not occur in any of them.

Code:

    $searchStrings = input("Input list of strings ... Script return:", "1. Not found strings   2. Found strings   3. Strings found in specific files", , "m", 600, 600);
    $allFiles = listpane(, "*", 1, "|");
    $foundStringsText = "";   // Per-file report: the strings found in a file, followed by that file's name.
    $allFoundStrings = "";    // Every hit, in the order found. Entries can repeat.
    $notFoundStrings = "";    // Strings not found in any file (no repeats).
    $foundStrings = "";       // Strings found in at least one file (no repeats).
    foreach($file, $allFiles, "|") {
        $fileContent = readfile($file);
        $countFound = 0;
        foreach($string, $searchStrings, "<crlf>") {
            $stringInquote = $string;
            if (strpos($fileContent, $stringInquote, , 1) != -1) {
                $foundStringsText = $foundStringsText . $string . "<crlf>";
                $allFoundStrings = $allFoundStrings . $string . "<crlf>";
                $countFound = 1;
            }
        }
        if ($countFound == 1) {
            $foundStringsText = $foundStringsText . "<tab>" . $file . "<crlf><crlf>";
        }
    }
    // Sort every search string into "found" or "not found".
    // The <crlf> delimiters force a whole-line match, so a string that is only
    // a substring of another found string is not reported as found.
    foreach($stringInText, $searchStrings, "<crlf>") {
        if (strpos("<crlf>" . $allFoundStrings, "<crlf>" . $stringInText . "<crlf>", , 1) == -1) {
            $notFoundStrings = $notFoundStrings . $stringInText . "<crlf>";
        } else {
            $foundStrings = $foundStrings . $stringInText . "<crlf>";
        }
    }
    text "Not found strings:<br>----------------<br>$notFoundStrings<br>Found strings:<br>----------------<br>$foundStrings<br>Strings found in specific files:<br>----------------<br>$foundStringsText", , , "search Result";

There is also a second problem:
If a string is found, that only means it exists somewhere in the files. ($stringInquote = $string;)
But to know that a string is an identifier, I should at least be sure that it is enclosed in quotation marks in the file (.htm, .js). ($stringInquote = """" . $string . """";)
And if the same string also exists in an exe or dll file, it is surrounded there by dollar signs ($). ($stringInquote = '$' . $string . '$';)

Question: Is it possible to write $stringInquote as a single expression in which the character that surrounds the string is matched on an "OR" basis (either " or $)?
Then I would not need to run two searches.
An even more convenient option would be to output, along with each search string, the one or two characters that surround it in the original file.
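
To make the "OR" idea concrete, here is a minimal sketch in Python rather than XY script, so it can be run on its own. It assumes a regular-expression search is acceptable in place of a plain substring search. The function name is_identifier is hypothetical and not part of XY or of the script above.

```python
import re


def is_identifier(content: str, s: str) -> bool:
    """Return True if s occurs in content wrapped in a pair of " or a pair of $."""
    # re.escape makes characters such as . or ( in s match literally.
    # (["$]) matches either delimiter; \1 requires the closing delimiter
    # to be the same character as the opening one.
    pattern = r'(["$])' + re.escape(s) + r'\1'
    return re.search(pattern, content) is not None


print(is_identifier('var t = "Save";', "Save"))  # quoted in htm/js style
print(is_identifier("x$Save$y", "Save"))         # wrapped in dollar signs
print(is_identifier("Save as", "Save"))          # bare occurrence only
```

The character class does the job of the two separate searches in one pass. The match is case-sensitive, like the strpos calls in the script.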

Re: How to read UNICODE file content with readfile()?

Post by DmFedorov »

Thanks to Enternal's link in the topic "Unicode, UTF-8, ASCII, and ReadFile SC", I found out how to make readfile() read the content of a Unicode file: pass code page 1200 (UTF-16 little-endian) as the fourth argument.
Here is the page itself:
http://msdn.microsoft.com/en-us/library ... 85%29.aspx
$fileContent = readfile($file, , , 1200);
This link is very useful.
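
Why the code page matters can be shown with a small sketch, again in Python so that it runs standalone. The byte string below is a made-up stand-in for a piece of an exe or dll, and the function name search_modes is hypothetical. The only fact relied on is that code page 1200 is UTF-16 little-endian, which stores each ASCII character followed by a zero byte.

```python
def search_modes(blob: bytes, needle: str) -> dict:
    """Try three ways of finding needle in raw file bytes."""
    return {
        # Byte-wise ANSI search: fails when the text is stored as UTF-16LE,
        # because a zero byte sits between the characters.
        "ansi": needle.encode("ascii") in blob,
        # Search for the UTF-16LE form of the needle in the raw bytes.
        "utf16_bytes": needle.encode("utf-16-le") in blob,
        # Decode the data as UTF-16LE first, then search the text.
        "utf16_decoded": needle in blob.decode("utf-16-le", errors="replace"),
    }


# Hypothetical sample: two leading bytes, the string in UTF-16LE, a terminator.
sample = b"\x00\x01" + "MenuTitle".encode("utf-16-le") + b"\x00\x00"
print(search_modes(sample, "MenuTitle"))
```

Note that decoding only lines up if the string starts at an even byte offset, which is an assumption of this sample.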
---------------
So only the second problem remains:

Is it possible to put the characters that surround the string into the variable as an "OR" expression?
That is, to combine these two assignments into one expression:
$stringInquote = '$' . $string . '$';
$stringInquote = """" . $string . """";

Or maybe I can simply output the search string together with the one, two, or three characters that surround it in the file?
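
The last idea, reporting the characters around each hit, could look roughly like this. It is a sketch in Python, not XY script, and the function name surrounding_chars and the width parameter are hypothetical. It assumes the file content is already available as text.

```python
import re


def surrounding_chars(content: str, s: str, width: int = 1) -> list:
    """Return a (before, after) pair of up to `width` characters for every hit of s."""
    if not s:
        # An empty search string would match at every position.
        return []
    results = []
    for m in re.finditer(re.escape(s), content):
        # Clamp at 0 so a hit at the very start yields an empty "before".
        before = content[max(0, m.start() - width):m.start()]
        after = content[m.end():m.end() + width]
        results.append((before, after))
    return results


print(surrounding_chars('t = "Save"; $Save$', "Save"))
```

With the pairs in hand, deciding whether a hit is an identifier becomes a simple check of the two surrounding characters, and a single search per string is enough.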
