Zero knowledge equals random NN weights?

Known unknowns and unknown unknowns!

Hrvoje
Posts: 31
Joined: Sat Jan 19, 2019 4:37 am

Post by Hrvoje »

I was watching this video on YouTube, https://youtu.be/ig380wp10aQ?t=111 , in which Garry Kasparov says that machines have revealed so many secrets that the magic and mysteries of chess are gone: you can see the game through the lens of the computer, and even an amateur can immediately understand what is happening on the board thanks to the machine’s advice. There is another video, which I cannot find anymore, in which he is more specific and says that engines can explain what is going on. And he is right, of course. In the context of chess, every explanation is expressible first and foremost in the language of moves, which engines do speak. Besides that, however, the human mind tends to reason abstractly about the game and to create concepts expressible in natural language. Mastering those concepts is what people primarily mean when they speak about “understanding chess”. Otherwise, anyone can understand that he or she is losing consistently; that is obvious directly from the moves. Why exactly it happens requires another kind of explanation, in terms of these abstract concepts, which must at the same time be illustrable concretely in the language of moves in order to be valid and teachable. This abstract reasoning comes naturally to people, so that even young children create some of these concepts for themselves, without ever being taught, within mere hours of playing. Two such concepts are material and the value of pieces. Children immediately understand that having more is, generally speaking, better than having less. It is almost an instinct, which they feel as frustration when they lose material, giving something up without gaining anything in return. The concept of sacrifice, by contrast, is advanced and has to be learned, i.e. acquired after some more experience with the game, because it involves further concepts: those that are exchanged for material.
Anyway, as chess engines do not speak natural language, and are mainly agnostic about human abstract concepts, at least those modern, self taught ones (in the sense that these concepts are not built into them), there is a gap that we need to overcome if we want to translate their knowledge to something that is comprehensible to us, besides the moves, that we clearly see are superior to those that human player is able to produce. And there comes into play a software company like Decodea, their mission is to produce such translators, for various domains of human knowledge, the one for chess they have named DecodeChess. I investigated it a bit, by watching this video: https://youtu.be/-JpQEByxpzY
Obviously, when I speak about chess engines, I speak in terms of the standard chess software architecture, which is not monolithic. It identifies three main parts: an engine, which is responsible for the analysis of positions (Stockfish, LCZero...); a graphical user interface, which is a front-end application that accepts user input and presents output to the user (Arena, Scid vs. PC...); and a protocol (UCI, CECP) by which these two components communicate. An engine can be plugged into a user interface if the interface supports the protocol for which the engine is written. The translator/decoder is a separate component that sits in between. It interprets the input moves before presenting its results via the user interface, consulting in the process its own repository of human knowledge, those abstract concepts and ideas about what constitutes efficient play. It matches them with data received from the engine and from the human players, recognizes tactical and strategic patterns, and explains in natural language why the moves suggested by the engine are good and why those played by the human are not that good, when such is the case. So when it detects that a pin has been created or threatened, it reports that as good for the side that created or threatens it, and when it sees that an open file has been taken under control, it reports that as good for the side that took it, etc. That is a correct approach, and not quite a trivial task. Still, the objection from the guy who talked about Nimzowitsch rules and Steinitz rules is spot on, regardless of the fact that he did not use the best term to describe what he meant (he said rules, as people often do, but he meant abstract concepts and ideas on how to play efficiently), and regardless of whether the objection still stands or not.
Namely, suppose the machine learning process in which the translator is trained to recognize patterns is strictly supervised: it cannot distil its own patterns from the data it receives from the engine and update the previously mentioned repository with new, inhuman knowledge, but only uses the existing repository as a supervisory reference. Then the objection still stands, because the decoder has not yet been upgraded to the level of unsupervised learning. I know it is easier said than done. But DeepMind managed to produce MuZero, a program that not only finds out by itself how to play efficiently by the given rules, as AlphaZero does too, but also finds out by itself what the rules of the game are in the first place, given the chance to play. So I don’t see why Decodea would not be able to produce an enhanced decoder that could extract new abstract chess knowledge by analyzing an engine’s play and teach even human grandmasters some new abstract concepts and ideas; that seems like a comparable effort to me. I don’t know if I got it right, but from my layman’s point of view, the principal difference between AlphaZero and MuZero is that the former has a built-in legal_move_generator function and a recognize_terminal_game_state function (mate, stalemate, draw by insufficient material, draw by repetition...). That means it knows the rules completely in advance, prior to NN training, which serves only to enhance the evaluate_position function, while the latter uses NN training to build the first two functions from scratch, as well as to enhance the third. Actually, this is not quite the right distinction, as the starting point for all three functions is zero knowledge, i.e. random NN weights. The important difference between these functions is that the first two can be learned perfectly, while for the third the law of diminishing returns applies with respect to the number of NN training games, and possibly with respect to a growing NN topology.
Does it mean that the game rules should be somehow extractable from MuZero NNs into a human understandable code? Can the same thing be done with the knowledge of evaluating positions, and “decoded” into natural language?
Of course, there is a possibility that there is no such new abstract concept unknown to human, and the only reason why computers can play better is because they can apply more consistently the concrete ideas which present concretization of abstract ideas already known to human. And of course, computers are able to show new concrete ideas even to the best informed, most knowledgeable human players and they do that all the time, thanks to their superior capability to explore the game tree, which is vast, but in order to do just that, engines are sufficient, no need for decoders.
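To make the phrase "zero knowledge, i.e. random NN weights" a bit more tangible, here is a toy sketch. It is not how AlphaZero or MuZero are built (those use deep networks); it only assumes a hypothetical linear evaluator over material, where "knowing nothing" means every piece weight is drawn at random, and where the conventional piece values stand in for what training might eventually approach. All names are my own, for illustration only.

```python
import random

# Conventional piece values, used here only as a reference point.
CONVENTIONAL = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}


def random_weights(seed):
    """'Zero knowledge': every piece weight is drawn at random."""
    rng = random.Random(seed)
    return {piece: rng.uniform(-1.0, 1.0) for piece in CONVENTIONAL}


def evaluate(weights, material_diff):
    """Linear evaluation: sum of weight * (own count - opponent count)."""
    return sum(weights[p] * material_diff.get(p, 0) for p in weights)


if __name__ == "__main__":
    up_a_queen = {"Q": 1}
    # With conventional values, being a queen up is clearly good.
    print(evaluate(CONVENTIONAL, up_a_queen))
    # With random weights the verdict is arbitrary; it may even be negative.
    print(evaluate(random_weights(seed=0), up_a_queen))
```

The untrained evaluator can rate an extra queen as bad, which is exactly the instinct that children acquire within hours and that a self-taught engine has to acquire from its training games.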

Unfortunately the current situation with Decodea decoder is still slightly worse, and even the initial intended more modest result of explaining the moves in terms of already known concepts is not yet fully achieved, let alone something more. I can compare the state of the art with the translation from English to Croatian by using Google Translate: the original English text is much more understandable to me than the produced Croatian text, and I am a native Croatian speaker. That is not helpful at all, except maybe to some native Croatian speakers who do not speak a word of English. They might gain at least a certain clue what the text is about, but to me it is actually confusing and annoying.
Let me illustrate my comparison with the second example position analyzed in the same video. There is a summary that explains why Nb4 is a good move. It is good because it:

threatens to play Nc2+
enables Bxf3+
allows playing Bxf3+ and prevents playing Qxf6
lures the white pawn to f3 and steps into a dangerous place

None of this makes any sense if one fails to see that 17...Nb4 is actually an indirect checkmate threat: it does not allow 18.Qxf6 because of the forcing line 18...Nc2+ 19.Ke2 Ba6+ 20.c4 Bxc4#, and there is no better alternative to 18.cxb4; for example, 18.Be2 Nc2+ 19.Kf1 Qxg5 20.Nxg5 Nxa1 is worse. As the commentary presented in the video does not explain that, I was not lazy: I opened an account at DecodeChess and opened the same example myself, in order to see whether I could get that analysis by expanding the hidden text (pressing the yellow plus sign button on the right). For some reason I cannot see all the content of that panel properly, but it seems that these lines are there, strangely scattered, and not presented as concisely as I presented them. And the text that is visible without expanding the hidden panes does not properly explain, in a human way, the tactical idea of that complex combination. For the reasons explained above, black can make a clearing sacrifice of the knight on b4 (it clears the path for his bishop), then a decoying sacrifice of the rook on d1 (it lures the opponent’s king to that dangerous square), after which the exchange of the bishop for the knight on f3 comes with check and at the same time removes the only defender of the white queen, so that black can pick it up on the next move, with a net material gain of queen for rook and knight, which in this position should be a comfortable advantage for black. The combination is actually even longer; I did not mention the exchange of one pair of rooks on d1 in the middle of it.
Yes, it is all there, recognized and somehow mentioned, but not in a sufficiently succinct way, and the sentence “lures the white pawn to f3 and steps into a dangerous place” sounds more silly than explanatory. Credit is due for what has been done; I hope it will get better, and I also hope constructive criticism will be accepted. But the main problem remains: if it can only explain things that I already understand, and things I could grasp by myself from direct communication with an engine, then that translator or decoder does not fully meet its purpose.
The development of chess engines is many years ahead of development of chess knowledge decoders due to several reasons, primary one being the fact that there is a large and vibrant community of chess engine developers, that organize chess engine competitions, with occasional inclusion of biggest players such as IBM and Google, while Decodea is not accompanied or challenged by a large community of active developers researching the same area, which is a pity, because what they do is as much important and exciting. An initiative by Herik, Herschberg, Marsland, Newborn and Schaeffer to establish a competition whose objective was to produce the best chess annotation software possible, died after a couple of years, regretfully. That was The Best Annotation Award, an annual contest described here: https://pure.uvt.nl/ws/portalfiles/port ... ST____.PDF , if it was alive DecodeChess would be one of the main competitors there.
What constitutes proper chess commentary has been theorized in depth by David Levy, Ivan Bratko and Matej Guid, to name just a few among many others. The caveman approach to implementing that functionality in a computer program would be to read the input, a not yet annotated game file, and iterate through its moves by submitting them sequentially to the engine. When the difference between the quality of the move played and the quality of the best move available in the position reaches a certain threshold, expressed in the centipawn score units returned by the engine, the program detects a serious mistake that requires comment and prints the principal variation, which is also returned by the engine, as an annotation for that move in the output annotated game file. The typical insufficiency of such an unsophisticated approach is that it misses refutations of alternative moves that were not played but might appear appealing to a superficial human analyzer, as well as explanations of why it was important to play the moves that were played, if the reasons are not so obvious to a superficial human analyzer. So, for example, in the analyzed position, after 17...Nb4 was played, such a simple annotator would fail to comment on 18.Qxf6, simply because the move that was actually played, 18.cxb4, was the best available at that moment. The only way in this case to get the machine’s advice on why 18.Qxf6 is bad is to ask the engine directly, which defeats the purpose of the annotator, because it fails to explain automatically all that is relevant, even tactical ideas, let alone strategic ones. In other words, such a program completely fails because it lacks the notion of obviousness, importance, and relevance. Even DecodeChess, which is a much more sophisticated program, misses some of that when it reports that 17...Nb4 “lures the white pawn to f3 and steps into a dangerous place”. In both cases the problem is that the program is too rigid in deciding what to comment on and how.
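The caveman annotator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the engine is replaced by a hypothetical callback `analyse`, the `EngineReport` type and the 100 centipawn threshold are my own choices, and the scores in the example are invented, not taken from any real engine. Comments are written in braces, as in PGN.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EngineReport:
    played_score_cp: int            # score after the move actually played
    best_score_cp: int              # score after the engine's preferred move
    principal_variation: List[str]  # engine's best line, as move strings


def annotate(moves: List[str],
             analyse: Callable[[int, str], EngineReport],
             threshold_cp: int = 100) -> List[str]:
    """Comment only on moves that lose at least threshold_cp centipawns."""
    annotated = []
    for ply, move in enumerate(moves):
        report = analyse(ply, move)
        loss = report.best_score_cp - report.played_score_cp
        if loss >= threshold_cp:
            comment = "better: " + " ".join(report.principal_variation)
            annotated.append(f"{move} {{{comment}}}")
        else:
            annotated.append(move)  # good enough, so nothing is explained
    return annotated


def fake_analyse(ply: int, move: str) -> EngineReport:
    """Stand-in for a real engine; all scores are invented for illustration."""
    table = {
        "Nb4": EngineReport(150, 150, ["Nb4"]),
        "cxb4": EngineReport(-300, -300, ["cxb4"]),
        "Qxf6": EngineReport(-10000, -300, ["cxb4"]),
    }
    return table[move]


if __name__ == "__main__":
    print(annotate(["Nb4", "cxb4"], fake_analyse))
    print(annotate(["Nb4", "Qxf6"], fake_analyse))
```

With the game continuation 17...Nb4 18.cxb4 the output contains no comment at all, because both moves were the best available. The tempting 18.Qxf6 is commented on only if it was actually played, which is exactly the gap described above.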
I know that explaining chess software architecture is not that important or relevant when we talk about DecodeChess, since it is an integrated product or unit, about which one doesn’t have to worry what to plug in it or what to plug it in. I explained it because I was annoyed by the fact that when I asked people online what chess annotating software would they recommend, some of them started to mention engines. Obviously it doesn’t matter if chess annotator is a standalone program with no other purpose, or if it is integrated into general purpose chess GUI, what matters is that using stronger chess engine will not solve the problem I just described, because the functionality in question is implemented in the annotator, not in the engine, so there is no point to mention them. To understand such things, one should always keep in mind the notion of separation of concerns between software components, and have a clear picture of their responsibilities.
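To illustrate the separation of concerns between GUI, protocol and engine, here is a minimal sketch of the GUI side of a UCI conversation. It only builds a `position` command and parses an engine `info` line into depth, centipawn score and principal variation; it does not start an engine process. The function names are my own, and the sample `info` line is made up in the usual UCI shape.

```python
def position_command(moves):
    """Build the UCI 'position' command for the start position plus moves."""
    if not moves:
        return "position startpos"
    return "position startpos moves " + " ".join(moves)


def parse_info(line):
    """Extract depth, centipawn score and PV from a UCI 'info' line.

    Returns None for lines that are not 'info' lines. A mate score
    ('score mate N') is left as score_cp=None in this sketch.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        return None
    result = {"depth": None, "score_cp": None, "pv": []}
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "depth" and i + 1 < len(tokens):
            result["depth"] = int(tokens[i + 1])
            i += 2
        elif tok == "score" and i + 2 < len(tokens) and tokens[i + 1] == "cp":
            result["score_cp"] = int(tokens[i + 2])
            i += 3
        elif tok == "pv":
            result["pv"] = tokens[i + 1:]  # the PV runs to the end of the line
            break
        else:
            i += 1
    return result


if __name__ == "__main__":
    print(position_command(["e2e4", "e7e5"]))
    sample = "info depth 12 seldepth 18 multipv 1 score cp 34 nodes 1000 pv e2e4 e7e5 g1f3"
    print(parse_info(sample))
```

Everything an annotator or decoder can learn from the engine arrives through lines like these: a score and a sequence of moves. Any explanation in terms of pins, open files or sacrifices has to be produced by the component that reads them, not by the engine.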
Anyway, the attempt to extract the abstract knowledge was not connected only with engines as sources or oracles of that knowledge, but with endgame tablebases too, for example https://ailab.si/matej/doc/Deriving_Con ... ebases.pdf . The subject has a lot of history, but its future is actually more interesting. And a few mentioned concepts are just a tip of the iceberg of what actually exists in that game, and then some, as one can easily imagine, considering the fact that one can practice that immensely rich game whole his life, and still not be particularly good at it.
But chess is not only a game of logic and tactical and strategical planning, other factors are important, such as memory, visualization, focus or concentration. Although each chess player regardless of his or her strength has to have certain visualization capabilities in order to be able to analyze a few moves ahead, without actually moving pieces on the board (because rules do not allow that), that is immensely easier when you can look at the board. At least to an average person, not so much to a top grandmaster, but, can they explain how they acquired such an amazing skill, like being able to play blindfold? Saying that this presents a whole another level of visualization capability that although not required by standard rules, greatly helps in standard circumstances when looking at the table is allowed, doesn’t explain much how is this actually achieved. The only explanation offered by Alexander Grischuk https://youtu.be/B3SXVN6KSNc?t=1340 was that it came natural to him during his childhood, as it should to any future grandmaster, ie not a result of some conscious effort and systematic practice, while I tried to follow a couple of recipes offered by others, to no avail. So either these explanations were not good enough, or I did not follow them properly and on time, result is the same: I cannot memorize the board, just as Grischuk cannot speak Chinese although he tried to learn it. Which I know because he said that a few minutes before the moment I chose as a starting point for playing this video, when I pasted its link here. Before writing this essay, I did not know how to pass that information (Start at...) along with the link to a youtube video that would otherwise start from the beginning, at timestamp zero, and now not only I know that, but I am also fairly sure I can explain that to pretty much everyone interested, in several ways, depending on their prior knowledge. 
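As an example of how short and precise such an explanation can be, the "start at" trick can even be written down as a program. This is a small sketch, assuming only what the links in this essay show, namely that the start time is passed as a `t` query parameter in seconds; the function name is my own.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_start_time(url, seconds):
    """Return the video URL with its 't' (start time, seconds) parameter set."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old 't', then append the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "t"]
    query.append(("t", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_start_time("https://youtu.be/B3SXVN6KSNc", 1340))
```

Applied to the Grischuk video, this reproduces the link pasted above.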
Explaining properly how to learn, during adulthood, a new language that is very different from your native one is a much harder task than explaining properly how to add a timestamp parameter to a YouTube video link. The difference is connected to the amount of information the explanation contains, and to the reliability of passing that information. And if we accept a “task” as a fundamental notion needed to explain nature, then an “explanation” would be “information needed to accomplish a task”. Moreover, an explanation is to a human what a program is to a computer: an instruction it can follow and execute. Of course, one can argue that this is just one aspect of explanation, not its full characterization, because one can follow instructions without fully understanding them. Nevertheless, following that logic, every living organism that can pass on useful information can produce an explanation; it is only a matter of surpassing the communication barrier between the one who tries to explain and the one who tries to understand. Right? David Deutsch seems to disagree, in this TED interview https://www.ted.com/talks/the_ted_inter ... n#t-889066 , when asked by Chris Anderson:
“A lot of people would say, look, every species knows something. A dog knows that a bone tastes delicious, but it doesn't know scientific theory. We know a certain amount of scientific theory, but it's ridiculous to imagine that we could know you know, that there must be a whole world of things out there that we are never even in principle capable of understanding. And you dispute that. Why? Why?”
David Deutsch replied:
“I've already explained why the dog is inherently different from us. It's because the dog knows that the bone tastes good because some of its ancestors who didn't know that, died. And the dog doesn't actually know anything, its genes know that. And there are certain types of things that can become known that way. But the vast majority of things in the world, in the universe, cannot become known that way, because the dog cannot try to eat the Sun and be burned and that kind of thing.”
So, a lot of vague instructions present a much higher barrier to understanding than a few precise ones. Actually, a lot of precise instructions given in a precise order that can be reliably memorized are much easier to acquire and apply than just one vague instruction, but in the latter case one can at least focus much more easily on removing the vagueness. This is the reason why it is possible to train a dog to sit on your command, or to search for drugs or for a missing person, but impossible to make conversation with it like Doctor Dolittle. As some of that potentially saves human lives, it is actually odd to read such comments about dogs from a person who is an obvious Nobel Prize candidate.
This is also a reason why it is much easier to remove just one bug from the program, than several combined ones, if we consider debugging as a way of communication between human and computer, during which human tries to remove vagueness of instructions given to a computer. If bugs are isolated in their effect, then the effort to eliminate them is proportional to their number, if we assume they are equally hard to get eliminated.
One such vague instruction is that, in order to learn a foreign language to the level of speaking fluently and sounding like a native speaker at the same time, one should not only study it the way it is taught in school, but learn it the way small children learn their native language. That sounds logical, but it also requires further explanation: how exactly is that done by an adult? I found that insight shared on the internet by the excellent polyglot Luca Lampariello, who demonstrates the validity of his methods as soon as he begins to speak. He is an Italian who speaks Chinese and Russian fluently, among a dozen other languages, but he failed with Japanese, to a certain extent. What he describes as a failure I would most probably describe as a great achievement, if I were ever able to reach that level of fluency in any foreign language. I was impressed by the length of his explanation of that failure, and by the steps he took to improve his skill, such as sessions with another guy, Matt Bonder, an American who managed to learn Japanese fluently. So there is a lot to know about it, and a lot of room for introspection: how did we manage to learn the languages that we speak, and the things that we know in general?
Finally, there is maybe a crucial aspect of an explanation, captured by Deutsch when he says:
“Well, human-type creativity is different from the creativity of the biosphere, in that human creativity can form models of the world that say not only what will happen, but why. So an explanation, for example, is something that captures an aspect of the world that is unseen. So, explanations explain the seen in terms of the unseen”
This describes a scientist as someone who tries to decipher conjurer tricks performed by nature, but this is a subject for another essay, explanations in science. Although, if we assume that moves represent the seen, and abstract concepts the unseen, then we may conclude that there is no reason to make distinction between chess explanations and scientific explanations?

Re: Zero knowledge equals random NN weights?

Post by Hrvoje »

Initially I envisaged this discussion to be about explanations in the context of chess, and at the last minute I changed my mind about the title, which is usually not a good idea. If we assume that we have a precise definition of information, can we define precisely what information counts as an explanation? For example, phenotype, which is seen, is explained in terms of genotype, which is unseen. Or at least it was unseen until it was discovered; now it has become totally visible, as DNA analysis is a routine procedure. If anything is still unclear about phenotype and genotype and their relation, it will have to be explained in terms of the next, so far unseen thing, like the epigenotype. But the theory that phenotype is caused by genotype definitely counts as a scientific explanation, in accordance with Deutsch’s description. Now, as intelligence and the ability to accumulate knowledge during the life of an individual, as well as over generations, are physical characteristics of a species, they are a result of its genes, both in the case of a dog and in the case of a human. So a human is inherently much more intelligent than a dog, and that difference allows us to know much more than dogs do, but there is no difference with respect to the genetic causality of that knowledge, or is there?

Re: Zero knowledge equals random NN weights?

Post by Hrvoje »

Let me explain my point further by giving an example of what knowledge exactly are we talking about here, that engines and endgame tablebases possess, but in much more concrete form, which raises the question of software possibility to extract it from them in an abstract form, like I will present here, without using software.
Pawnless endgames with major pieces, are a good start because some of them are most elementary checkmates, simplest patterns to describe, such as K+Q vs K and K+R vs K.
In both cases, the lone king must be forced to the edge (or to the corner) of the board in order to get checkmated, due to a lack of other pieces (its own or its opponent’s) that could constrain its mobility additionally in sufficient way, if they were present.
This can always be achieved by squeezing it from the center, by placing our king in opposition (an even simpler concept which also requires explanation/description), and checking it from the side, with the major piece. Actually, this is needed only in the rook case, queen alone can squeeze the opponent’s king to the edge by itself, without a help of its own king, one should only watch out not to squeeze it too much, into the corner, where it would be stalemated. But as queen can be underused, ie used as a rook, we can explain the rook pattern as applicable for both major pieces, and get back to show how queen can be used better, according to its full potential.
In order to prepare opposition, major piece is placed to a line adjacent to opponent’s king (obviously not within its reach of one square to avoid capture), chosen opportunistically so that squeezing takes minimal number of moves/lines from the target edge.
If the opponent’s king is strictly in the center (d4, e4, d5 or e5), then this strategy requires more moves to accomplish the goal in comparison with cases when it is nearer the edge, and we can choose d or e file, or 4th or 5th rank, to cut the board in two parts, depending on where exactly is the opponent’s king, and where is our king, which should be on the other side of the board than opponent’s king, ie major piece is also supposed to be placed on the line that is between the two kings, and it should stay there until its own king approaches the line adjacent to that chosen for the major piece. This is always possible because of two reasons, opponent’s king cannot approach our queen in order to squeeze it from the line it occupies, and rook is much faster piece than the king, so it can always maintain the safe distance from opponent’s king staying on the same line.
When the king arrives on the supposed line, next to the line occupied by its major piece, two lines far from the opponent’s king, it can force the opposition (position when it is standing on the same perpendicular line as the opponent’s king, two squares away from it) due to the fact that opponent’s king cannot run away from our king (assuming we take a stronger side for the purpose of this explanation) along the same line further than the edge of the board, and due to the fact that our major piece can easily “lose” a move if needed (staying on the same line, and on the same side of it, far from both kings) when opponent’s king changes direction, and starts moving towards our king, placing itself one square away from opposition. After that major piece move, lone king must either continue towards our king and step into opposition, or continue running away along the same line which may be not possible if it is already at the edge.
When it steps into opposition, our king covers all three squares in front of it, so when it gets checked from the side by the major piece, it must step back one line, as it cannot stay on the same line, and if there is no line to step back to, then it is checkmated. Of course, if there is, our king advances one line, following our major piece that has already advanced, and the procedure continues. And if at any time the lone king steps back voluntarily one line without being forced by side check and opposition, the major piece first takes advantage of that by advancing one line, and only then does our king follow.
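The claim about opposition plus side check can be verified mechanically on a board with only the three pieces. The following is a minimal sketch, with squares written as (file, rank) pairs counted from 0; all function names are my own, and the two positions at the end are examples I chose, not positions from any particular game.

```python
def square(name):
    """Convert 'e5' to (file, rank) with both coordinates in 0..7."""
    return (ord(name[0]) - ord("a"), int(name[1]) - 1)


def neighbours(sq):
    """All on-board squares one king step away from sq."""
    f, r = sq
    return [(f + df, r + dr)
            for df in (-1, 0, 1) for dr in (-1, 0, 1)
            if (df, dr) != (0, 0) and 0 <= f + df < 8 and 0 <= r + dr < 8]


def in_opposition(own_king, lone_king):
    """Kings on the same file or rank with exactly one square between them."""
    df = abs(own_king[0] - lone_king[0])
    dr = abs(own_king[1] - lone_king[1])
    return (df, dr) in ((0, 2), (2, 0))


def rook_attacks(rook, target, blockers):
    """True if the rook sees target along a file or rank with nothing between."""
    if rook == target:
        return False
    (rf, rr), (tf, tr) = rook, target
    if rf == tf:
        step = 1 if tr > rr else -1
        between = [(rf, r) for r in range(rr + step, tr, step)]
    elif rr == tr:
        step = 1 if tf > rf else -1
        between = [(f, rr) for f in range(rf + step, tf, step)]
    else:
        return False
    return not any(sq in blockers for sq in between)


def escape_squares(own_king, rook, lone_king):
    """Squares the lone king may legally step to (it may take an unguarded rook)."""
    safe = []
    for sq in neighbours(lone_king):
        if sq == own_king or sq in neighbours(own_king):
            continue  # guarded by our king
        if rook_attacks(rook, sq, {own_king}):
            continue  # guarded by the rook; the lone king has left its square
        safe.append(sq)
    return safe


def is_checkmate(own_king, rook, lone_king):
    in_check = rook_attacks(rook, lone_king, {own_king})
    return in_check and not escape_squares(own_king, rook, lone_king)


if __name__ == "__main__":
    # Opposition and side check in mid-board: only backward steps remain.
    print(escape_squares(square("e5"), square("a7"), square("e7")))
    # The same pattern on the edge: no line to step back to.
    print(is_checkmate(square("e6"), square("a8"), square("e8")))
```

With kings on e5 and e7 and the rook checking from a7, the only escape squares are d8, e8 and f8, i.e. the lone king must step back one line. With kings on e6 and e8 and the rook on a8, there are none, so it is checkmate.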
It must be noticed that running away from our king towards our major piece does not make any problem, because rook can move along the same line to the other edge to avoid capture, transposing into previous case in which opponent’s king can either move toward the edge, or towards our king (which now stands again between our rook and opponent’s king, in terms of lines perpendicular to the chosen line along which the rook moved), while the queen does not even have to move, as it cannot be captured.
That explains how queen can be used better, to squeeze the lone king towards the edge without engagement of our own king, which joins forces with the queen only at the very end, by giving support for queen’s delivering kiss of death to opponent’s king at the edge of the board.
So, in the case of queen, major piece can not only stay on the line adjacent to opponent’s king, but it can also squeeze it along that line towards the edge, forming each time horse like L shape configuration between two pieces, ie one rank two files away, if moving along the rank, or one file two ranks away if moving along the file, if lone king tries to keep the position on the same line. In that case it will soon reach the edge, when it can be cut off by queen to stay permanently on that edge, until our own king arrives near, or, it will have to step back one line, and then queen will advance one line maintaining the L shape configuration, regardless of whether the king stepped back diagonally towards the center, which is the most resilient option, or in any other of two possible ways. Basically queen follows the lone king by making “king like”, one square moves. Eventually this will also squeeze the king to the edge.
Of course, if the king steps back into a corner, queen will not follow the same L pattern, instead of that it will move one square further on the penultimate line, leaving two squares free on the edge line for the king to be able to move, until its own king comes near to provide support for the final blow, which can be frontal contact of queen and lone king, diagonal contact if the lone king is in the corner, or contactless side check while its own king constrains the lone king in the manner of opposition.
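The L shape configuration and the corner exception can be checked in the same mechanical way. This sketch assumes a board with only the queen and the lone king (our own king is ignored, as it is far away in this phase), and the function names are again my own.

```python
KNIGHT_STEPS = ((1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1))


def neighbours(sq):
    """All on-board squares one king step away from sq = (file, rank), 0..7."""
    f, r = sq
    return [(f + df, r + dr)
            for df in (-1, 0, 1) for dr in (-1, 0, 1)
            if (df, dr) != (0, 0) and 0 <= f + df < 8 and 0 <= r + dr < 8]


def knight_offsets(king):
    """Squares forming the 'horse like' L shape with the lone king."""
    f, r = king
    return [(f + df, r + dr) for df, dr in KNIGHT_STEPS
            if 0 <= f + df < 8 and 0 <= r + dr < 8]


def queen_attacks(queen, target):
    """Queen lines on an otherwise empty board (no blockers considered)."""
    df = abs(queen[0] - target[0])
    dr = abs(queen[1] - target[1])
    return (df, dr) != (0, 0) and (df == 0 or dr == 0 or df == dr)


def lone_king_moves(king, queen):
    """Steps available to the lone king, considering only the queen."""
    return [sq for sq in neighbours(king) if not queen_attacks(queen, sq)]


if __name__ == "__main__":
    a8, b6, c7, d7 = (0, 7), (1, 5), (2, 6), (3, 6)
    print(lone_king_moves(a8, c7))  # L shape against a cornered king
    print(lone_king_moves(a8, b6))  # the other L shape square
    print(lone_king_moves(a8, d7))  # queen kept one square further away
```

A queen at a knight's distance never gives check and can never be attacked by the king, which is why it can shadow the king safely. Against a king on a8, however, both L shape squares (c7 and b6) leave it with no move at all, i.e. stalemate, while a queen on d7 leaves it the square b8, so it can shuttle between a8 and b8 until our king arrives.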
Notice how the whole explanation is devoid of any concrete position, which makes it abstract, but not any less valid or precise than that what an engine or an endgame tablebase can present. Moreover, this is how human mind learns and memorizes these things.
OK, although these endgame concepts are basic, they may be not the simplest ones to describe, but others are either less basic and more complex to start with, like K+B+N vs K, or are built upon these (because they include pawns, which are promotable), so in the way they include the complexity of these presented, so I chose them.
But any abstract explanation of these two endings cannot be significantly different from this, although it can be surely more concise, simpler, precise and elegant. Probably if I could have expressed it in a not yet existing CCDL (Chess Concept Description Language), instead of in natural language, it would have been so. Although I made an effort to make it precise and concise, I did not achieve the result similar to what you get when you write in a domain specific strict language.
However, I cannot get that from DecodeChess either. It lacks such patterns in its repository of human knowledge, and therefore cannot recognize and present them, let alone create such explanations automatically as a result of machine learning and store them in the knowledge repository for later use in its explanations. This is a screenshot of what I got for such a position, which clearly demonstrates my point:
Decodechess endgame explanation