Machine Comprehension

The utility of AI products is founded upon their capacity to understand content - what is meant when a car’s left indicator flashes, or the meaning of a spoken word, such as ‘rain’. However, it is the context in which the content exists that gives it meaning - an indicator flashing left, when the car turns right, is an expansion of context; just as what may sound like ‘rain’ is actually ‘reign’ when we're talking about monarchies.
For AI to be effective, it must possess a sophisticated way of understanding context. Not only does this highlight the pertinence of more data points, it also draws attention to the parameters on which AI is created, which define its understanding of context. HUMAN offers a bottom-up solution: it allows millions of humans to contribute more data to machine learning practitioners, while also addressing the limited outlooks of those who train and write the AI algorithms, by providing a system in which the computers themselves can ask for the data they need.

Why is comprehension hard to attain and hard to evaluate?

If we had true comprehension of text, translation between English and Chinese would be reliable and good.
  • If you met a man 5 years ago and asked ‘what is your wife’s job,’ how many assumptions would be layered within that one question?
  • For each example we can identify, how many examples are we blind to?
The reason we can point out these examples, and understand them, is only because our personal experiences include that context.
  • When you ask individuals in Miami or Saudi Arabia, “is this a long sleeve shirt,” you receive different definitions.
The training data, libraries, and ontologies we give to our AIs have been shaped and hard-coded by a narrow sliver of our population and reflect the experiences and biases of the author(s): grad students, data entry specialists, architects etc.
This results in blind spots, “edge cases,” and encoded bias: an indelible fingerprint of the individuals who defined the library or asset. This is especially well observed in healthcare, where data indexing performance is a function not only of contextual information, but also of the strength and extent of mappings of synonyms and hierarchies between concepts. For example, there are 12,000+ coded concepts, including a specific code, 33XD, a billable code used to specify a medical diagnosis of “sucked into jet engine, subsequent encounter.” With coding this granular, an incorrect response is far more obvious.
  • But, in healthcare, disorders and diseases are constantly researched and re-classified as we learn more.
  • Individuals or expert panels are not sufficient to define and redefine these relationships.
Concepts are mapped and evaluated not only with tools such as graph analysis, but also by supplying context through libraries. This has resulted in proprietary “black-box” libraries and methodologies for refining and layering the ways in which concepts are linked and mapped. Much like an apprentice relationship, this information is frequently passed from scientist to algorithm through added layers of context supplied as conditions.
Context provides meaningful and important information, and in many ways these issues are more obvious in specialized domains because of the robustness and variety of standards for classification (i.e., in healthcare failures aren’t silent, they are explicit, and we can learn from them).
Example: “AFP” in healthcare:
  • in a paragraph about the kidney and labs, AFP most likely stands for “Alpha FetoProtein,” a lab,
  • but in a paragraph about symptoms and the face, it is more likely to mean “Atypical Facial Pain.”
  • Healthcare, as a field, only elucidates these failures because robust coded libraries proliferate, and this makes the failures more obvious.
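The AFP example can be sketched as a simple keyword-overlap disambiguator. This is a minimal illustration only, not a description of any production system: the cue-word lists and the function name `disambiguate_afp` are assumptions invented for this sketch.

```python
import re

# Hypothetical cue words for each sense of "AFP"; invented for illustration.
SENSE_CUES = {
    "Alpha FetoProtein": {"lab", "labs", "serum", "level", "kidney", "elevated"},
    "Atypical Facial Pain": {"face", "facial", "pain", "symptom", "symptoms", "jaw"},
}


def disambiguate_afp(paragraph: str) -> str:
    """Pick the sense whose cue words overlap most with the paragraph."""
    words = set(re.findall(r"[a-z]+", paragraph.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no supporting context at all, refuse to guess.
    return best if scores[best] > 0 else "unknown"


lab_note = "Kidney function labs were reviewed; serum AFP level was elevated."
symptom_note = "Patient reports symptoms in the face and jaw, consistent with AFP."
print(disambiguate_afp(lab_note))      # Alpha FetoProtein
print(disambiguate_afp(symptom_note))  # Atypical Facial Pain
```

The cue lists are exactly the kind of hand-authored context discussed here: whoever writes them decides which senses exist and which words count as evidence, so any sense or cue the author did not think of is a blind spot.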
Example: Mr. Huntington has Huntington’s Disease and lives on Huntington Street.
  • Huntington means three different things, depending on context, and these terms may be repeated in a variety of different sentences throughout the document.
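A rough way to see the three senses of “Huntington” is to label each mention from its immediate neighbors. The sketch below uses hand-written regular expressions; the labels (`PERSON`, `DISEASE`, `STREET`), the patterns, and the function name `label_mentions` are illustrative assumptions, and real text would need far more robust handling.

```python
import re

# Hand-written patterns; each maps a local cue to a label.
# The labels and patterns are illustrative assumptions.
PATTERNS = [
    ("DISEASE", re.compile(r"Huntington(?:'s|’s) Disease")),
    ("STREET", re.compile(r"Huntington (?:Street|St\.)")),
    ("PERSON", re.compile(r"(?:Mr|Mrs|Ms|Dr)\. Huntington")),
]


def label_mentions(text: str) -> list[tuple[str, str]]:
    """Return (matched text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(0), label))
    found.sort()  # order by position in the text
    return [(span, label) for _, span, label in found]


sentence = "Mr. Huntington has Huntington’s Disease and lives on Huntington Street."
for span, label in label_mentions(sentence):
    print(f"{label:8} {span}")
```

A bare “Huntington” with no title, possessive, or street suffix matches none of these patterns, which is one concrete form of the edge cases described in this section.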
This applies not only to text, but also to visual representation learning. It is a challenge to label and understand these differences, not only in structured elements but also in unstructured writing.
  • Proper classification is important for achieving data interoperability and utility.
  • Proper classification is essential to higher order functions, such as useful predictive and outcomes modeling in healthcare and other specific domains.
  • Attempting to load data into a data platform such as Snowflake can illustrate these issues rapidly.
Because of the robustness of tools in healthcare and the work on libraries in this arena, failures do not happen silently. Similarly robust tools do not yet exist for many other domains.
In this example, several factors influence performance:
  1. Identification of document section
  2. Identification and refinement of relationships between context
  3. Frequently the author identifies the edge cases and then creates more robust conceptual maps or contextual classifiers to address that edge case.
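The three factors above can be sketched as a small pipeline: find the section, look up the likely sense for that section, then apply any edge-case overrides an author has added. Everything named here (the section headers, the sense table, the override rule, and the function `resolve`) is a hypothetical assumption for illustration, not an existing library or method.

```python
# All section names, sense tables, and override rules below are hypothetical.
SECTION_HEADERS = {"labs:": "labs", "symptoms:": "symptoms"}

# Factor 2: per-section mapping from abbreviation to most likely sense.
SENSE_BY_SECTION = {
    "labs": {"AFP": "Alpha FetoProtein"},
    "symptoms": {"AFP": "Atypical Facial Pain"},
}

# Factor 3: edge cases an author has noticed, added as explicit overrides.
# Each override is (section, abbreviation, trigger word, sense).
OVERRIDES = [
    ("symptoms", "AFP", "serum", "Alpha FetoProtein"),
]


def resolve(document: str, abbreviation: str) -> list[tuple[str, str]]:
    """Return (section, sense) for each line that mentions the abbreviation."""
    section = "unknown"
    results = []
    for line in document.splitlines():
        stripped = line.strip()
        # Factor 1: identify which section of the document we are in.
        if stripped.lower() in SECTION_HEADERS:
            section = SECTION_HEADERS[stripped.lower()]
            continue
        if abbreviation not in stripped:
            continue
        sense = SENSE_BY_SECTION.get(section, {}).get(abbreviation, "unknown")
        for o_section, o_abbr, trigger, o_sense in OVERRIDES:
            if (o_section, o_abbr) == (section, abbreviation) and trigger in stripped.lower():
                sense = o_sense
        results.append((section, sense))
    return results


note = """Labs:
AFP within normal range.
Symptoms:
Persistent AFP on the left side.
Fatigue; serum AFP to be rechecked."""
print(resolve(note, "AFP"))
```

The override table only grows when someone notices a failure, so its coverage is limited to the contexts its authors have personally encountered.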
Context is not only important to comprehension and translation, but also feeds back into adequate solutions for problems like refining and improving handwriting recognition. Today, our contextual bias is hard coded into the conditions and libraries we supply to our algorithms.
Example: only half of people consider a truck to be a type of car, but most humans consider both trucks and cars to be types of automobiles. It is easy for computers to identify images which contain a man or a dog or both, but hard for them to identify that an image is of a man petting a dog or playing with a dog.
  • Many categories of tasks involve understanding the physical world and interactions in the physical world via embodied agents or proxy learning from humans or robotic bodies; that is, some concepts are beyond linguistic representation.
  • Furthermore, the concept of cultural context extends to the visual world. Consider why airplane stewards and stewardesses point with two fingers instead of one: pointing means different things in different regions and can even be offensive to some.
  • Understanding of abstract concepts and reasoning in the world without assigning language symbols is an area of interest and research, and the overlap between these domains is useful when it comes to human labeling tasks.
While it is easier to demonstrate this issue in the field of language, it rings true for symbolic systems and the portability of concepts around language. There exists a hidden translation layer between language / visual input and meaning which is broadly encoded in societal context.
Beyond translation/communication, these same issues apply to scene understanding, segmentation, detection, information retrieval, quality evaluation, understanding of human preferencing, tracking, measurement, perception for planning and even real-time decision making.
Multimodal conceptual understanding and transfer learning will be essential to progress in similarity-based learning. Visual + Linguistic (V+L) learning is an area of deep interest because:
  • There is a wealth of video-based data available online for learning
  • Video or live-experience is a primary mode of learning for humans
  • Trends in machine learning surround embodied learning
    • HUMANs learn and our brains have evolved through interactions in the world
    • V + L offers a shortcut to this: videos, and curriculums of videos, that illustrate objects and behaviors in their natural environments are useful for associating things in the real world, and may be useful for reinforcement learning. Can we detect shortcut learning (through explainable AI) and use that to determine whether a shortcut is valid or invalid?
    • Beyond textual hierarchies, the combination of conceptual relationships and proxy real-world experience is a powerful component which has shaped the evolution of the human mind. We therefore believe it is likely to be a helpful component of curriculums for reinforcement learning.
Everything derives from a fundamental understanding; higher order layers should be pulling from much more robust contextual information:
  • If we ‘understand’ that “put one foot in front of the other” means “to take small steps to achieve a goal,” then translation of this concept would occur properly.
  • And if we solve these underlying contextual meta-translation issues at this deeper layer, higher order tasks would improve.
Most people have exposure only to one such societal context - there is no one person who can create the standard because they have not lived a million lives in a thousand places. As such, the only way to create a robust meta-translation layer is through massively distributed technology - together we must define the structure of that translation layer and populate it with relational information.
For us to create broader dialogue between AI and HUMANity, in a global context, we must make concepts and education more approachable for a wider group of individuals, prioritizing collaborative efforts around education and pursuing simplicity in our explanations.
© 2023 HPF. HUMAN Protocol® is a registered trademark