Lecture
WordNet
- A semantic lexicon of the English language
- It consists of synsets (meanings)
- Synset:
  - a few synonymous words
  - a description of the meaning
- One word can belong to several synsets (meanings)
- 150,000 words, 115,000 synsets, 207,000 "word - synset" pairs
- Nouns
  - Hypernyms: Y is a hypernym of X if X is a kind of Y
  - Hyponyms: Y is a hyponym of X if Y is a kind of X
  - Coordinate terms: X and Y are coordinate terms if they share a hypernym
  - Holonyms: Y is a holonym of X if X is a part of Y
  - Meronyms: Y is a meronym of X if Y is a part of X
- Verbs
  - Hypernym: "move" is a hypernym of "run"
  - Hyponym: "whisper" is a hyponym of "speak"
  - Entailment: "snore" entails "sleep"
  - Coordinate terms: "walk" and "run"
- WordNet is used in the START system when searching for matches with T-expressions.
  - Suppose the database contains the T-expression <bird can fly>
  - "canary" is a hyponym of "bird"
  - To the question "Can a canary fly?" START answers "Yes"
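The matching step above can be sketched in a few lines of Python. This is only an illustration, not START's actual implementation: the `HYPERNYM` table is a toy stand-in for WordNet, and the names `hypernym_chain` and `matches` are hypothetical.

```python
# Toy hypernym links (word -> more general word); a stand-in for WordNet, not its real data.
HYPERNYM = {"canary": "bird", "bird": "animal"}

# Toy knowledge base of T-expressions stored as (subject, relation, object) tuples.
KNOWLEDGE_BASE = {("bird", "can", "fly")}


def hypernym_chain(word):
    """Yield the word itself, then each successively more general term."""
    while word is not None:
        yield word
        word = HYPERNYM.get(word)


def matches(subject, relation, obj):
    """True if the T-expression is stored for the subject or for any of its hypernyms."""
    return any((term, relation, obj) in KNOWLEDGE_BASE
               for term in hypernym_chain(subject))


print(matches("canary", "can", "fly"))  # True: canary -> bird, and <bird can fly> is stored
print(matches("stone", "can", "fly"))   # False: no stored expression applies
```

Walking up the hypernym chain lets one stored fact about "bird" answer questions about every kind of bird.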
Omnibase ("Universal Base")
- Used to query for facts
- "Object-property-value" model
- Example: "Federico Fellini is a director of La Strada"
  - Object: La Strada
  - Property: director
  - Value: Federico Fellini
- Each object is associated with a data source:
  - Star Wars - imdb-movie
Question                               | Object     | Property   | Value
Who wrote the music for Star Wars?     | Star Wars  | Composer   | John Williams
Who invented dynamite?                 | Dynamite   | Inventor   | Alfred Nobel
How big is Costa Rica?                 | Costa Rica | Area       | 51,100 sq. km
How many people live in Kiribati?      | Kiribati   | Population | 94,149
What languages are spoken in Guernsey? | Guernsey   | Languages  | English, French
Show me paintings by Monet             | Monet      | Works      | [images]
“Victor Fleming directed Gone with the Wind”
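A minimal sketch of the "object-property-value" model, assuming each data source is hidden behind a wrapper function. The dictionaries stand in for real sites; `make_dict_wrapper`, `get`, and the source name `factbook-country` are hypothetical, and the values come from the examples above.

```python
from typing import Callable, Dict

# A wrapper answers (object, property) -> value for one data source.
# Here it is a lookup in an in-memory dict; a real wrapper would query the site.
Wrapper = Callable[[str, str], str]


def make_dict_wrapper(records: Dict[str, Dict[str, str]]) -> Wrapper:
    def lookup(obj: str, prop: str) -> str:
        return records[obj][prop]
    return lookup


WRAPPERS: Dict[str, Wrapper] = {
    "imdb-movie": make_dict_wrapper({
        "Star Wars": {"composer": "John Williams"},
        "La Strada": {"director": "Federico Fellini"},
    }),
    "factbook-country": make_dict_wrapper({  # hypothetical source name
        "Kiribati": {"population": "94,149"},
    }),
}

# Every object is tied to the data source that knows about it.
OBJECT_SOURCE = {
    "Star Wars": "imdb-movie",
    "La Strada": "imdb-movie",
    "Kiribati": "factbook-country",
}


def get(obj: str, prop: str) -> str:
    """Uniform query: route the object to its source and ask for the property."""
    return WRAPPERS[OBJECT_SOURCE[obj]](obj, prop)


print(get("Star Wars", "composer"))   # John Williams
print(get("Kiribati", "population"))  # 94,149
```

The caller only ever sees `get(object, property)`; everything source-specific lives inside the wrappers.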
- Benefits:
  - Uniform database query format
  - Natural use of the "object-property-value" model
- Disadvantages:
  - A "wrapper" has to be written for each data source
- Data sources:
  - Wikipedia
  - The World Factbook 2006
  - Google
  - Yahoo
  - The Internet Movie Database
  - Internet Public Library
  - The Poetry Archives
  - Biography.com
  - Merriam-Webster Dictionary
  - WorldBook
  - Infoplease.com
  - Metropla.net
  - weather.com
- A new concept for the development of the Internet
- The problem: machine analysis of information published on the web
- All information on the web should be published in two languages:
  - a human language
  - a machine-readable language
- Machine-readable resource descriptions are written in RDF (Resource Description Framework), which is based on:
  - the XML format
  - "subject - predicate - object" triples
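An RDF statement is a triple of subject, predicate, and object. The sketch below stores a few triples as plain Python tuples and queries them with wildcards. It only illustrates the data model and does not use a real RDF library; the `match` helper is hypothetical, and the facts are taken from the earlier examples.

```python
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("La Strada", "director", "Federico Fellini"),
    ("Star Wars", "composer", "John Williams"),
    ("Kiribati", "population", "94,149"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples that agree with the given fields; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


print(match(subject="Star Wars"))   # [('Star Wars', 'composer', 'John Williams')]
print(match(predicate="director"))  # [('La Strada', 'director', 'Federico Fellini')]
```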
- For each block of information, it is proposed to write an annotation in natural language
- A compromise between machine-readable and natural-language descriptions of information
- The knowledge base stores only the annotations, with links to the sources attached.
- Efficient access to information of any type:
  - Texts
  - Pictures
  - Multimedia
  - Databases
  - Procedures
- Annotations can be parameterized.
- Embedding annotations:
  - Adding annotations to RDF document descriptions
  - Using parameterized annotations (information access schemas)
  - Using answer-finding schemas
- How many people live in Kiribati?
- What is the population of the Bahamas?
- Tell me Guam's population.
    <rdfs:Class ID="Country">
      <rdfs:comment>A Country in the CIA Factbook</rdfs:comment>
    </rdfs:Class>
    <rdf:Property ID="population">
      <rdfs:domain rdf:resource="#Country"/>
      <rdfs:range rdf:resource="xsd:string"/>
      <nl:ann text="many people live in ?s"/>
      <nl:ann text="population of ?s"/>
      <nl:gen text="The population of ?s is ?o"/>
    </rdf:Property>
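To show how the `nl:ann` and `nl:gen` templates above could drive question answering, here is a rough Python sketch. It swaps in regular expressions for real natural-language parsing, so it only handles questions that contain the annotation text literally. The names `annotation_to_regex`, `answer`, and `POPULATION` are hypothetical, and the Kiribati figure comes from the table earlier in the lecture.

```python
import re

ANNOTATIONS = ["many people live in ?s", "population of ?s"]  # from nl:ann
GEN = "The population of ?s is ?o"                            # from nl:gen
POPULATION = {"Kiribati": "94,149"}                           # toy data store


def annotation_to_regex(annotation):
    """Turn 'population of ?s' into a regex that captures the text bound to ?s."""
    prefix, suffix = annotation.split("?s")
    return re.compile(
        re.escape(prefix) + r"(?P<s>\w[\w ]*?)" + re.escape(suffix) + r"\W*$",
        re.IGNORECASE,
    )


def answer(question):
    """Match the question against each annotation and fill in the nl:gen template."""
    for annotation in ANNOTATIONS:
        found = annotation_to_regex(annotation).search(question)
        if found:
            s = found.group("s")
            o = POPULATION.get(s)
            if o is not None:
                return GEN.replace("?s", s).replace("?o", o)
    return None


print(answer("How many people live in Kiribati?"))
# The population of Kiribati is 94,149
```

A phrasing such as "Tell me Guam's population." does not match either pattern literally; handling it takes syntactic analysis rather than string matching, which this sketch deliberately leaves out.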
- What is the largest area in Africa?
- Tell me what Asian country has the highest population density.
- What is the lowest infant mortality rate?
- What is the most populous South American country?
    <nl:InformationAccessSchema>
      <nl:ann>country of the region $attribute</nl:ann>
      <nl:pattern>?x a :Country</nl:pattern>
      <nl:pattern>?x map($attribute) ?val</nl:pattern>
      <nl:pattern>?x :location $region</nl:pattern>
      <nl:action>display(boundto(?x, max(?val)))</nl:action>
      <nl:mapping>
        <nl:hash variable="$attribute">
          <nl:map value="population">:population</nl:map>
          <nl:map value="area">:area</nl:map>
          ...
        </nl:hash>
      </nl:mapping>
    </nl:InformationAccessSchema>
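Read procedurally, the schema above filters countries by region, maps the attribute word to a property, and displays the country bound to the maximum value. A minimal Python sketch of that logic follows. The country names and figures are made-up placeholders rather than Factbook data, and `largest` is a hypothetical helper.

```python
# Placeholder records; names and numbers are invented for illustration only.
COUNTRIES = {
    "Country A": {"location": "Africa", "population": 30, "area": 500},
    "Country B": {"location": "Africa", "population": 80, "area": 200},
    "Country C": {"location": "Asia", "population": 120, "area": 900},
}

# Plays the role of nl:mapping: the word used in the question -> stored property.
ATTRIBUTE_MAP = {"population": "population", "area": "area"}


def largest(region, attribute):
    """Return the country in the region with the maximum value of the attribute."""
    prop = ATTRIBUTE_MAP[attribute]
    candidates = {
        name: record[prop]
        for name, record in COUNTRIES.items()
        if record["location"] == region
    }
    # default=None covers regions with no matching country
    return max(candidates, key=candidates.get, default=None)


print(largest("Africa", "population"))  # Country B
print(largest("Africa", "area"))        # Country A
```

Because the attribute is a parameter, one schema covers population, area, and any other property added to the mapping.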
- Is Canada's coastline longer than Russia's coastline?
- Which country has the larger population, Germany or Japan?
- Is Nigeria's population bigger than that of South Africa?
    <nl:InformationAccessSchema>
      <nl:ann>$country-1's $att is larger than $country-2's $att</nl:ann>
      <nl:pattern>?x a :Country</nl:pattern>
      <nl:pattern>?x map($att) ?val-1</nl:pattern>
      <nl:pattern>?y a :Country</nl:pattern>
      <nl:pattern>?y map($att) ?val-2</nl:pattern>
      <nl:action>display(gt(?val-1, ?val-2))</nl:action>
      <nl:mapping>
        <nl:hash variable="$att">
          <nl:map value="population">:population</nl:map>
          <nl:map value="area">:area</nl:map>
          ...
        </nl:hash>
      </nl:mapping>
    </nl:InformationAccessSchema>
- What is the distance from Japan to South Korea?
- How far is the United States from Russia?
- What's the distance between Germany and England?
- Plan for answering such a question:
  - Find the capital of one country
  - Find the capital of the other country
  - Calculate the distance between them.
    <nl:InformationPlanningSchema>
      <nl:ann>distance between $country1 and $country2</nl:ann>
      <nl:plan>
        <rdf:Seq>
          <rdf:li>what is the capital of $country1 := ?capital1</rdf:li>
          <rdf:li>what is the capital of $country2 := ?capital2</rdf:li>
          <rdf:li>what is the distance between ?capital1 and ?capital2 := ?distance</rdf:li>
        </rdf:Seq>
      </nl:plan>
      <nl:action>display(?distance)</nl:action>
    </nl:InformationPlanningSchema>
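The three-step plan can be traced in Python as follows. This is a sketch under stated assumptions: the lookup tables stand in for sub-questions that the system would answer from its sources, the coordinates are approximate, the great-circle (haversine) formula is one possible way to compute the distance, and `great_circle_km` and `distance_between` are hypothetical names.

```python
import math

CAPITAL = {"Japan": "Tokyo", "South Korea": "Seoul"}
# Approximate (latitude, longitude) in degrees.
COORDS = {"Tokyo": (35.68, 139.69), "Seoul": (37.57, 126.98)}


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def distance_between(country1, country2):
    capital1 = CAPITAL[country1]  # step 1: capital of the first country
    capital2 = CAPITAL[country2]  # step 2: capital of the second country
    # step 3: distance between the two capitals
    return great_circle_km(COORDS[capital1], COORDS[capital2])


print(round(distance_between("Japan", "South Korea")))
```

Each step binds a variable that the next step consumes, which mirrors the `:=` assignments in the schema.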
- Benefits:
  - Information is described in a universal, natural language
  - One parameterized annotation can handle hundreds of questions.
  - Automation of the annotation process
  - Requests to Omnibase are also described using annotations
- Disadvantages:
  - Implementation complexity
- Processing composite queries and their automatic decomposition
- Capacity building to increase information redundancy
- Automation of the analysis of semantic links in documents
- Introduction of annotations
- Specific answer-finding strategies for different subject areas.