Building a Search Engine in the Social Media Era SDForum Meeting

2010, Mar 24  —  ...

The SDForum Software Architecture & Modeling SIG meeting on March 24, 2010 was titled Building a Search Engine in the Social Media Era and was presented by AJ Chen, Technical Architect, Healthline Networks Inc. The presentation started by reviewing some of the issues that the social media environment brings forth and then moved through various search architectures: traditional, real-time, social, and finally semantic.

Unfortunately, at the time of writing, the slides are not available, but they provided some wonderful details about various search architectures, including Aardvark.

The following are the notes that I took during the presentation:

Building Search Engines in the Social Media Era
  • AJ Chen, Software Architect at Healthline
  • Working on applying semantic technology to make the search engine better
  • High level talk about search
  • Agenda
    • the changing search environment
    • traditional search engine architectures
    • new architectures for real time search and social search
    • semantic search
    • emerging real time web monitoring
  • The Changing Search Environment
    • Users expect more from search box
      • find docs
      • discover information
      • answer specific questions
      • From this content
        • docs, web pages, blogs, Q&A, forums
      • But now there is more content
        • Status updates and social networks
        • Real-time data streams (API)
        • Web of data: linked semantic data, databases
      • Users expect more from "search"
        • check real-time status update
        • check my social circle
        • monitor social media
      • Semantic data - object on the web that can be located by a URL
      • Search technologies
        • Full text search technologies
        • Semantic search technologies
      • Infrastructure
        • More cloud offerings.
    • Open Source Lucene/Solr
      • Lucene stack
        • App
        • Solr
        • Lucene Java/Tika
        • Crawlers and Connectors
        • Application Infrastructure
        • Java
      • Inverted index - more like the index at the end of a book
      • Solr
  • New Architectures for Real time Search
  • Social Search
    • Twitter profiles, chat buddies, Picasa, Google Reader
    • My Google Profile
    • Social Graph
    • Index
    • Google Aardvark
      • Good example of something new and different
  • Semantic Search
    • Bing/Powerset NLP Search Engine
      • Question/Answer is still in infancy
    • Sindice: Retrieving Semantic Data
      • index API: sindice.com
      • search engine: sig.ma
    • Google has a "semantic" search engine: Wonder Wheel
      • Shows related topics
    • Hybrid Search Engine
      • best approach
    • http://www.w3.org/TR/rdf-sparql-query/
  • Social Media Monitoring
    • Benefits for business
      • understand what people are saying about the company, brands, products, and competitors, etc.
      • identify leads for marketing and sales
      • engage with customer and community conversations
      • support customers
      • cultivate product advocates
      • use customer feedback to improve products and brands
    • Differences from search engine
      • search engine for filtering
      • real-time text analysis and semantic analysis
    • Basic monitoring slide
    • Scaling up Social CRM
      • Problem: Today's social media program does not scale
        • If you have a lot of products, it will be difficult to go through each social media blip
      • Solution: build a pipeline of social context flow and then apply advanced text analytics to automate routing of social contexts
      • Social Media -> Monitoring -> Text Analytics -> CRM
    • Social CRM Monitoring Integration in the Cloud
    • Text Analytics
      • Apache Mahout
      • IBM Cognos open source
  • Q & A
    • Wolfram-Alpha: semantic search but uses statistics and not an ontology
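
The notes above compare an inverted index to the index at the end of a book: each term points to the places where it appears. As a rough illustration only (this is not from the slides, and it is not how Lucene stores its index internally), here is a minimal Python sketch. The function names `build_inverted_index` and `search`, the whitespace tokenizer, and the three sample documents are all assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    # Look up the posting list for each term, then intersect them.
    postings = [set(index.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "real time search for status updates",
    2: "semantic search over linked data",
    3: "social search uses the social graph",
}
index = build_inverted_index(docs)
```

With this toy data, `search(index, "social search")` looks up the posting lists for "social" and "search" and intersects them, much like flipping to two entries in a book index and keeping only the pages listed under both. A real engine would add proper text analysis, ranking, and a compact on-disk format on top of this idea.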