GSA (Google Search Appliance) integration with IBM WebSphere Portal/WCM

GSA (Google Search Appliance) is Google's enterprise search product. Recently I had to work with GSA as the search solution on the IBM WebSphere Portal platform. The main use cases were crawling the WCM seeds (including binary documents and WCM content) and portal content.

Check out the topics below for more details on GSA basics:

  1. GSA Feeds
  2. Crawling Content 
  3. Collections in GSA
  4. Metadata search

There are two simple ways to feed portal/WCM content to GSA:

  1. Write a proxy component
    1. Get the portal/WCM seedlists from the IBM system (using IBM's out-of-the-box seedlist framework)
    2. Parse the out-of-the-box seedlist content
    3. Generate a GSA-compatible feed from it
    4. Post the GSA-compatible feed to the GSA server
  2. Generate the GSA-compatible feed directly
    1. Write a custom component using the IBM Portal/WCM API
    2. Generate the feed in the GSA-supported XML format
    3. Post the feed to GSA
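For the proxy approach (option 1), the parse-and-convert steps might look like the minimal Python sketch below. It assumes the IBM seedlist is served as a standard Atom feed in which each `<entry>` carries a `<link href>` pointing at the document and a `<title>`; the real seedlist has additional IBM-specific fields that this sketch ignores. The output is a GSA "metadata-and-url" feed, which (as I understand the GSA feeds protocol) tells the appliance to crawl each URL itself and requires the datasource `web`. The helper names `seedlist_to_records` and `build_gsa_feed`, and the choice to map the Atom title to a `title` meta tag, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; assumption: the IBM seedlist is an Atom feed.
ATOM = "{http://www.w3.org/2005/Atom}"

# XML declaration plus the GSA feed DOCTYPE (ElementTree cannot emit a DOCTYPE).
GSA_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
)


def seedlist_to_records(atom_xml):
    """Hypothetical helper: one {url, title} dict per Atom <entry> with a link."""
    root = ET.fromstring(atom_xml)
    records = []
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        if link is None or not link.get("href"):
            continue  # skip entries that have no fetchable URL
        title = (entry.findtext(ATOM + "title") or "").strip()
        records.append({"url": link.get("href"), "title": title})
    return records


def build_gsa_feed(records, datasource="web", feedtype="metadata-and-url"):
    """Hypothetical helper: build a GSA feed document from the records."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(feed, "group")
    for rec in records:
        record = ET.SubElement(
            group, "record", url=rec["url"], action="add", mimetype="text/html"
        )
        metadata = ET.SubElement(record, "metadata")
        # Assumed mapping: expose the Atom title as a "title" meta tag.
        ET.SubElement(metadata, "meta", name="title", content=rec["title"])
    return GSA_HEADER + ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical portal URL, for illustration only.
    sample = (
        '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>Home</title>'
        '<link href="http://portal.example.com/wps/portal/home"/></entry></feed>'
    )
    print(build_gsa_feed(seedlist_to_records(sample)))
```

Keeping parsing and feed generation in separate functions means the second half can be reused unchanged by option 2, where the records come from the Portal/WCM API instead of a seedlist.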
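Whichever approach produces the feed, the last step is an HTTP POST to the appliance's feed listener. The sketch below assumes the GSA feeds protocol as I understand it: a `multipart/form-data` POST to port 19900 at path `/xmlfeed`, with form fields `datasource`, `feedtype` and `data` (the feed XML). The host name and the helpers `build_feed_request` and `post_feed` are hypothetical; verify the port, path and field names against your appliance's documentation before relying on them.

```python
import urllib.request
import uuid


def build_feed_request(gsa_host, feed_xml, datasource="web",
                       feedtype="metadata-and-url", port=19900):
    """Hypothetical helper: build (but do not send) the multipart feed POST."""
    # A random boundary makes a collision with the feed content very unlikely.
    boundary = uuid.uuid4().hex
    fields = {"datasource": datasource, "feedtype": feedtype, "data": feed_xml}
    parts = []
    for name, value in fields.items():
        lines = [f"--{boundary}", f'Content-Disposition: form-data; name="{name}"']
        if name == "data":
            lines.append("Content-Type: text/xml")  # the feed itself is XML
        lines += ["", value]
        parts.append("\r\n".join(lines) + "\r\n")
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    return urllib.request.Request(
        f"http://{gsa_host}:{port}/xmlfeed",
        data="".join(parts).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def post_feed(request, timeout=30):
    """Send the request and return the feeder's response body for inspection."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")


if __name__ == "__main__":
    # Hypothetical appliance host; calling post_feed(req) would actually send it.
    req = build_feed_request("gsa.example.com", "<gsafeed/>")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to log or inspect the exact payload before it reaches the appliance.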
