[Lucene] Search Using ASP.NET and Lucene


Getting Started

Getting Lucene to work on your ASP.NET website isn't hard, but a few tricks help. We chose the Lucene.Net 2.1.0 release because it makes updating the Lucene index easier via a new method on the IndexWriter object called UpdateDocument. This method deletes the specified document and then adds the new copy to the index. There is no Lucene.Net 2.1.0 binary download; instead you need to check out the source from the project's Subversion repository and compile it yourself.
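Conceptually, the delete-then-add behaves like this minimal sketch (the field names mirror the ones used later in this article; indexDir, url, and pageText are placeholders):

IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false); // false = append to an existing index

Document doc = new Document();
doc.Add(new Field("uid", url, Field.Store.NO, Field.Index.UN_TOKENIZED));
doc.Add(new Field("contents", pageText, Field.Store.YES, Field.Index.TOKENIZED));

// one call instead of a separate DeleteDocuments + AddDocument pair
writer.UpdateDocument(new Term("uid", url), doc);
writer.Close();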

Don't worry, this is an easy step. Using your Subversion client (I recommend TortoiseSVN), check out the source from this URL: https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_1_0/
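From the command line, the equivalent checkout is:

svn checkout https://svn.apache.org/repos/asf/incubator/lucene.net/tags/Lucene.Net_2_1_0/ Lucene.Net_2_1_0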

Next, go into the directory Lucene.Net_2_1_0\src\Lucene.Net, where you should find a Visual Studio solution file that matches your Visual Studio version. If you are using Visual Studio 2005, load Lucene.Net-2.1.0-VS2005.sln.

Hit compile. The resulting Lucene.Net.dll in the bin/release folder is the DLL you will need to reference from the Visual Studio project that contains your Lucene code.

Creating the Lucene Index

Lucene builds a file-based index that it uses to return search results quickly. We needed a way to index every page in our system so that Lucene could search all of our content: in our case the articles, forum posts, and of course the house plans on the website. To make this happen we query our database for the URLs of all our content, send a web spider out to pull each page down from the site, and then parse the content and feed it to Lucene.

We developed three classes to make this all work; most of the code is taken from published examples and other kind souls who shared theirs. The first class, GeneralSearch.cs, creates the index and provides the mechanism for searching it. The second class, HtmlDocument, consists of code taken from Searcharoo, a web-spidering project written in C#, and handles the HTML parsing for us (special thanks to Searcharoo for that code; I didn't want to write it). The last class, HtmlDownloader.cs, is also borrowed from Searcharoo; its task is to download pages from the site and build an HtmlDocument from them.
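Tying the pieces together, the indexing pass looks roughly like the following sketch, where GetContentUrls() is a hypothetical stand-in for the database query that returns the URLs of our content:

GeneralSearch search = new GeneralSearch(@"C:\LuceneIndex");
search.OpenWriter();
foreach (string url in GetContentUrls())  // hypothetical database query
{
    search.AddWebPage(url);               // spider the page and index it
}
search.CloseWriter();                     // optimize and close the index
foreach (string error in search.Errors)
{
    Console.WriteLine(error);             // pages that could not be fetched
}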

GeneralSearch.cs

using System;
using System.Collections.Generic;
using System.Data;
using Core.Utils.Html;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

namespace Core.Search
{
    /// <summary>
    /// Wrapper for Lucene to perform a general search.
    /// See http://www.codeproject.com/KB/aspnet/DotLuceneSearch.aspx
    /// for more help with the methods used in this class.
    /// </summary>
    public class GeneralSearch
    {
        private IndexWriter _Writer = null;
        private string _IndexDirectory;
        private List<string> _Errors = new List<string>();
        private int _TotalResults = 0;
        private int _Start = 1;
        private int _End = 10;

        /// <summary>
        /// General constructor method.
        /// </summary>
        /// <param name="indexDirectory">The directory where the index is located.</param>
        public GeneralSearch(string indexDirectory)
        {
            _IndexDirectory = indexDirectory;
        }

        /// <summary>
        /// List of errors that occurred during indexing.
        /// </summary>
        public List<string> Errors
        {
            get { return _Errors; }
            set { _Errors = value; }
        }

        /// <summary>
        /// Total number of hits returned by the search.
        /// </summary>
        public int TotalResults
        {
            get { return _TotalResults; }
            set { _TotalResults = value; }
        }

        /// <summary>
        /// The number of the record where the results begin.
        /// </summary>
        public int Start
        {
            get { return _Start; }
            set { _Start = value; }
        }

        /// <summary>
        /// The number of the record where the results end.
        /// </summary>
        public int End
        {
            get { return _End; }
            set { _End = value; }
        }

        /// <summary>
        /// Returns a table with matching results, or null if the index
        /// does not exist.  This method pages the results.
        /// </summary>
        /// <param name="searchText">Terms to search for.</param>
        /// <param name="hitsPerPage">The number of hits to return for each results page.</param>
        /// <param name="currentPage">The current results page.</param>
        /// <returns>A DataTable containing the number of results specified for the given page.</returns>
        public DataTable DoSearch(string searchText, int hitsPerPage, int currentPage)
        {
            if (!IndexReader.IndexExists(_IndexDirectory))
            {
                return null;
            }
            string field = IndexedFields.Contents;
            IndexReader reader = IndexReader.Open(_IndexDirectory);
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser parser = new QueryParser(field, analyzer);
            Query query = parser.Parse(searchText);
            Hits hits = searcher.Search(query);

            DataTable dt = new DataTable();
            dt.Columns.Add(IndexedFields.Url, typeof(string));
            dt.Columns.Add(IndexedFields.Title, typeof(string));
            //dt.Columns.Add(IndexedFields.Summary, typeof(string));
            dt.Columns.Add(IndexedFields.Contents, typeof(string));
            dt.Columns.Add(IndexedFields.Image, typeof(string));

            if (currentPage <= 0)
            {
                currentPage = 1;
            }
            Start = (currentPage - 1) * hitsPerPage;
            End = System.Math.Min(hits.Length(), Start + hitsPerPage);
            TotalResults = hits.Length();
            for (int i = Start; i < End; i++)
            {
                // get the document from the index
                Document doc = hits.Doc(i);
                DataRow row = dt.NewRow();
                row[IndexedFields.Url] = doc.Get(IndexedFields.Url);
                //row[IndexedFields.Summary] = doc.Get(IndexedFields.Summary);
                row[IndexedFields.Contents] = doc.Get(IndexedFields.Contents);
                row[IndexedFields.Title] = doc.Get(IndexedFields.Title);
                // documents indexed via AddWebPage have no image field, so guard against null
                row[IndexedFields.Image] = doc.Get(IndexedFields.Image) ?? String.Empty;
                dt.Rows.Add(row);
            }
            searcher.Close();
            reader.Close();
            return dt;
        }

        /// <summary>
        /// Opens the index for writing.
        /// </summary>
        public void OpenWriter()
        {
            bool create = false;
            if (!IndexReader.IndexExists(_IndexDirectory))
            {
                create = true;
            }
            _Writer = new IndexWriter(_IndexDirectory, new StandardAnalyzer(), create);
            _Writer.SetUseCompoundFile(true);
            _Writer.SetMaxFieldLength(1000000);
        }

        /// <summary>
        /// Closes and optimizes the index.
        /// </summary>
        public void CloseWriter()
        {
            _Writer.Optimize();
            _Writer.Close();
        }

        /// <summary>
        /// Loads, parses and indexes an HTML file at a given url.
        /// </summary>
        /// <param name="url"></param>
        public void AddWebPage(string url)
        {
            HtmlDocument html = HtmlDownloader.Download(url);
            if (null != html)
            {
                // make a new, empty document
                Document doc = new Document();
                // store the url
                doc.Add(new Field(IndexedFields.Url, url, Field.Store.YES, Field.Index.UN_TOKENIZED));
                // create a uid that will let us maintain the index incrementally
                doc.Add(new Field(IndexedFields.Uid, url, Field.Store.NO, Field.Index.UN_TOKENIZED));
                // add the tag-stripped contents so they get tokenized and indexed
                doc.Add(new Field(IndexedFields.Contents, html.WordsOnly, Field.Store.YES, Field.Index.TOKENIZED));
                // add the summary as a field that is stored and returned with
                // hit documents for display
                //doc.Add(new Field(IndexedFields.Summary, html.Description, Field.Store.YES, Field.Index.NO));
                // add the title as a field that is stored and can be searched
                doc.Add(new Field(IndexedFields.Title, html.Title, Field.Store.YES, Field.Index.TOKENIZED));
                Term t = new Term(IndexedFields.Uid, url);
                _Writer.UpdateDocument(t, doc);
            }
            else
            {
                Errors.Add("Could not index " + url);
            }
        }

        /// <summary>
        /// Use this method to add a single page to the index.
        /// </summary>
        /// <remarks>
        /// If you are adding multiple pages, call OpenWriter once, AddPage for each
        /// page, and then CloseWriter, so the index is opened and closed only once.
        /// </remarks>
        /// <param name="url">The url for the given document.  The document will not be requested from
        /// this url.  Instead it will be used as a key to access the document within the index and
        /// will be returned when the index is searched so that the document can be referenced by the
        /// client.</param>
        /// <param name="documentText">The contents of the document that is to be added to the index.</param>
        /// <param name="title">The title of the document to add to the index.</param>
        /// <param name="image">Image to include with search results.</param>
        public void AddSinglePage(string url, string documentText, string title, string image)
        {
            OpenWriter();
            AddPage(url, documentText, title, image);
            CloseWriter();
        }

        /// <summary>
        /// Indexes the text of the given document, but does not request the document from the specified url.
        /// </summary>
        /// <remarks>
        /// Use this method to add a document to the index when you already know its contents and url.
        /// This avoids an http download, which can take longer.
        /// </remarks>
        /// <param name="url">The url for the given document.  The document will not be requested from
        /// this url.  Instead it will be used as a key to access the document within the index and
        /// will be returned when the index is searched so that the document can be referenced by the
        /// client.</param>
        /// <param name="documentText">The contents of the document that is to be added to the index.</param>
        /// <param name="title">The title of the document to add to the index.</param>
        /// <param name="image">Image to include with search results.</param>
        public void AddPage(string url, string documentText, string title, string image)
        {
            // make a new, empty document
            Document doc = new Document();
            // store the url
            doc.Add(new Field(IndexedFields.Url, url, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // create a uid that will let us maintain the index incrementally
            doc.Add(new Field(IndexedFields.Uid, url, Field.Store.NO, Field.Index.UN_TOKENIZED));
            // add the tag-stripped contents so they get tokenized and indexed
            doc.Add(new Field(IndexedFields.Contents, documentText, Field.Store.YES, Field.Index.TOKENIZED));
            // add the summary as a field that is stored and returned with
            // hit documents for display
            //doc.Add(new Field(IndexedFields.Summary, documentDescription, Field.Store.YES, Field.Index.NO));
            // add the title as a field that is stored and can be searched
            doc.Add(new Field(IndexedFields.Title, title, Field.Store.YES, Field.Index.TOKENIZED));
            // add the image path so it is stored and returned with results
            doc.Add(new Field(IndexedFields.Image, image, Field.Store.YES, Field.Index.TOKENIZED));
            Term t = new Term(IndexedFields.Uid, url);
            try
            {
                _Writer.UpdateDocument(t, doc);
            }
            catch (Exception ex)
            {
                Errors.Add(ex.Message);
            }
        }

        /// <summary>
        /// A list of fields available in the index.
        /// </summary>
        public static class IndexedFields
        {
            public const string Url = "url";
            public const string Uid = "uid";
            public const string Contents = "contents";
            //public const string Summary = "summary";
            public const string Title = "title";
            public const string Image = "image";
        }
    }
}
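Searching the index is then a single call plus data binding. A minimal sketch, assuming the index was built at C:\LuceneIndex:

GeneralSearch search = new GeneralSearch(@"C:\LuceneIndex");
DataTable results = search.DoSearch("house plans", 10, 1); // 10 hits per page, page 1
if (results != null)
{
    // Start, End and TotalResults are populated by DoSearch for paging links
    Console.WriteLine("Showing {0}-{1} of {2}", search.Start + 1, search.End, search.TotalResults);
    foreach (DataRow row in results.Rows)
    {
        Console.WriteLine("{0}  ({1})",
            row[GeneralSearch.IndexedFields.Title],
            row[GeneralSearch.IndexedFields.Url]);
    }
}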

HtmlDocument.cs

using System;
using System.Collections;
using System.Text.RegularExpressions;

namespace Core.Utils.Html
{
    /// <summary>
    /// This code was taken from:
    /// http://www.searcharoo.net/SearcharooV5/
    ///
    /// Storage for parsed HTML data returned by ParsedHtmlData().
    /// </summary>
    /// <remarks>
    /// Arbitrary class to encapsulate just the properties we need
    /// to index Html pages (Title, Meta tags, Keywords, etc).
    /// A 'generic' search engine would probably have a 'generic'
    /// document class, so maybe a future version of Searcharoo
    /// will too...
    /// </remarks>
    public class HtmlDocument
    {
        #region Private fields: _Uri, _ContentType, _RobotIndexOK, _RobotFollowOK
        private int _SummaryCharacters = 350;
        private string _IgnoreRegionTagNoIndex = "";
        private string _All = String.Empty;
        private Uri _Uri;
        private String _ContentType;
        private string _Extension;
        private bool _RobotIndexOK = true;
        private bool _RobotFollowOK = true;
        private string _WordsOnly = string.Empty;
        /// <summary>MimeType so we know whether to try and parse the contents, eg. "text/html", "text/plain", etc</summary>
        private string _MimeType = String.Empty;
        /// <summary>Html &lt;title&gt; tag</summary>
        private String _Title = String.Empty;
        /// <summary>Html &lt;meta http-equiv='description'&gt; tag</summary>
        private string _Description = String.Empty;
        /// <summary>Length as reported by the server in the Http headers</summary>
        private long _Length;
        #endregion

        public ArrayList LocalLinks;
        public ArrayList ExternalLinks;

        #region Public Properties: Uri, RobotIndexOK
        /// <summary>
        /// http://www.ietf.org/rfc/rfc2396.txt
        /// </summary>
        public Uri Uri
        {
            get { return _Uri; }
            set { _Uri = value; }
        }

        /// <summary>
        /// Whether a robot should index the text
        /// found on this page, or just ignore it
        /// </summary>
        /// <remarks>
        /// Set when page META tags are parsed - no 'set' property
        /// More info:
        /// http://www.robotstxt.org/
        /// </remarks>
        public bool RobotIndexOK
        {
            get { return _RobotIndexOK; }
        }

        /// <summary>
        /// Whether a robot should follow any links
        /// found on this page, or just ignore them
        /// </summary>
        /// <remarks>
        /// Set when page META tags are parsed - no 'set' property
        /// More info:
        /// http://www.robotstxt.org/
        /// </remarks>
        public bool RobotFollowOK
        {
            get { return _RobotFollowOK; }
        }

        public string Title
        {
            get { return _Title; }
            set { _Title = value; }
        }

        /// <summary>
        /// Whether to ignore sections of HTML wrapped in a special comment tag
        /// </summary>
        public bool IgnoreRegions
        {
            get { return _IgnoreRegionTagNoIndex.Length > 0; }
        }

        public string ContentType
        {
            get
            {
                return _ContentType;
            }
            set
            {
                _ContentType = value.ToString();
                string[] contentTypeArray = _ContentType.Split(';');
                // set MimeType if it's blank
                if (_MimeType == String.Empty && contentTypeArray.Length >= 1)
                {
                    _MimeType = contentTypeArray[0];
                }
                // set Encoding if it's blank
                if (Encoding == String.Empty && contentTypeArray.Length >= 2)
                {
                    int charsetpos = contentTypeArray[1].IndexOf("charset");
                    if (charsetpos > 0)
                    {
                        Encoding = contentTypeArray[1].Substring(charsetpos + 8, contentTypeArray[1].Length - charsetpos - 8);
                    }
                }
            }
        }

        public string MimeType
        {
            get { return _MimeType; }
            set { _MimeType = value; }
        }

        public string Extension
        {
            get { return _Extension; }
            set { _Extension = value; }
        }
        #endregion

        #region Public fields: Encoding, Keywords, All
        /// <summary>Encoding eg. "utf-8", "Shift_JIS", "iso-8859-1", "gb2312", etc</summary>
        public string Encoding = String.Empty;
        /// <summary>Html &lt;meta http-equiv='keywords'&gt; tag</summary>
        public string Keywords = String.Empty;

        /// <summary>
        /// Raw content of the page, as downloaded from the server.
        /// The Html is stripped to make up the 'words only' text.
        /// </summary>
        public string Html
        {
            get { return _All; }
            set
            {
                _All = value;
                _WordsOnly = StripHtml(_All);
            }
        }

        public string WordsOnly
        {
            get { return this.Keywords + this._Description + this._WordsOnly; }
        }

        public virtual long Length
        {
            get { return _Length; }
            set { _Length = value; }
        }

        public string Description
        {
            get
            {
                // ### If no META DESC, grab start of file text ###
                if (String.Empty == this._Description)
                {
                    if (_WordsOnly.Length > _SummaryCharacters)
                    {
                        _Description = _WordsOnly.Substring(0, _SummaryCharacters);
                    }
                    else
                    {
                        _Description = WordsOnly;
                    }
                    _Description = Regex.Replace(_Description, @"\s+", " ").Trim();
                }
                // http://authors.aspalliance.com/stevesmith/articles/removewhitespace.asp
                return _Description;
            }
            set
            {
                _Description = Regex.Replace(value, @"\s+", " ").Trim();
            }
        }
        #endregion

        #region Public Methods: SetRobotDirective, ToString()
        /// <summary>
        /// Pass in a ROBOTS meta tag found while parsing,
        /// and set HtmlDocument property/ies appropriately
        /// </summary>
        /// <remarks>
        /// More info:
        /// * Robots Exclusion Protocol *
        /// - for META tags
        /// http://www.robotstxt.org/wc/meta-user.html
        /// - for ROBOTS.TXT in the siteroot
        /// http://www.robotstxt.org/wc/norobots.html
        /// </remarks>
        public void SetRobotDirective(string robotMetaContent)
        {
            robotMetaContent = robotMetaContent.ToLower();
            if (robotMetaContent.IndexOf("none") >= 0)
            {
                // 'none' means you can't Index or Follow!
                _RobotIndexOK = false;
                _RobotFollowOK = false;
            }
            else
            {
                if (robotMetaContent.IndexOf("noindex") >= 0) { _RobotIndexOK = false; }
                if (robotMetaContent.IndexOf("nofollow") >= 0) { _RobotFollowOK = false; }
            }
        }

        /// <summary>
        /// For debugging - output all links found in the page
        /// </summary>
        public override string ToString()
        {
            string linkstring = "";
            foreach (object link in LocalLinks)
            {
                linkstring += Convert.ToString(link) + "\r\n";
            }
            return Title + "\r\n" + Description + "\r\n----------------\r\n" + linkstring + "\r\n----------------\r\n" + Html + "\r\n======================\r\n";
        }
        #endregion

        /// <summary>
        /// Parses the Html, extracting the title, meta tags and links.
        /// </summary>
        /// <remarks>
        /// The "original" link-search Regex used by the code was from here
        /// http://www.dotnetjunkies.com/Tutorial/1B219C93-7702-4ADF-9106-DFFDF90914CF.dcik
        /// but it was not sophisticated enough to match all tag permutations,
        /// whereas the Regex on this blog will parse ALL attributes from within tags...
        /// IMPORTANT when they're out of order, spaced out or over multiple lines
        /// http://blogs.worldnomads.com.au/matthewb/archive/2003/10/24/158.aspx
        /// http://blogs.worldnomads.com.au/matthewb/archive/2004/04/06/215.aspx
        /// http://www.experts-exchange.com/Programming/Programming_Languages/C_Sharp/Q_20848043.html
        /// </remarks>
        public void Parse()
        {
            string htmlData = this.Html;    // htmlData will be munged
            // xenomouse http://www.codeproject.com/aspnet/Spideroo.asp?msg=1271902#xx1271902xx
            if (string.IsNullOrEmpty(this.Title))
            {   // title may have been set previously... non-HTML file type (this will be refactored out, later)
                this.Title = Regex.Match(htmlData, @"(?<=<title[^\>]*>).*?(?=</title>)",
                    RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture).Value;
            }
            string metaKey = String.Empty, metaValue = String.Empty;
            foreach (Match metamatch in Regex.Matches(htmlData
                , @"<meta\s*(?:(?:\b(\w|-)+\b\s*(?:=\s*(?:""[^""]*""|'[^']*'|[^""'<> ]+)\s*)?)*)/?\s*>"
                , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
            {
                metaKey = String.Empty;
                metaValue = String.Empty;
                // loop through the attribute/value pairs inside the tag
                foreach (Match submetamatch in Regex.Matches(metamatch.Value.ToString()
                    , @"(?<name>\b(\w|-)+\b)\s*=\s*(""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^""'<> ]+)\s*)+"
                    , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
                {
                    if ("http-equiv" == submetamatch.Groups[1].ToString().ToLower())
                    {
                        metaKey = submetamatch.Groups[2].ToString();
                    }
                    if (("name" == submetamatch.Groups[1].ToString().ToLower())
                        && (metaKey == String.Empty))
                    { // if it's already set, HTTP-EQUIV takes precedence
                        metaKey = submetamatch.Groups[2].ToString();
                    }
                    if ("content" == submetamatch.Groups[1].ToString().ToLower())
                    {
                        metaValue = submetamatch.Groups[2].ToString();
                    }
                }
                switch (metaKey.ToLower())
                {
                    case "description":
                        this.Description = metaValue;
                        break;
                    case "keywords":
                    case "keyword":
                        this.Keywords = metaValue;
                        break;
                    case "robots":
                    case "robot":
                        this.SetRobotDirective(metaValue);
                        break;
                }
                //ProgressEvent(this, new ProgressEventArgs(4, metaKey + " = " + metaValue));
            }
            string link = String.Empty;
            ArrayList linkLocal = new ArrayList();
            ArrayList linkExternal = new ArrayList();
            // http://msdn.microsoft.com/library/en-us/script56/html/js56jsgrpregexpsyntax.asp
            // original Regex just found <a href=""> links, and was "broken" by spaces, out-of-order attributes, etc:
            // @"(?<=<a\s+href="").*?(?=""\s*/?>)"
            // This one looks for the href attribute of:
            // <A> anchor tags
            // <AREA> imagemap links
            // <FRAME> frameset links
            // <IFRAME> floating frames
            foreach (Match match in Regex.Matches(htmlData
                , @"(?<anchor><\s*(a|area|frame|iframe)\s*(?:(?:\b\w+\b\s*(?:=\s*(?:""[^""]*""|'[^']*'|[^""'<> ]+)\s*)?)*)?\s*>)"
                , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
            {
                // parse ALL attributes from within tags... IMPORTANT when they're out of order!!
                // in addition to the 'href' attribute, there might also be 'alt', 'class', 'style', 'area', etc...
                // there might also be 'spaces' between the attributes and they may be ", ', or unquoted
                link = String.Empty;
                //ProgressEvent(this, new ProgressEventArgs(4, "Match:" + System.Web.HttpUtility.HtmlEncode(match.Value) + ""));
                foreach (Match submatch in Regex.Matches(match.Value.ToString()
                    , @"(?<name>\b\w+\b)\s*=\s*(""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^""'<> \s]+)\s*)+"
                    , RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
                {
                    // we're only interested in the href attribute (although in future maybe index the 'alt'/'title'?)
                    //ProgressEvent(this, new ProgressEventArgs(4, "Submatch: " + submatch.Groups[1].ToString() + "=" + submatch.Groups[2].ToString() + ""));
                    if ("href" == submatch.Groups[1].ToString().ToLower())
                    {
                        link = submatch.Groups[2].ToString();
                        if (link != "#") break; // break if this isn't just a placeholder href="#", which implies maybe an onclick attribute exists
                    }
                    if ("onclick" == submatch.Groups[1].ToString().ToLower())
                    {   // maybe try to parse some javascript in here
                        string jscript = submatch.Groups[2].ToString();
                        // some code here to extract a filename/link to follow from the onclick="_____"
                        // say it was onclick="window.location='top.htm'"
                        int firstApos = jscript.IndexOf("'");
                        int secondApos = jscript.IndexOf("'", firstApos + 1);
                        if (secondApos > firstApos)
                        {
                            link = jscript.Substring(firstApos + 1, secondApos - firstApos - 1);
                            break;  // break if we found something, ignoring any later href="" which may exist _after_ the onclick in the <a> element
                        }
                    }
                }
                // strip off internal anchors, so we don't index the same page over again
                if (link.IndexOf("#") > -1)
                {
                    link = link.Substring(0, link.IndexOf("#"));
                }
                if (link.IndexOf("javascript:") == -1
                    && link.IndexOf("mailto:") == -1
                    && !link.StartsWith("#")
                    && link != String.Empty)
                {
                    if ((link.Length > 8) && (link.StartsWith("http://")
                        || link.StartsWith("https://")
                        || link.StartsWith("file://")
                        || link.StartsWith("//")
                        || link.StartsWith(@"\\")))
                    {
                        linkExternal.Add(link);
                        //ProgressEvent(this, new ProgressEventArgs(4, "External link: " + link));
                    }
                    else if (link.StartsWith("?"))
                    {
                        // it's possible to have /?query which sends the querystring to the
                        // 'default' page in a directory
                        linkLocal.Add(this.Uri.AbsolutePath + link);
                        //ProgressEvent(this, new ProgressEventArgs(4, "? Internal default page link: " + link));
                    }
                    else
                    {
                        linkLocal.Add(link);
                        //ProgressEvent(this, new ProgressEventArgs(4, "I Internal link: " + link));
                    }
                } // add each link to a collection
            } // foreach
            this.LocalLinks = linkLocal;
            this.ExternalLinks = linkExternal;
        } // Parse

        /// <summary>
        /// Stripping HTML
        /// http://www.4guysfromrolla.com/webtech/042501-1.shtml
        /// </summary>
        /// <remarks>
        /// Using regex to find tags without a trailing slash
        /// http://concepts.waetech.com/unclosed_tags/index.cfm
        ///
        /// http://msdn.microsoft.com/library/en-us/script56/html/js56jsgrpregexpsyntax.asp
        ///
        /// Replace html comment tags
        /// http://www.faqts.com/knowledge_base/view.phtml/aid/21761/fid/53
        /// </remarks>
        protected string StripHtml(string Html)
        {
            // strip the <script> tags from the Html
            string scriptregex = @"<scr" + @"ipt[^>.]*>[\s\S]*?</sc" + @"ript>";
            Regex scripts = new Regex(scriptregex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture);
            string scriptless = scripts.Replace(Html, " ");

            // strip the <style> tags from the Html
            string styleregex = @"<style[^>.]*>[\s\S]*?</style>";
            Regex styles = new Regex(styleregex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture);
            string styleless = styles.Replace(scriptless, " ");

            // strip the <NOSEARCH> regions from the Html (where NOSEARCH is set in the web.config/Preferences class)
            // TODO: NOTE: this only applies to INDEXING the text - links are parsed before now, so they aren't "excluded" by the region!! (yet)
            string ignoreless = string.Empty;
            if (IgnoreRegions)
            {
                string noSearchStartTag = "<!--" + _IgnoreRegionTagNoIndex + "-->";
                string noSearchEndTag = "<!--/" + _IgnoreRegionTagNoIndex + "-->";
                string ignoreregex = noSearchStartTag + @"[\s\S]*?" + noSearchEndTag;
                Regex ignores = new Regex(ignoreregex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture);
                ignoreless = ignores.Replace(styleless, " ");
            }
            else
            {
                ignoreless = styleless;
            }

            // strip the <!--comment--> tags from the Html
            //string commentregex = @"<!\-\-.*?\-\->";        // alternate suggestion from antonello franzil
            string commentregex = @"<!(?:--[\s\S]*?--\s*)?>";
            Regex comments = new Regex(commentregex, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.ExplicitCapture);
            string commentless = comments.Replace(ignoreless, " ");

            // strip the remaining HTML tags from the Html, replacing each match with a space
            Regex objRegExp = new Regex("<(.|\n)+?>", RegexOptions.IgnoreCase);
            string output = objRegExp.Replace(commentless, " ");

            // replace all _remaining_ < and > with &lt; and &gt;
            output = output.Replace("<", "&lt;");
            output = output.Replace(">", "&gt;");
            return output;
        }
    }
}

HtmlDownloader.cs

using System;

namespace Core.Utils.Html
{
    public static class HtmlDownloader
    {
        private static string _UserAgent = "Mozilla/6.0 (MSIE 6.0; Windows NT 5.1; ThePlanCollection.com; robot)";
        private static int _RequestTimeout = 5;
        private static System.Net.CookieContainer _CookieContainer = new System.Net.CookieContainer();

        /// <summary>
        /// Attempts to download the url into a new HtmlDocument.
        /// </summary>
        /// <remarks>
        /// http://www.123aspx.com/redir.aspx?res=28320
        /// </remarks>
        public static HtmlDocument Download(string url)
        {
            Uri uri = new Uri(url);
            HtmlDocument doc = null;
            // open the requested URL
            System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(uri.AbsoluteUri);
            req.AllowAutoRedirect = true;
            req.MaximumAutomaticRedirections = 3;
            req.UserAgent = _UserAgent; //"Mozilla/6.0 (MSIE 6.0; Windows NT 5.1; Searcharoo.NET)";
            req.KeepAlive = true;
            req.Timeout = _RequestTimeout * 1000; //prefRequestTimeout
            // SIMONJONES http://codeproject.com/aspnet/spideroo.asp?msg=1421158#xx1421158xx
            req.CookieContainer = new System.Net.CookieContainer();
            req.CookieContainer.Add(_CookieContainer.GetCookies(uri));
            // get the stream from the returned web response
            System.Net.HttpWebResponse webresponse = null;
            try
            {
                webresponse = (System.Net.HttpWebResponse)req.GetResponse();
            }
            catch (Exception ex)
            {
                webresponse = null;
                Console.Write("request for url failed: {0} {1}", url, ex.Message);
            }
            if (webresponse != null)
            {
                webresponse.Cookies = req.CookieContainer.GetCookies(req.RequestUri);
                // handle cookies (need to do this in case we have any session cookies)
                foreach (System.Net.Cookie retCookie in webresponse.Cookies)
                {
                    bool cookieFound = false;
                    foreach (System.Net.Cookie oldCookie in _CookieContainer.GetCookies(uri))
                    {
                        if (retCookie.Name.Equals(oldCookie.Name))
                        {
                            oldCookie.Value = retCookie.Value;
                            cookieFound = true;
                        }
                    }
                    if (!cookieFound)
                    {
                        _CookieContainer.Add(retCookie);
                    }
                }
                doc = new HtmlDocument();
                doc.MimeType = ParseMimeType(webresponse.ContentType.ToString()).ToLower();
                // the ContentType setter parses out the mime type and charset for us
                doc.ContentType = webresponse.ContentType.ToString().ToLower();
                doc.Extension = ParseExtension(uri.AbsoluteUri);
                string enc = "utf-8"; // default
                if (webresponse.ContentEncoding != String.Empty)
                {
                    // use the HttpHeader Content-Type in preference to the one set in META
                    doc.Encoding = webresponse.ContentEncoding;
                }
                else if (doc.Encoding == String.Empty)
                {
                    doc.Encoding = enc; // default
                }
                // http://www.c-sharpcorner.com/Code/2003/Dec/ReadingWebPageSources.asp
                System.IO.StreamReader stream = new System.IO.StreamReader
                    (webresponse.GetResponseStream(), System.Text.Encoding.GetEncoding(doc.Encoding));
                doc.Uri = webresponse.ResponseUri; // we *may* have been redirected... and we want the *final* URL
                doc.Length = webresponse.ContentLength;
                doc.Html = stream.ReadToEnd();
                stream.Close();
                doc.Parse();
                webresponse.Close();
            }
            return doc;
        }

        #region Private Methods: ParseExtension, ParseMimeType, ParseEncoding
        private static string ParseExtension(string filename)
        {
            return System.IO.Path.GetExtension(filename).ToLower();
        }

        private static string ParseMimeType(string contentType)
        {
            string mimeType = string.Empty;
            string[] contentTypeArray = contentType.Split(';');
            // set MimeType if it's blank
            if (mimeType == String.Empty && contentTypeArray.Length >= 1)
            {
                mimeType = contentTypeArray[0];
            }
            return mimeType;
        }

        private static string ParseEncoding(string contentType)
        {
            string encoding = string.Empty;
            string[] contentTypeArray = contentType.Split(';');
            // set Encoding if it's blank
            if (encoding == String.Empty && contentTypeArray.Length >= 2)
            {
                int charsetpos = contentTypeArray[1].IndexOf("charset");
                if (charsetpos > 0)
                {
                    encoding = contentTypeArray[1].Substring(charsetpos + 8, contentTypeArray[1].Length - charsetpos - 8);
                }
            }
            return encoding;
        }
        #endregion
    }
}
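Used on their own, the two Searcharoo classes make a handy little page inspector. A minimal sketch, with example.com standing in for a real URL:

HtmlDocument page = HtmlDownloader.Download("http://www.example.com/");
if (page != null)
{
    Console.WriteLine(page.Title);        // parsed from the <title> tag
    Console.WriteLine(page.Description);  // META description, or start of body text
    foreach (object link in page.LocalLinks)
    {
        Console.WriteLine(link);          // candidate pages for further crawling
    }
}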


[SQLServer] A SQL Stored Procedure That Mimics C#'s Split


-- =============================================
-- Author:      zoser
-- Create date: 2009-01-14
-- Description: A SQL stored procedure that mimics C#'s Split:
--              it splits the source string on the separator and
--              returns the sum of the numeric segments.
-- =============================================
ALTER PROCEDURE pro_SQLSplit
 -- Add the parameters for the stored procedure here
 @SoureSql varchar(4000), -- source string
 @SeprateStr varchar(10)  -- separator
AS
BEGIN
 declare @i int
 declare @tmp float
 set @tmp = 0
 set @SoureSql = rtrim(ltrim(@SoureSql))
 set @i = charindex(@SeprateStr, @SoureSql)
 while @i > 1
  begin
   set @tmp = @tmp + convert(float, left(@SoureSql, @i - 1))
   set @SoureSql = substring(@SoureSql, @i + 1, len(@SoureSql) - @i)
   set @i = charindex(@SeprateStr, @SoureSql)
  end
 -- add the final segment, which has no trailing separator
 if len(@SoureSql) > 0
  set @tmp = @tmp + convert(float, @SoureSql)
 select @tmp
END
GO
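A quick smoke test, assuming the procedure has been created in the current database; the segments 1, 2.5 and 5 sum to 8.5:

EXEC pro_SQLSplit '1,2.5,5', ','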


[Flex] Flex and .NET Interoperability (3): Data Access via WebService (Part 2)


In the previous article, "Flex and .NET Interoperability (2): Data Access via WebService (Part 1)", we looked at accessing a WebService through the <mx:WebService> tag. We can also access a WebService dynamically, in code: the Flex SDK provides a WebService class for exactly this purpose.

Using the WebService class amounts to expressing the attributes of the <mx:WebService> tag as properties on a class instance; compared with the tag, the class is the more flexible of the two. Here is how to connect and call a remote method programmatically:

internal function onClick():void
{
    var service:WebService = new WebService();
    service.loadWSDL("http://localhost:1146/FlashFlexService.asmx?wsdl");
    service.addEventListener(ResultEvent.RESULT, onResult);
    service.addEventListener(FaultEvent.FAULT, onFault);
    service.GetBook();
}

 

The object's loadWSDL() method connects to the remote WebService, the handler functions are attached to the object dynamically, and the remote method is then invoked just as with the tag. The handlers:

internal function onResult(evt:ResultEvent):void
{
    Alert.show(evt.result.Id);
}

internal function onFault(evt:FaultEvent):void
{
    Alert.show(evt.fault.faultDetail.toString());
}

 

That is all it takes to call a remote WebService method programmatically through the WebService class.

Now let's look at how the Flex client parses a complex type such as a DataTable returned by the WebService. First define the WebService method:

[WebMethod(Description = "Returns data as a DataTable")]
public DataTable GetDataTable()
{
    DataTable dt = new DataTable("Books");
    dt.Columns.Add("Id", typeof(int));
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Author", typeof(string));
    dt.Columns.Add("Price", typeof(double));

    DataRow dr = dt.NewRow();
    dr["Id"] = 1;
    dr["Name"] = "《Flex游戏开发》";
    dr["Author"] = "张三";
    dr["Price"] = 54.85;
    dt.Rows.Add(dr);

    dr = dt.NewRow();
    dr["Id"] = 2;
    dr["Name"] = "《Flash游戏开发》";
    dr["Author"] = "李四";
    dr["Price"] = 65.50;
    dt.Rows.Add(dr);

    return dt;
}

 

Access it from the Flex client through a WebService as before; here the <mx:WebService> tag is used (note that the name of the <mx:operation> tag must match the server-side WebService method name):

<mx:WebService id="myService"
    wsdl="http://localhost:1146/DataWebService.asmx?wsdl" useProxy="false">
    <mx:operation name="GetDataTable">
    </mx:operation>
</mx:WebService>

 

The WebService is ready and the client is connected to it; all that is left is to call the remote method it provides:

internal function onTable():void
{
    myService.addEventListener(ResultEvent.RESULT, onSuccess);
    myService.addEventListener(FaultEvent.FAULT, onFault);
    myService.GetDataTable.send();
}

internal function onSuccess(evt:ResultEvent):void
{
    //bookGrid.dataProvider = this.myService.GetDataTable.lastResult.Tables.Books.Rows;
}

internal function onFault(evt:FaultEvent):void
{
    Alert.show("WebService call failed, details: " + evt.fault.faultDetail.toString());
}

 

Bind the WebService's return value to a Flex DataGrid component; the relevant mxml:

<mx:Panel x="41" y="123" width="480" height="279" layout="absolute" fontSize="12">
    <mx:DataGrid x="10" y="10" width="436" id="bookGrid"
        dataProvider="{this.myService.GetDataTable.lastResult.Tables.Books.Rows}">
        <mx:columns>
            <mx:DataGridColumn headerText="Id" dataField="Id"/>
            <mx:DataGridColumn headerText="Name" dataField="Name"/>
            <mx:DataGridColumn headerText="Author" dataField="Author"/>
            <mx:DataGridColumn headerText="Price" dataField="Price"/>
        </mx:columns>
    </mx:DataGrid>
    <mx:ControlBar>
        <mx:Button label="DataTable" click="onTable()"/>
    </mx:ControlBar>
</mx:Panel>

 

The DataGrid's dataProvider property binds the grid to its data source. Besides binding directly with a "{}" expression, we can also assign the data source inside the success handler of the remote call; see the commented-out line in the code above. {this.myService.GetDataTable.lastResult.Tables.Books.Rows} binds every row of the DataTable returned by the remote GetDataTable() method to the DataGrid, where Books is the name of the source DataTable (see the WebService method definition above).
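If you prefer wiring the grid up in the handler rather than in a binding expression, the commented-out line is all it takes:

internal function onSuccess(evt:ResultEvent):void
{
    // assign the returned rows directly instead of using a binding expression
    bookGrid.dataProvider = this.myService.GetDataTable.lastResult.Tables.Books.Rows;
}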

         

     

Compared with generic collections, DataSet and DataTable perform much worse, and their serialization and deserialization are complicated. Ever since .NET 2.0 introduced generics, I have preferred them for moving large amounts of data. So next I will show how Flex handles a generic collection returned by a WebService method. Define the method as follows:

 

[WebMethod(Description = "Returns a generic collection")]
public List<Book> BookList()
{
    return new List<Book>
    {
        new Book
        {
            Id = 1,
            Name = "《Flex游戏开发》",
            Author = "张三",
            Price = 54.85
        },
        new Book
        {
            Id = 2,
            Name = "《Flash游戏开发》",
            Author = "李四",
            Price = 65.50
        }
    };
}

 

Compared with DataSet and DataTable, I personally find data returned as a List<> easier to handle. Invoking the List<> method in the WebService test page shows the result directly:

        

That is what data returned as a generic collection (List<>) looks like: more concise and clearer than the DataTable result. So how do we receive and process this value in Flex? The test page already gives the answer away. Look closely and you will see the response is wrapped in an element named "ArrayOfBook". What is that? It suggests the client can read the return value as some kind of array. To be sure about the structure of the return value, step into it with the Flex Builder debugger:

          

See it? Under the BookList method's lastResult sit two objects; expand the nodes and they are exactly the two Book objects returned in our List<Book>. And lastResult's type is mx.collections.ArrayCollection, ActionScript's own collection class. So the Flex client can read the generic collection returned by the WebService straight from lastResult, as in this code:

internal function onTable():void
{
    myService.addEventListener(ResultEvent.RESULT, onSuccess);
    myService.addEventListener(FaultEvent.FAULT, onFault);
    myService.BookList.send();
}

internal function onSuccess(evt:ResultEvent):void
{
    var arrC:ArrayCollection = this.myService.BookList.lastResult as ArrayCollection;
    bookGrid.dataProvider = arrC;
}

internal function onFault(evt:FaultEvent):void
{
    Alert.show("WebService call failed, details: " + evt.fault.faultDetail.toString());
}

 

The corresponding mxml follows (the result is the same as with the DataTable version above):

<mx:Panel x="41" y="123" width="480" height="279" layout="absolute" fontSize="12">
    <mx:DataGrid x="10" y="10" width="436" id="bookGrid">
        <mx:columns>
            <mx:DataGridColumn headerText="Id" dataField="Id"/>
            <mx:DataGridColumn headerText="Name" dataField="Name"/>
            <mx:DataGridColumn headerText="Author" dataField="Author"/>
            <mx:DataGridColumn headerText="Price" dataField="Price"/>
        </mx:columns>
    </mx:DataGrid>
    <mx:ControlBar>
        <mx:Button label="DataTable" click="onTable()"/>
    </mx:ControlBar>
</mx:Panel>

 

That concludes this look at WebService data access. My abilities are limited, so please point out anything I got wrong; suggestions are also welcome. Let's discuss, learn, and improve together!

Copyright

This is an original article and you are welcome to repost it; the copyright is shared by the author and cnblogs.

Author: Beniao

Source: http://beniao.cnblogs.com/ or http://www.cnblogs.com/

[Flex] Flex and .NET Interoperability (2): Data Access via WebService (Part 1)


Flex provides the <mx:WebService>, <mx:HTTPService>, and <mx:RemoteObject> tags for accessing remote data directly, which makes it much easier to exchange data with remote services (such as a WebService) developed in any language environment.

This article uses a WebService written in C# on the .NET platform as the remote data source and covers the essentials of Flex-to-.NET WebService communication: connecting to the WebService, invoking its remote methods, and passing parameters to them. The three tags are used in basically the same way, so <mx:WebService> serves as the example throughout.

First, look at the following code:

<mx:WebService id="dataService"
    wsdl="http://localhost/FlashFlex/DataWebService.asmx?wsdl"
    useProxy="false">
    <mx:operation name="HelloWorld" result="onSuccess(event)" fault="onFault(event)"/>
    <mx:operation name="GetBook" fault="onFault(event)" result="onObjectSuccess(event)"/>
</mx:WebService>

 

The wsdl attribute simply points at the WSDL address of the WebService to access. Two operation tags (<mx:operation>) are defined, each corresponding to a WebMethod defined in the WebService. The result attribute names the handler to run when the call succeeds; fault names the handler for failures. The two <mx:operation> tags above correspond to these WebMethods:

/// <summary>
/// Returns a string
/// </summary>
/// <returns></returns>
[WebMethod]
public string HelloWorld()
{
    return "Hello World";
}

/// <summary>
/// Returns a simple object
/// </summary>
/// <returns></returns>
[WebMethod]
public Book GetBook()
{
    return new Book
    {
        Id = 1,
        Name = "三国演义",
        Author = "罗贯中",
        Price = 100
    };
}

 

That is the complete flow of defining the WebService methods and reaching them from the Flex client (mxml) through the <mx:WebService> tag. Now let's see how the Flex client actually calls the methods the WebService defines:

<mx:Script>
    <![CDATA[
        import mx.controls.Alert;
        import mx.rpc.events.FaultEvent;
        import mx.rpc.events.ResultEvent;

        /**
         * Send a request to the WebService: call HelloWorld.
         * dataService is the id of the <mx:WebService> tag.
         * */
        internal function onRequest():void
        {
            dataService.HelloWorld();
        }

        /**
         * Handle the return value when the request succeeds
         * */
        internal function onSuccess(evt:ResultEvent):void
        {
            Alert.show(evt.result.toString());
        }

        /**
         * Handler for failed requests
         * */
        internal function onFault(evt:FaultEvent):void
        {
            Alert.show("Failed to access the WebService!");
        }
    ]]>
</mx:Script>

 

The calls above complete one round of interaction between Flex and a .NET WebService. Of course the Flash/Flex client can also pass parameters to the WebService, as with this WebMethod:

/// <summary>
/// Converts the incoming parameter to upper case and returns it
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
[WebMethod]
public string ConvertToUpper(string value)
{
    return value.ToUpper();
}

 

Configure an <mx:operation> for the method under the <mx:WebService> tag and you can call it:

<mx:operation name="ConvertToUpper" result="onSuccess(event)" fault="onFault(event)"/>

 

/**
 * Send a request to the WebService
 * */
internal function onRequest():void
{
    //dataService.HelloWorld();
    dataService.ConvertToUpper("abcdefg");
}

We can also pass parameters through <mx:request>; just make sure the parameter elements inside <mx:request></mx:request> carry the same names as the parameters of the WebMethod. An example follows.
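A minimal sketch against the ConvertToUpper method above; the <value> element name matches the WebMethod's parameter name:

<mx:operation name="ConvertToUpper" result="onSuccess(event)" fault="onFault(event)">
    <mx:request>
        <value>abcdefg</value>
    </mx:request>
</mx:operation>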

Looking back at the WebService methods defined earlier, GetBook returns a Book object. When an object comes back, how does the Flex client read its values? See the following example:

internal function onObject():void
{
    dataService.GetBook();
}

internal function onObjectSuccess(evt:ResultEvent):void
{
    // get the return value from the event's result property,
    // then read its fields directly
    Alert.show(evt.result.Name);
}

/**
 * Handler for failed requests
 * */
internal function onFault(evt:FaultEvent):void
{
    Alert.show("Failed to access the WebService!");
}

 

And with that, the client has consumed an object returned by the server-side WebService.

Copyright

This is an original article and you are welcome to repost it; the copyright is shared by the author and cnblogs.

Author: Beniao

Source: http://beniao.cnblogs.com/ or http://www.cnblogs.com/

[SEO] Website Promotion Lessons


By: 糖果盒 (tangguohe.com)

Lesson 1: Don't promote offline. A website's strength is how easily it spreads online; promoting one through offline media that barely spread at all is putting the cart before the horse. While we promoted tangguohe.com online, registrations grew steadily by more than a hundred a day; on any day we switched to offline promotion, sign-ups collapsed. Picture a student strolling across campus who sees our poster or flyer: what are the odds he still remembers our URL by the time he gets back to his dorm half an hour later, rests a while, and finally opens his computer?

Lesson 2: Don't buy online display ads unless every visitor a search engine sends you creates more value than the click costs. If you run a training-agency site or sell flowers, users who won't spend money basically never click through, so some Baidu Ads or Google Ads can pay off; otherwise most of the traffic is search-engine reviewers and accidental clicks. Honestly, do you still click the ad boxes down the sides of a page?

Lesson 3: Do relationship marketing well. Recommending a website to your friends is easy: with 50 friends, getting each one to register takes a one-sentence pitch plus a favor. But persuading stranger number 51 to register, let alone keep coming back, is hard even for a professional marketer. So put every relationship you have to work; nearly all of the ten thousand-plus users tangguohe.com gained in three weeks came from relationship marketing.

Lesson 4: Spend money on people who are already your customers, not on people who aren't even prospects. Accept points 1 through 3 and point 4 follows. Several friends recently sank millions into offline advertising and went under; compare the results with the spend. When we were kids there were a handful of TV channels and a dozen commercials we could recite by heart; TV now carries ten thousand times the content, so every ad is diluted ten thousand times, roadside ads blanket everything, and users are fed up. Does pouring money into that market make sense? By contrast, rewarding each existing member with a few yuan in prizes is cheap, feels generous to the customer, and gets them inviting friends. Combine the two, rewarding users according to how many friends they bring in and how good those friends are, and it works even better.

Lesson 5: Use community marketing. The social-network land grab is over and the big players keep declaring that no new site can emerge; an SNS really does hold its users tightly, and with users in hand it can quickly ship whatever applications they need. But big companies have weak spots too. 张小盒 (hezi.cc) built very strong groups on the major communities such as kaixin001.com and xiaonei.com, giving those communities' users entertaining content while winning plenty of users for itself. A win-win.

[SEO] SEO Does Not Require Static Pages


In China, many "SEO experts" open every client engagement with the same diagnosis: make the pages static. That is not because dynamic pages cannot be optimized; it is because optimizing dynamic pages is harder than optimizing static ones, and harder than those experts' skills allow.

Search engines have no inherent preference for static pages over dynamic ones. It is just that the parameter schemes of many dynamic pages get in the way of indexing, while static pages index easily. Static pages also improve page load speed, system performance, and stability to a degree, so to get visible results quickly and keep the problem simple, everyone flocks to making sites static.

For large sites, however, the problems and follow-on costs of going static cannot be ignored:

The sheer number of generated files means planning for file and folder counts and disk capacity, which means a lot of server hardware.

Programs constantly read and write large areas of the site, so disk wear and the incident prevention and recovery it implies must be planned for: hardware must be replaced on schedule and site backups must be solid.

Page maintenance becomes complex and labor intensive, which threatens timeliness; you need a complete site-update process and dedicated maintenance staff.

Going static raises maintenance difficulty and administrator workload, raises hardware demand and wear, and raises the site's potential for access conflicts and failures. A large site has to weigh all of this.

For SEO we do not need truly static pages; pretending is enough. Dynamic pages can be optimized just as well.

Most search engines today index dynamic pages, and sites built on dynamic pages far outnumber static ones.

Many large sites whose URLs end in .htm are still dynamic underneath; they just use URL rewriting to "fool" the search engine. Truly static sites are hard to find.

Today a dynamic site is made "relatively static" in basically one of two ways:

1. Pseudo-static pages, via URL rewriting (a minimal sketch follows this list).

2. A spider-like approach: the dynamic site still runs, but a program crawls the whole site and publishes the captured pages as the static site that visitors actually access.
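As an illustration of the first approach, here is a minimal ASP.NET sketch, assuming a Global.asax and a hypothetical Article.aspx page; requests for static-looking URLs such as /article-123.htm are rewritten to the real dynamic page before processing:

// in Global.asax; requires: using System.Text.RegularExpressions;
protected void Application_BeginRequest(object sender, EventArgs e)
{
    string path = Request.Path; // e.g. "/article-123.htm"
    Match m = Regex.Match(path, @"^/article-(\d+)\.htm$", RegexOptions.IgnoreCase);
    if (m.Success)
    {
        // crawlers see a static-looking URL; the server runs the dynamic page
        Context.RewritePath("/Article.aspx?id=" + m.Groups[1].Value);
    }
}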

Real static pages and pseudo-static pages are equally easy for a search engine to index. That being so, why not take the more efficient "relatively static" route and avoid the many problems true static generation creates?

On the update and maintenance side, even pseudo-static pages add complexity and workload. The workable update models today are:

Triggered updates: when a maintainer changes something in the back end, the system regenerates the affected pages automatically or on demand.

Independent, region-based updates: updating is separated from maintenance. Pages are divided into regions that are refreshed according to rules; the regions are combined or separated using active includes or SSI (Server Side Include).

Independent, region-based updating is the more attractive maintenance model for a large, relatively static site:

1. Divide each page into defined, numbered regions with storage rules and update rules, where updates are either data-driven or periodic.

2. Give the regions priorities, and provide a manually triggered immediate update so that time-sensitive information goes out on schedule.

3. Replace dynamic pages with static ones while keeping the dynamic pages, and serve the dynamic page whenever its static counterpart has not been generated yet.

For SEO, static URLs should just be a signal telling the search engine that your site is easy to index, leading the crawler to as much of the site's content as possible. As long as pages are easy to crawl and index, a search engine treats static and dynamic pages alike.

For a small site, going static may be the simplest way to get more pages indexed. A large site should think carefully about whether true static generation is necessary at all, or whether "relatively static" is enough.

[Javascript] Front-End Development That Changed the World


Steve Jobs said: "We're here to put a dent in the universe. Otherwise why else even be here?" In 2008 plenty happened in the web front-end world, abroad and in China alike. Which of those events changed the world, or are about to?

JavaScript Games

On April 9, 2008, Dion Almaer spotted a classic JavaScript game: Super Mario, written by Jacob Seidelin at a size of just 14 KB.
(Super Mario in JavaScript: http://jsmario.com.ar/)

Plenty of web developers were stunned: was this really built with JavaScript? It was. The game draws on the Canvas element (emulated with HTML in IE), stores its images in encoded strings, and keeps its MIDI background music in base64. Beyond those tricks, the code is the HTML, CSS, and JavaScript we all know.

The JavaScript Super Mario was not the first game written in JavaScript, but it was the first to draw this much attention, and its arrival set off a wave of JavaScript game development:

Many classics now have JavaScript versions: Pac-Man, Space Invaders, Spacius (the endlessly replayable Raiden), and more.

Some fairly complex role-playing games appeared too: Andrew Wooldridge's Tombs of Asciiroth and CanvasQuest, and Pierre Chassaing's ProtoRPG.

The wave also produced JavaScript libraries dedicated to game development, most notably GameJS (a Canvas-based 2D game library) and GameQuery (a jQuery plugin).

Beyond 2D games on Canvas, JavaScript can build 3D games as well, and the excellent Processing.js and the JavaScript PlotTool charting tool also appeared.

My take: for a while yet, JavaScript games will remain a developers' plaything, and the road to real commercial use may be long. But when Super Mario jumped across a web page, it announced that the JavaScript era had arrived. What can JavaScript do? The 2008 answer: even games!

jQuery Shines

2008 was a terrific year both for jQuery's author John Resig and for the library itself. The jQuery home page carries one line in bold:

jQuery is designed to change the way that you write JavaScript.

jQuery backed that claim up with data and facts. To some degree it is no exaggeration to say that jQuery changed front-end development. Google Trends' 2008 search-volume curves for the popular JavaScript libraries tell the story:

(Google Trends chart of JavaScript library search volume, 2008)

September 2008 was a banner month for the jQuery team: Microsoft and Nokia officially integrated jQuery into their application development platforms. Some of Google's applications had adopted jQuery even earlier, and the jQuery home page shows DELL, Bank of America, Digg, Technorati, Mozilla, and other sites using it.

The other JavaScript libraries had notable years in 2008 too. The YUI3 preview is the most promising framework I have seen. ExtJS spread rapidly in China; on the JavaEye community, ExtJS became almost a synonym for Ajax, with a flood of articles and books about it (sadly, the books are not very good). Prototype idled along, while Mootools quietly won loyal users with its elegant code.

My take: every library above is excellent, and mastering any one of them is more than enough for daily work. But jQuery's 2008 was so strong that even someone like me, who uses YUI at work every day, had to applaud jQuery and cheer for John Resig. The libraries' flourishing rivalry is another sign that the JavaScript era has arrived.

Industrialized Page Production Takes Its First Steps

In 2008, if you were a front-end engineer who never heard the word "grid", you must have been buried in work. In October 2008 the Taobao UED blog published "The Secret of 960", kicking off a small wave of grid-system research in China.

Alongside the grid debate, Chinese front-end blogs dug into CSS frameworks and layout. All of it aims at two problems:

  1. Page consistency. As a site grows, its pages multiply geometrically. Keeping tens of thousands of pages consistent in style is no small challenge.
  2. Industrialized page output. Given standards and quality, how do you make pages easy to build, and how do you let operations staff mass-produce them? Many large sites face this today.

Among Chinese sites, Taobao, Baidu Youa, and NetEase have been adopting grid systems step by step. Taobao's home page and channel pages are now fully on the grid, and Taobao is piloting TMS (a template management system) to industrialize page production.

My take: high-quality, industrialized page production is only getting started at most Chinese companies. I expect industrialization to remain a front-end keyword in 2009.

These Were Excellent Too

  1. Progressive enhancement. In October 2008, Aaron Gustafson published a series on progressive enhancement at A List Apart, asking two core questions: what should JavaScript do, and how should the front-end workflow run? JavaScript games showed us JavaScript's magic; Aaron reminds us not to abuse it and to think carefully about where it belongs. Usability, unobtrusiveness, accessibility: ideas every front-end engineer should sit with.
  2. D2 (the front-end technology forum). Two D2 conferences were held in 2008, in Beijing and Shanghai, two festivals for China's front-end engineers. The young role of front-end engineer is gradually being accepted by the big companies. D2 matters because we gathered together and made our voices heard!
  3. The birth of Google Chrome. In 2008, Chrome and its V8 JavaScript engine forced every browser vendor to race on JavaScript engine speed, yet another sign that the JavaScript era has arrived. Thanks to Google's and Mozilla's efforts, year-end statistics brought heartening news: IE's share fell below 70%. May the wretched IE6 die soon; in 2009, Google's bugle call and Taobao's coming "NO IE6" campaign will hasten its end.

To close with two lines:

In 2008 we worked to change the world!
In 2009 we will keep changing the world, and the world will begin to change for us!

[C#] Delegates, Anonymous Methods, and Lambda Expressions in One Example


C# has kept improving through version 3.0, adding features from anonymous methods in 2.0 to today's lambda expressions, all to make the language feel more natural. Below is a small demo I wrote to illustrate, very simply, how the three relate.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

namespace WebApplication1
{
    public partial class _delegate : System.Web.UI.Page
    {
        delegate string DelegateTest(string s);

        public static string getString(string t)
        {
            return t;
        }

        // C# 1.x: bind a named method to the delegate
        DelegateTest normalDelegate = new DelegateTest(getString);

        // C# 2.0: anonymous method
        DelegateTest anonymousDelegate = delegate(string a) { return a; };

        // C# 3.0: lambda expression
        DelegateTest lambda = s => { return s; };

        protected void Page_Load(object sender, EventArgs e)
        {
            Response.Write(normalDelegate("regular delegate<br>"));
            Response.Write(anonymousDelegate("anonymous method<br>"));
            Response.Write(lambda("lambda expression"));
        }
    }
}
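For comparison, the built-in Func<T, TResult> delegate that ships with .NET 3.5 removes even the need to declare DelegateTest. A minimal console sketch of the same three forms:

using System;

class FuncDemo
{
    static string GetString(string t) { return t; }

    static void Main()
    {
        // the built-in generic delegate replaces the hand-rolled DelegateTest type
        Func<string, string> normal = GetString;                            // method group
        Func<string, string> anonymous = delegate(string a) { return a; }; // anonymous method
        Func<string, string> lambda = s => s;                               // expression lambda

        Console.WriteLine(normal("regular delegate"));
        Console.WriteLine(anonymous("anonymous method"));
        Console.WriteLine(lambda("lambda expression"));
    }
}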