Data Mining the StackOverflow Database

From SQLServerPedia



StackOverflow is a question-and-answer site for programmers. Anyone can post questions or answers, and each user's reputation goes up or down depending on how the community votes on their questions and answers. The site's content is licensed under Creative Commons, meaning the community owns the data it posts to the site.

You can download the StackOverflow database in XML format via BitTorrent. It's a great dataset to use when learning data mining because:

  • It's programming-related, so we programmers and DBAs are more likely to understand the data we're looking at
  • It's freely licensed under Creative Commons, so you can use it in slideshows and tutorials
  • It's a simple, straightforward data model
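To give a feel for how simple the data model is, here is a minimal Python sketch that streams rows out of a posts file and counts questions versus answers. It assumes the layout used by the Stack Exchange data dumps, where each record is a `<row>` element whose columns are attributes (`Id`, `PostTypeId`, `Score`, and so on) and `PostTypeId` 1 means a question and 2 means an answer. The file layout, the attribute names, and the `count_post_types` helper are assumptions for illustration; check them against the dump you actually download. The sample data is hand-written, not taken from the real dump.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed PostTypeId codes, following the Stack Exchange data dump layout.
POST_TYPES = {"1": "question", "2": "answer"}


def count_post_types(source):
    """Stream <row> elements from a Posts.xml-style file and tally post types.

    `source` may be a filename or a binary file-like object. iterparse reads
    the file incrementally instead of loading the whole document at once.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            post_type = elem.get("PostTypeId")
            counts[POST_TYPES.get(post_type, "other")] += 1
            elem.clear()  # drop the row's attributes/children once counted
    return counts


# Tiny hand-written sample in the assumed format (not real dump data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="10" Title="Example question" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="4" />
  <row Id="3" PostTypeId="2" ParentId="1" Score="1" />
</posts>"""

if __name__ == "__main__":
    print(count_post_types(io.BytesIO(SAMPLE)))
```

Against a real dump you would pass the path to the downloaded posts file instead of the in-memory sample. Streaming with `iterparse` matters here because the full dump is far too large to parse comfortably in one piece.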

Here's how to get started: