StackOverflow Question Demographics Query

From SQLServerPedia

Jump to: navigation, search

See Also: Main Page - Business Intelligence - Data Mining the StackOverflow Database - Interesting StackOverflow Database Queries

Contents

How The Query Works

This query joins Posts to Users to find information about the kinds of people who ask questions. As of 6/2009, the results look like this:

  • Questions - 176,137
  • Average Score - 1.89
  • Average View Count - 311
  • Distinct Questioners - 39,795 (meaning anyone who has asked a single question has asked an average of 4.4 questions - there may be some odd stuff in here around anonymous questions though, haven't looked at that yet)
  • Average Answer Count - 4
  • Average Comment Count - 2.3
  • Average Questioner Reputation - 1,506
  • Average Questioner Age - 30 (but remember, that's unreliable, because it's a self-entered number)


The T-SQL Query

SELECT COALESCE(COUNT(DISTINCT p.ID),0)           AS Questions
       ,COALESCE(AVG(p.Score * 1.00),0)           AS AvgScore
       ,COALESCE(AVG(p.ViewCount * 1.00),0)       AS AvgViewCount
       ,COALESCE(COUNT(DISTINCT p.OwnerUserId),0) AS DistinctQuestioners
       ,COALESCE(AVG(p.AnswerCount * 1.00),0)     AS AvgAnswerCount
       ,COALESCE(AVG(p.CommentCount * 1.00),0)    AS AvgCommentCount
       ,COALESCE(AVG(p.FavoriteCount * 1.00),0)   AS AvgFavoriteCount
       ,COALESCE(AVG(u.Reputation * 1.00),0)      AS AvgQuestionerReputation
       ,COALESCE(AVG(u.Age * 1.00),0)             AS AvgQuestionerAge
       ,COALESCE(AVG(u.UpVotes * 1.00),0)         AS AvgQuestionerUpVotes
       ,COALESCE(AVG(u.DownVotes * 1.00),0)       AS AvgQuestionerDownVotes
FROM   dbo.Posts p
       INNER JOIN dbo.Users u
         ON p.OwnerUserId = u.Id
WHERE p.PostTypeId = 1 AND ClosedDate IS NULL

Result Screenshot:

Image:StackOverflow-PostDemographics1.jpg

Variation: Grouping Results by Tags

SELECT t.TagName
       ,COALESCE(COUNT(DISTINCT p.ID),0)           AS Questions
       ,COALESCE(AVG(p.Score * 1.00),0)           AS AvgScore
       ,COALESCE(AVG(p.ViewCount * 1.00),0)       AS AvgViewCount
       ,COALESCE(COUNT(DISTINCT p.OwnerUserId),0) AS DistinctQuestioners
       ,COALESCE(AVG(p.AnswerCount * 1.00),0)     AS AvgAnswerCount
       ,COALESCE(AVG(p.CommentCount * 1.00),0)    AS AvgCommentCount
       ,COALESCE(AVG(p.FavoriteCount * 1.00),0)   AS AvgFavoriteCount
       ,COALESCE(AVG(u.Reputation * 1.00),0)      AS AvgQuestionerReputation
       ,COALESCE(AVG(u.Age * 1.00),0)             AS AvgQuestionerAge
       ,COALESCE(AVG(u.UpVotes * 1.00),0)         AS AvgQuestionerUpVotes
       ,COALESCE(AVG(u.DownVotes * 1.00),0)       AS AvgQuestionerDownVotes
FROM   dbo.Posts p
       INNER JOIN dbo.Users u
         ON p.OwnerUserId = u.Id
       INNER JOIN dbo.PostTags pt
         ON p.Id = pt.PostId
       INNER JOIN dbo.Tags t
         ON pt.TagId = t.Id
WHERE p.PostTypeId = 1 AND ClosedDate IS NULL
GROUP BY t.TagName
ORDER BY t.TagName

Result Screenshot:

Image:StackOverflow-PostDemographics2.jpg

Variation: Grouping Results by Month

SELECT  YEAR(p.CreationDate) AS PostYear
       ,MONTH(p.CreationDate) AS PostMonth
       ,COALESCE(COUNT(DISTINCT p.ID),0)           AS Questions
       ,COALESCE(AVG(p.Score * 1.00),0)           AS AvgScore
       ,COALESCE(AVG(p.ViewCount * 1.00),0)       AS AvgViewCount
       ,COALESCE(COUNT(DISTINCT p.OwnerUserId),0) AS DistinctQuestioners
       ,COALESCE(AVG(p.AnswerCount * 1.00),0)     AS AvgAnswerCount
       ,COALESCE(AVG(p.CommentCount * 1.00),0)    AS AvgCommentCount
       ,COALESCE(AVG(p.FavoriteCount * 1.00),0)   AS AvgFavoriteCount
       ,COALESCE(AVG(u.Reputation * 1.00),0)      AS AvgQuestionerReputation
       ,COALESCE(AVG(u.Age * 1.00),0)             AS AvgQuestionerAge
       ,COALESCE(AVG(u.UpVotes * 1.00),0)         AS AvgQuestionerUpVotes
       ,COALESCE(AVG(u.DownVotes * 1.00),0)       AS AvgQuestionerDownVotes
FROM   dbo.Posts p
       INNER JOIN dbo.Users u
         ON p.OwnerUserId = u.Id
       INNER JOIN dbo.PostTags pt
         ON p.Id = pt.PostId
WHERE p.PostTypeId = 1 AND ClosedDate IS NULL
GROUP BY YEAR(p.CreationDate), MONTH(p.CreationDate)
ORDER BY YEAR(p.CreationDate), MONTH(p.CreationDate)

Result Screenshot:

Image:StackOverflow-PostDemographics3.jpg

Related Articles About the StackOverflow Database