
    Production and Consumption in Knowledge Market: Solving the Old Puzzles with New Techniques


    Creator
    Guo, Dongbo
    Advisor
    Rust, John
    Abstract
    The first chapter investigates the drivers of citation counts of academic papers. I match yearly citation data, full texts, and yearly author data for 4,482 papers in the top 5 economics journals, and use textual analysis to construct high-dimensional feature vectors for papers and authors. The 10-year citation distribution is highly right-skewed, and its upper tail is well approximated by a power law. In addition, higher 10-year citation counts are associated with greater coverage of popular topics, more authors, and more total citations of the authors' co-authors, and with lower "Micro" intensity, lower paper complexity, and fewer top-field publications by the authors. I use several state-of-the-art machine learning methods and develop a hybrid method that combines variable construction by dictionary-based textual analysis, variable selection by regression shrinkage, and model fitting by Gradient Boosted Trees to predict papers' 10-year citations from the information available as of the year of publication. The proposed hybrid method gives the smallest Mean Squared Error in the out-of-sample test of 10-year citation prediction while using a relatively small number of variables compared with the other machine learning methods. It correctly predicts 72.7% of the papers in the upper half of the citation distribution and 76.7% of the papers in the lower half.
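The power-law claim about the upper tail can be illustrated with a small sketch. The abstract does not say how the tail was fitted, so the estimator below (the standard continuous maximum-likelihood, or Hill-type, estimator) is an assumption for illustration rather than the thesis's method. The function names, the threshold `x_min`, and the synthetic data are all hypothetical.

```python
import math
import random


def hill_tail_exponent(values, x_min):
    """Estimate alpha for a continuous power law p(x) ~ x**(-alpha), x >= x_min.

    Uses the closed-form MLE: alpha_hat = 1 + n / sum(ln(x_i / x_min)).
    Assumption: this is an illustrative estimator, not the thesis's procedure.
    """
    tail = [v for v in values if v >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(v / x_min) for v in tail)
    if log_sum == 0.0:
        raise ValueError("all tail observations equal x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, alpha, x_min, rng):
    """Draw n synthetic values from a continuous power law by inverse-CDF sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for 10-year citation counts (not data from the thesis).
    citations = sample_power_law(20000, alpha=2.5, x_min=50.0, rng=rng)
    print(round(hill_tail_exponent(citations, x_min=50.0), 2))
```

On real citation counts, which are discrete, the choice of `x_min` matters and a discrete correction may be needed. The sketch only shows the shape of the calculation.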
     
    The second chapter analyzes editorial decision making in the academic publishing process. I analyze data on keywords, abstracts, referee recommendations, historical records of authors, and records of editorial decision making for 13,517 manuscripts submitted to four academic journals, linked with data on paper citation counts. I use textual analysis of the keywords and abstract of each paper to construct high-dimensional measures of research topics and fields. Then, I estimate the effects of features of papers, authors, and referee recommendations on editorial decision making, duration from submission to decision, and paper citations. Empirical results suggest that papers with higher referee recommendation scores, higher scientific contribution scores, a lower standard deviation of referee recommendation scores, a higher share of positive referee recommendations, and higher coverage of popular research topics, as well as papers written by authors with longer and stronger submission histories (a higher number of submissions and a lower rejection rate), are more likely to be published. Papers with lower coverage of popular research topics and papers written by authors with shorter and weaker submission histories are more likely to be desk rejected. Among non-desk-rejected papers, those with higher referee recommendation scores and a lower standard deviation of the scores have shorter first rounds of review. The results for paper citations suggest that accepted papers on average receive more citations than rejected ones, and that higher paper citation counts are associated with higher coverage of popular research topics, higher referee recommendation scores, and higher scientific contribution scores. In the prediction part, I use machine learning methods (regression shrinkage methods, Random Forest, and Gradient Boosted Trees) to predict paper citations from the information available at the time of submission. The model that uses the Random Forest method, measures of publication information, measures of research fields and topics, and high-dimensional measures of the appearance of popular topic words gives the best out-of-sample prediction performance. Using the preferred prediction model, I test the possibility of combining artificial intelligence (AI) and human experts in the academic publishing process. The experiment shows that the average number of cumulative citations of the published papers is more than 24% higher than that of all submissions. This result suggests that papers published through the human-intelligence-based academic publishing process tend to have higher average citations than rejected ones, even though editors may not use a paper's expected citations as one of the criteria when they decide which paper to publish. As an exercise, I use the citation prediction model to decide which papers to publish based on maximizing citations. At an acceptance rate comparable to that of the human-based editorial process, the papers published by the algorithm have 2% higher citation counts. In addition, the average number of cumulative citations of the papers selected by the artificial intelligence from among the publishable papers is 22% higher than that of all publishable papers. Admittedly, there are other factors that affect editors' decisions on which paper to publish. However, the artificial-intelligence-based prediction model may help editors identify, among publishable papers, those that are more likely to be highly cited.
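The selection exercise described above can be sketched as follows: hold the number of accepted papers fixed at the editors' count, accept instead the papers with the highest predicted citations, and compare mean realized citations. This is a minimal sketch under stated assumptions. The function name, the tie handling, and the toy numbers are hypothetical, and the predictions here are made up rather than produced by the Random Forest model in the thesis.

```python
def compare_selection(actual, predicted, accepted_by_editor):
    """Compare mean realized citations under editor vs. prediction-based selection.

    actual: realized citation counts per submission
    predicted: model-predicted citation counts per submission (hypothetical here)
    accepted_by_editor: booleans, True where the editors accepted the paper
    Returns (mean citations of editor picks, mean citations of algorithm picks).
    """
    if not (len(actual) == len(predicted) == len(accepted_by_editor)):
        raise ValueError("inputs must have the same length")
    k = sum(1 for a in accepted_by_editor if a)  # same acceptance count as editors
    if k == 0:
        raise ValueError("editors accepted no papers")

    editor_picks = [i for i, a in enumerate(accepted_by_editor) if a]
    # Rank submissions by predicted citations, highest first, and take the top k.
    ranked = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    algo_picks = ranked[:k]

    def mean_citations(indices):
        return sum(actual[i] for i in indices) / len(indices)

    return mean_citations(editor_picks), mean_citations(algo_picks)


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the thesis.
    actual = [10, 50, 5, 30, 2, 20]
    predicted = [12, 40, 6, 35, 1, 15]
    accepted = [True, True, False, False, False, True]
    editor_mean, algo_mean = compare_selection(actual, predicted, accepted)
    print(editor_mean, algo_mean)
```

The comparison relies on realized citations being observed for rejected submissions as well, which the linked citation data described in the abstract would provide.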
     
    Description
    Ph.D.
    Permanent Link
    http://hdl.handle.net/10822/1050764
    Date Published
    2018
    Subject
    Academic publishing process; Editorial decision; Knowledge market; Machine learning; Paper citation distribution; Scientific impact prediction; Economics
    Type
    thesis
    Publisher
    Georgetown University
    Extent
    161 leaves
    Collections
    • Graduate Theses and Dissertations - Economics


    Georgetown University Seal
    ©2009 - 2023 Georgetown University Library
    37th & O Streets NW
    Washington DC 20057-1174
    202.687.7385
    digitalscholarship@georgetown.edu
    Accessibility
     

     
