Published: 2019-05-11

by Michael Deng

How I built a leaderboard with the top Medium stories of all time. And how it almost died.

Last year I built Top Medium Stories, a website that showcases Medium’s top stories of all time. This is the tale of how a lone developer scraped thousands of stories and hit seemingly fatal roadblocks.

Spoiler alert: Life finds a way. The leaderboard lives on and is updated daily.

Why did I make this?

As a long-time reader on Medium, I’ve always been curious what the most popular stories were. While the personalized feed and topic pages surface many great stories, just as many slip through the cracks.

Driven to unearth the gems buried in Medium’s annals, I set a new goal in early 2017: I was going to find the most popular stories of all time on Medium and share them with the rest of the world.

That goal culminated in me publishing my list of those stories.

I compiled the stories manually, which was a grueling task. Over the period of a week, I visited every top stories page since September 10, 2014 (when the top stories feature debuted). To find even earlier stories, I dug through publication archives using the Wayback Machine to fetch ancient copies of Medium pages.

I built a massive spreadsheet of every story I found. It was a ton of mind-numbing work, but I was proud of the result.

But my pride was short-lived, as the list quickly became outdated. I wanted to keep it up to date, but doing so manually was impossible.

Then something dawned on me. I had already worked out the manual steps for collecting the data the first time. There was no reason why I couldn’t automate those steps with code. So I decided to turn the list into a dynamic website.

Automating data collection

To automate the manual steps described above, I wrote a web scraper using Python.

The scraper crawls through every top stories page since September 10, 2014 and tosses any story it sees into a Python dictionary. The dictionary is then sorted by the number of claps each story received and written to a JSON file. (Claps are Medium’s equivalent of a “like” or an “upvote” and readers can give a story up to 50 claps.)

Here’s a snippet of the JSON file:

[
  "We fired our top talent. Best decision we ever made.",
  {
    "recommends": 79000.0,
    "pub_url": "https://medium.freecodecamp.org",
    "author": "Jonathan Sol\u00f3rzano-Hamilton",
    "image": "https://cdn-media-1.freecodecamp.org/images/1*4hU3Xn7wunA81I3v17JIrg.jpeg",
    "year": "2017",
    "story_url": "https://medium.freecodecamp.org/we-fired-our-top-talent-best-decision-we-ever-made-4c0a99728fde",
    "pub": "freeCodeCamp",
    "author_url": "https://medium.freecodecamp.org/@peachpie"
  }
],
...
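
The sort-and-dump step described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: the function names `rank_stories` and `write_leaderboard` and the sample stories are made up, and only the `recommends` key is taken from the JSON snippet.

```python
import json


def rank_stories(stories):
    """Return (title, details) pairs sorted by clap count, highest first."""
    return sorted(
        stories.items(),
        key=lambda item: item[1]["recommends"],
        reverse=True,
    )


def write_leaderboard(stories, path):
    """Sort the stories and write them to a JSON file as [title, details] pairs."""
    ranked = rank_stories(stories)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ranked, f, indent=2)
    return ranked


# Hypothetical sample data keyed by story title, like the scraper's dictionary.
stories = {
    "Story A": {"recommends": 1200.0, "year": "2016"},
    "Story B": {"recommends": 79000.0, "year": "2017"},
    "Story C": {"recommends": 5400.0, "year": "2015"},
}
ranked = rank_stories(stories)
```

Because `json.dump` serializes each `(title, details)` tuple as a two-element array, the output has the same shape as the snippet: a title followed by an object of details.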

Before building the scraper, I checked Medium’s robots.txt file to verify that I wasn’t violating any policies. I also throttled the scraper to wait 2 seconds between requests, so it wouldn’t hammer Medium’s servers.
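
Here is one way the policy check and the 2-second throttle could look in Python, assuming the file I checked is a standard robots.txt. It is only a sketch: `build_robot_rules`, `polite_fetch_all`, and the `leaderboard-bot` user agent are hypothetical names, and the `fetch` callable stands in for whatever HTTP client the scraper uses.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_all(urls, rules, fetch, delay_seconds=2.0,
                     user_agent="leaderboard-bot", sleep=time.sleep):
    """Fetch each allowed URL, pausing between consecutive requests."""
    pages = {}
    fetched_any = False
    for url in urls:
        # Skip anything the site's robots.txt disallows for this user agent.
        if not rules.can_fetch(user_agent, url):
            continue
        # Wait before every request except the first one.
        if fetched_any:
            sleep(delay_seconds)
        pages[url] = fetch(url)
        fetched_any = True
    return pages
```

Passing `fetch` and `sleep` in as arguments keeps the loop easy to try out without touching the network.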

Converting the data to HTML

The next step was transforming the JSON file into HTML to display the stories on a web page.

I installed BeautifulSoup to do this. First, I constructed an HTML template with empty tables and rows. Then, I wrote a script that uses BeautifulSoup to populate the template from the JSON file.
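
A populate-the-template script along these lines might look like the sketch below. The template markup, the `stories` table id, the three-column layout, and the `render_stories` name are all assumptions made for illustration; only the use of BeautifulSoup and the `story_url` and `recommends` keys come from the article.

```python
from bs4 import BeautifulSoup

# Hypothetical template: a page with one empty table to fill in.
TEMPLATE = """<html><body>
<table id="stories"></table>
</body></html>"""


def render_stories(template_html, ranked_stories):
    """Append one table row per (title, details) pair and return the HTML."""
    soup = BeautifulSoup(template_html, "html.parser")
    table = soup.find("table", id="stories")
    for rank, (title, details) in enumerate(ranked_stories, start=1):
        row = soup.new_tag("tr")

        rank_cell = soup.new_tag("td")
        rank_cell.string = str(rank)

        # The title cell links to the story itself.
        title_cell = soup.new_tag("td")
        link = soup.new_tag("a", href=details["story_url"])
        link.string = title
        title_cell.append(link)

        claps_cell = soup.new_tag("td")
        claps_cell.string = str(int(details["recommends"]))

        for cell in (rank_cell, title_cell, claps_cell):
            row.append(cell)
        table.append(row)
    return str(soup)
```

Building the rows with `new_tag` rather than string concatenation lets BeautifulSoup handle escaping of titles and URLs.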

With a basic HTML file containing all the stories I wanted to display, it was time to create the actual website.

Building a kickass website

When planning the website, I had three goals in mind:

1. Minimal and elegant design

The design language is centered around plenty of whitespace and high-contrast text. This way, the focal point is on the stories it’s trying to highlight, not on the aesthetics of the website itself.

I also added a “Compact” view mode, which hides feature images from the website. This allows readers to skim through the list with ease.

2. Fast

The first version of the website was quite sluggish. This is because it was trying to load hundreds of feature images at once.

To solve this issue, I used “lazy loading.” When you land on the website, only the first 50 stories under “All” are loaded. If you want to see more stories, you have to click on “Load more.” This design pattern drastically reduces the initial loading time.

Also, to make navigation feel more responsive, I designed this website as a single-page web app. When you click on a button, you don’t navigate to another HTML page. Instead, jQuery switches the view instantaneously.

3. Lightweight

To keep the website light, I chose to forgo most popular frontend libraries. I didn’t use Bootstrap, and I kept JavaScript/jQuery usage to a minimum.

A glance at the project repo reveals a very minimal setup: a few HTML files, a CSS file, a couple of scripts, and a handful of data files.

As a result, the website doesn’t have many moving parts and dependencies. It’s very simple to maintain and debug.

Testing and launching

I shared the prototype with a couple of friends and asked them to rip it apart. Using their feedback, I iterated on the design twice. Then, I launched on Product Hunt.

I could barely sleep that night. I still remember constantly refreshing the page checking for new comments until I passed out from exhaustion.

The next morning, I scrambled out of bed and clawed open my computer. I couldn’t believe my eyes! Top Medium Stories was at the top of Product Hunt’s homepage. At the end of the day, it was awarded the #2 product of the day.

The sudden death of Top Medium Stories

The Product Hunt launch exceeded my wildest expectations, and I was on cloud nine for a long time. But I knew I wasn’t done until I shared my project on Medium. I started this post half a year ago and I finally finished it a few weeks ago. I was beyond excited to publish it.

Before submitting, I decided to run the data collection script one more time to update the website.

The script failed catastrophically.

“No big deal. Either Medium has an outage or my internet isn’t working,” I thought. But I was so wrong. When I realized what actually happened, I slumped into my chair and dragged my fingers down my face in frustration.

I kid you not, two days prior Medium had removed the top stories page from their website. They scuttled the very page my scraper depended on to function!

I emailed Medium promptly, asking them to consider reverting the top stories page. I didn’t get the response I was looking for.

But I didn’t blame them. My website wasn’t officially supported — they weren’t obligated to do anything. Even if they didn’t make this particular change, eventually one of their updates would break my website. It was inevitable.

I felt hopeless. Since the website couldn’t be updated anymore, it was no more than a static list that was soon to be obsolete. In my mind, Top Medium Stories was dead on arrival.

The sprouts of new life

For a while, I worked on other stuff and didn’t look at Top Medium Stories at all. But I couldn’t stop thinking about the unfinished story of the website. I wanted to publish a postmortem, even if it didn’t have a happy ending. It felt like a good way to close out the project.

I closed the article with:

“So, I hope you enjoyed reading about Top Medium Stories. It was an amazing experience and I’m proud of what I made — I’m sorry it had to end this way. There will always be things you can’t predict or control, and they can wipe away your work in a heartbeat. That’s life.”

As I stared at my finished draft, I realized something. I hate sad endings.

Suddenly, my eyes locked on that same JSON blob I mentioned earlier.

[
  "We fired our top talent. Best decision we ever made.",
  {
    "recommends": 79000.0,
    "pub_url": "https://medium.freecodecamp.org",
    "author": "Jonathan Sol\u00f3rzano-Hamilton",
    "image": "https://cdn-media-1.freecodecamp.org/images/1*4hU3Xn7wunA81I3v17JIrg.jpeg",
    "year": "2017",
    "story_url": "https://medium.freecodecamp.org/we-fired-our-top-talent-best-decision-we-ever-made-4c0a99728fde",
    "pub": "freeCodeCamp",
    "author_url": "https://medium.freecodecamp.org/@peachpie"
  }
],
...

And I had a revelation. I didn’t need the top stories page to update the website. Instead, I could visit each URL in the JSON file and pull the data directly from the story’s webpage.

To fetch new stories, I could scrape the new Popular on Medium page, which would give me the top stories published recently.
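
The new update flow can be sketched in Python roughly like this. The helpers `refresh_claps` and `merge_new_stories` are hypothetical, as is the `fetch_claps` callable, which stands in for whatever code reads a clap count off a story page; the sketch keys stories by title only because the JSON file does.

```python
def refresh_claps(stories, fetch_claps):
    """Revisit each saved story URL and update its clap count."""
    refreshed = {}
    for title, details in stories.items():
        updated = dict(details)  # copy so the input is left untouched
        updated["recommends"] = fetch_claps(details["story_url"])
        refreshed[title] = updated
    return refreshed


def merge_new_stories(stories, popular_stories):
    """Add stories seen on the popular page that are not tracked yet."""
    merged = dict(stories)
    for title, details in popular_stories.items():
        merged.setdefault(title, details)
    return merged
```

With these two steps, the saved JSON file becomes the source of truth: existing entries are refreshed from their own pages, and the popular page is only used to discover new ones.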

After refactoring my code, I realized something: not every popular new story will necessarily be showcased on the Popular on Medium page. So if you happen to read a story that you think should be on Top Medium Stories but isn’t, please let me know. Just send the story’s URL to michaeldeng18@gmail.com, and I’ll add it right away. Together, we can ensure the leaderboards are as comprehensive as possible.

There is always the risk that Medium might one day restrict scraping completely, or even release their own ranking of stories. Either of these changes could make Top Medium Stories obsolete.

But in the meantime, I will continue maintaining Top Medium Stories, the best website for discovering awesome stories.

If by this point you still have not seen Top Medium Stories, take a look! It’d make me very happy if the site helps you find extraordinary stories that you would’ve otherwise never stumbled upon.

Thanks for reading! If you liked this, give it some love by pressing the clap button!
