Monday, November 16, 2015

The SMAC stack

Technology changes with time, but as the French say, the more things change, the more they stay the same. Recently, four technologies have been gaining wider traction and acceptance in markets - both the stock market, where companies are valued and shareholders (and ahem, Wall Street) make money, and the companies themselves, which are able to sell more product and reap rich rewards for their employees while truly delighting their customers with cheery new offerings. So let's get on with it, the SMAC stack - technologies that will rule the roost into the near to mid-term future.

Social: the rise of Facebook, Twitter, and related companies has been meteoric, and it has been incredibly difficult for even established large players in related domains (Google+ anyone?) to take market-share from them.

Mobile: in the days of yore, Bell Labs used to be one of the greatest (if not _the_ greatest) places where advanced technologies were incubated. In fact, that's where mobile phone technology first came from... and they worked on some new (at the time) realizations of specialized services for location, collaboration, content delivery etc. to the handset - using industry standard APIs like Parlay and OSA. In today's world, sadly, Bell Labs doesn't exist in any form close to its former glory anymore, and most of the (good) people moved on to academia, Google, Microsoft Research, Wall Street, or other such places, but mobile still rocks. Attempts to standardize APIs seem to have failed, and who needs them anyway when companies can build proprietary solutions in walled gardens, or gradually open up walled gardens to gain market-share, each with their own APIs?

Analytics: care to extract some actionable intelligence from huge data sets? What about making data-driven (as opposed to whimsical) decisions to advance a corporate cause? Well, analytics is your thing then. With the greater proliferation of machine learning, improved (and improving) algorithms, massive parallelization, and larger, more efficient means of accessing large data sets, analytics will continue to grow. Be warned though, as Andrew Lo says, "if you torture a data set long enough, it will confess to anything."

Cloud: you got to keep your content somewhere. Cloud anyone? First there was Java with WORA (write once, run anywhere), but that is so late 90s now, with virtualization becoming much more commonplace. What really makes things go today is large amounts of data gathered from a large number of sensors. Earlier one would capture only data they knew they might need. Today the mantra seems to be "how do you know what you will need later? get all the data you can hold" -- and this leads to a massive "data-base in the sky", a cloud if you will, where you can write once, and access from anywhere. Cool, huh?

"Micro-analytics" - a budding field - don't collect all the data in the world (in some cases, you are given the data set and don't have the luxury to get more data from historical sources), try using the data you have to extract the best intelligence you can from it. Admittedly in many cases the inferences you can draw will be limited by the data you have. In others though, the focus on the smaller data set might actually lead to a sharper focus on the variables captured and better results. In machine learning, as with all else, it comes down to what data you have or collect, what methods you know, and most importantly, how you use the information at your disposal. As they say, "it is always up to the operator".

Saturday, August 15, 2015

ChiPrime Quant with new content now on reddit!!!

A sub-reddit for ChiPrime Quant prep was launched here today. This will be periodically updated with new problems with a particular focus on preparation for the quant sections of competitive examinations like the GRE, GMAT, and CAT. Most questions posed there will at least initially be of the multiple choice variety, though over time, as the material increases in degree of difficulty, problems (still based on topics within the tested material for the above mentioned competitive exams) might be offered in different formats encouraging readers to work out their own solutions, with options to choose from not being provided in all cases.

Enjoy the new content! 

Thursday, July 9, 2015

why we study what we study (or don't) ... and (possible) applications of early education in later life

Kids often complain that many of the things they are "made to study" in school simply have no real world application or use ... so why bother? In this post we look at some subjects people study growing up, and how they might (not will, might) potentially be useful as they grow into adulthood. Of course, given my STEM background, some of those biases might show, but I try to take as objective a view as I can, as I type this. Might add more as new ideas occur to me...

History: 
"Why study history, mommy? How does it matter who ruled when, what wars were fought, and how today's world came to be?" - seems like rational criticism, especially when kids are forced to remember dates for major events - not fun. However, history plays an important role in letting us see cause and effect, learn from mistakes (some that were monumental) others have made, without having to make them ourselves. It also teaches us how statesmen learned, were forged in the fire of destiny, and remade the world into (in many cases) a better place than they found it. It also helps us understand how policy is formed, why it is important, and how to live well. "Those who do not know history are condemned to repeat it." and "History doesn't repeat itself but it often rhymes" are two quotes that immediately spring to mind about history. Besides, history of specific areas - like the history of technology or of finance, can be extremely important to understand recent and upcoming developments, or why regulations are crafted in specific ways. There can be profit in this too.

Geography:
Does it matter which country is where on the planet? Why, of course it matters! Two quick examples:

  1. If you are an investor, you want to know what kinds of social, cultural, political, meteorological, agricultural etc impacts geographic boundaries and tensions might have on your capital deployment plans. Will a war in a neighboring country affect my investment in this one?
  2. If you are a technologist, you want to know all about the markets you are targeting, to learn more about the demography, spending patterns, the "lay of the land" to discern technical solution delivery options etc. A colleague of mine used to give his students in a communications networks class the assignment to design a communications network strategy for all the kingdoms in the Lord of the Rings, given their distinct geographic features and weather patterns. 

Social Studies:
People are not automatons. Different people have different cultures and social mores, and even within those contexts behave differently. But getting a macro-level understanding of crowd behavior often requires the ability to slice populations laterally by spending patterns, or age, or gender, and vertically by country, continent or some other metric. Purely statistical social studies venture into the realm of Asimov's "psychohistory" (from his brilliant "Foundation" series), but apply directly today in many ways. Again, two examples:

  1. A country edges ever closer to default with a heavy debt burden from borrowings from multilateral organizations already, but severe austerity measures have left it weak. What is the potential impact a possible default may have on the social fabric within the country? on various surrounding markets?
  2. You want to deploy a new mobile technology into a country with a large population. What do you know about the spending patterns of the people there and their willingness to purchase your service? Would they move quickly or slowly? En masse or separately? How is the diaspora of one country connected to the rest of the world? (Typically there are lots of connections to the home country, and with other emigrated populations in other countries as well.)

Math:
The importance of mathematics to real life cannot be over-emphasized. If you work in retail, at the very least you need to be able to make change, and depending on role, need to track market structure, marketing opportunities, inventory, supply chain management and the like. If you work as a programmer, you need to understand data structures, algorithms etc. If you work in finance, you need to understand (at least) the basic mathematics that underlies finance. Every role you can think of that has a "science" component to it requires at least some understanding of math, and every person needs to learn at least some of it to be successful.

Finance:
Everyone needs to manage their finances well: tap the cheapest sources of funding when they need the cash to go to school or acquire an asset, get out of (hopefully never get into in the first place) credit card debt, manage stock options at work, manage their 401K accounts if they have them, and handle tax, estate, and retirement planning. This is another key field everyone needs to get at least somewhat of a grasp of.

Music:
Some things nourish our bodies, others, our souls. Music is one such. It can elevate your emotions when you feel down, and as recent research has shown, synthesized happiness is as good as the real thing. I tend to think of music as (the good kind of) chocolate for the soul.

Physics:
Well, things that go up come down, but there is more to physics than just gravity. Learning physics enables one to appreciate the natural world we live in, while at the same time enabling us to apply natural analogs to thinking about complex dynamic systems - if we work with such in other fields e.g. finance. And physics lends itself to mathematical modeling and application. Need I say more?

Chemistry:
This was popularized in a cool but perhaps not so righteous way in "Breaking Bad". But chemistry has powerful uses - want to get tarnished silver clean? neutralize the sting of a bee or a wasp? Well, recall your knowledge of chemistry then. Besides, you have to admit, knowing the formulas for chemical compounds and all kinds of equations for various reactions is wicked fun in its own way (alright, I am a geek, but still...).

Biology: (includes basic knowledge of anatomy, pharmacology)
A basic knowledge of biology and biological processes is extremely important for first-aid type situations - what to do when Henry breaks his arm? When Jeffrey is bitten by a snake? When Callum suffers pain radiating into his left shoulder and down his arm? Also, knowing at least roughly what is where in your body helps - severe pain on the right side of your abdomen? That's where your appendix is, you dolt! Rush to the doctor!

Languages:
Language here refers not just to the syntax and semantics, but also the underlying culture. It is not enough to know Mandarin Chinese if you don't understand how to use it idiomatically, or the nuances that different ways of stating the same thing convey to native speakers.

Engineering:
People use ideas from engineering in some form or another every day without thinking too much about it. Need to raise a heavy load? Use a lever. Or set it on wheels and use an inclined plane. Or use a pulley. Want to simulate a complex natural system? Build a computer (or better still, use one already built... see below). Still wondering if the moon is made of cheese? Go build a spaceship and take a trip.

Programming:
This helps you speak with our future computer overlords in their native tongue (besides helping you solve incredibly complex problems or simulate/model complicated dynamical systems seemingly effortlessly). Need I say more?

Friday, July 3, 2015

simple sample questions for financial services interviews


This starts out as a grab-bag of questions, but will evolve over time into separate categories. These are all quite simple questions that are fair game in any financial services interview.
  1. what is the capital structure of a firm? Explain the Modigliani-Miller Theorems.
  2. what is the difference between debt and equity financing? explain the different ways in which a company can obtain financing.
  3. what is the difference between disinflation and deflation? what is the difference between deficit and debt?
  4. explain how you would compute capex from financial statements.
  5. what are some differences between stocks and bonds?
  6. what are derivatives? what are Greeks?
  7. what is fundamental analysis? quantitative analysis? technical analysis? what is ratio analysis? explain with examples. what ratios would you focus on while picking stocks?
  8. what is the Black Scholes model? what are some other ways of pricing options?
  9. what is stochastic volatility? what is local volatility?
  10. what is liquidity risk? how would you measure it?
  11. what is the yield curve? what does it tell you about the condition of the macro-economy? what are key rate duration or KRD points? what is interpolation?
  12. what is the impact of interest rate hikes on various financial instruments? stocks? bonds? commodities?
  13. what is the difference between fiscal and monetary policy?
  14. what is the difference in roles between the Federal Reserve and the Department of the Treasury?
  15. what is a foreign exchange rate? what is triangular arbitrage?
  16. what are futures? forwards? how are they different?
  17. what is a "commitment of traders" report? what does it tell you?
  18. what are the various ways in which you might value a security?
  19. what are interest rate swaps? uneven swaps? total return swaps? variance swaps? what is a day count convention? why is it important?
  20. what is portfolio immunization? how does it work?
  21. what do you understand from "market micro-structure"?
  22. what is back-testing? how is this carried out? what is data mining, why is it dangerous?
  23. what is machine learning? how do you think it can be used in Finance?
  24. what is the difference between a balance sheet and income statement?
  25. what is VaR? what does it tell you? what is expected shortfall (ES)?
  26. what are the different ways of measuring VaR? give examples of each.
  27. what is a bond? what is a callable bond? a putable bond? a floater? a sinker?
  28. what is PPP? what is the Big Mac Index?
  29. what is an index? what are the different types of indices? what is an ETF? how does it work?
  30. what is a real option? explain with a simple example where and how you would use one.
  31. (*) what is "contingent claims analysis"?
  32. what is credit risk? explain with an example how you would analyze credit risk.
  33. what was the "flash crash"?
  34. explain with examples the different kinds of exchange rate regimes. what is a dirty float? currency substitution? currency board? a currency union?
  35. what is volatility? how is it measured in the stock market? in the bond market?
  36. what is a regression? what is a t-statistic? what is R-squared (R²)?
  37. what are some things you would check when you run a simple linear regression? what are some advantages of a regression analysis? disadvantages?
  38. what is correlation? how is it different from causation?
  39. what is the difference between correlation and covariance? how do a correlation matrix and the corresponding covariance matrix relate to each other?
  40. what is the CAPM? what is Beta? what is alpha? True or False: "capturing alpha is a zero sum game, Beta is not." True or False: "higher risk investments produce higher returns". True or False: "Retained Earnings are an asset." - why, or why not?
  41. what are some measures of portfolio performance? what is the Sharpe Ratio?
  42. what is modern portfolio theory (MPT)?
  43. what is a mortgage loan? explain why it is an amortizing loan, and how such loans work. what is a negatively amortizing loan?
  44. you are given 100M USD to invest. explain in the current economic context how you would invest it, and why. also factor in major likely sources of economic uncertainty on the horizon and explain how you would manage your portfolio through these events.
  45. what are convertible bonds?
  46. what is a bail-out? what is a bail-in? how are they different? explain with examples.
  47. explain "taper tantrum". pick a financial crisis, explain which countries were affected, what happened then, and how things were eventually resolved.
  48. what is principal components analysis? how does it work?
  49. what is the volatility surface? what is a vol-cube? how is it useful to price options?
  50. what is the difference between implied, historical, forecasted, and realized volatility? (*) what is GARCH?
  51. explain the advantages and disadvantages of using the normal distribution for modeling stock returns.
  52. what is bond duration? how is it different from maturity? what is Macaulay duration? effective duration? how does coupon size affect duration? can an instrument have negative duration?
  53. what is convexity? can bonds have negative convexity? explain.
  54. what is the skewness of a distribution? kurtosis? 
  55. (*) what is DxS? what is spread duration?
  56. what is contango? what is normal backwardation? what is backwardation? how are backwardation and normal backwardation different if at all? can you have contango and normal backwardation at the same time?
  57. explain compounding with interest rates. derive the formula for continuous compounding.
  58. what is arbitrage? what is index arbitrage? risk arbitrage? give an example.
  59. what are the different types of options? explain with examples.
  60. what is leverage? how does it work? why would increased leverage increase risk?
  61. what is the difference between the real and the nominal interest rate? what is inflation? why is gold considered an inflation hedge by some? what are TIPS? how do they work?
  62. what is a stop-loss? what is a Japanese candlestick? what is a technical indicator?
  63. what is a credit default swap? how does it work?
  64. what is the difference between being illiquid vs. being insolvent?
  65. what is margin? what is initial margin? variation margin? what is a margin call?
  66. (*) what is geometric Brownian Motion? why is it useful in finance?
  67. what is the normal distribution? what is the height of a standard normal distribution (mean 0, standard deviation 1)?
  68. (*) what is a Markov Chain?
  69. (*) what is a martingale?
  70. what is Monte-Carlo simulation? when would you use it? explain with an example.
  71. what are on-the-run and off-the-run securities? which ones are more liquid? more expensive? why?
  72. what is sub-prime mortgage debt? what are Alt-A mortgages?
  73. what are asset-backed securities? what is a collateralized debt obligation (CDO)? what are REITs? how do they work?
  74. what is securitization? what is collateral? what is over-collateralization? what are recourse and non-recourse loans?
  75. describe the functions of rating agencies. give examples.
  76. what is the difference between investment grade and high yield debt? what are debt covenants? how would you evaluate them?
  77. what happens to stock and bond holders' investments in a firm if it goes bankrupt? what is the difference between chapter 7 and chapter 11 bankruptcy?
  78. what is a hedge? what is a hedge ratio? what does it mean to say you "roll a hedge"?
  79. what is mean-variance optimization? how does it work, and what does it do?
  80. what is the capital market line (CML)? the security market line (SML)?
  81. what is diversification? how many stocks or bonds do you think you need to hold in a portfolio for risk to be diversified? why?
  82. (*) give examples of leading, lagging, and contemporaneous economic indicators that might impact markets.
  83. what is senior and subordinated debt? what is a mezzanine tranche?
  84. what is the difference between US Treasury and corporate debt?
  85. what is an IPO? (*) what is a reverse IPO? how does it work? what is a hostile takeover? what is a poison pill provision?
  86. what are primary and secondary markets? what is the meaning of the terms "exchange traded" and "traded OTC"?
  87. (*) what is the Taylor Rule? 
  88. (*) what is Okun's Law? 
  89. (*) what is the Beveridge Curve? 
  90. (*) what is the Phillips Curve? what is money velocity?
  91. what is trade surplus? what are twin deficits? explain the impact of a country's exchange rate on its trade with examples.
  92. what is balance of payments accounting? how does it work?
  93. what is hyper-inflation? what is deflation? what is quantitative easing? what do you understand from the phrase "Operation Twist"?
  94. (*) can a company have negative shareholder equity? what does it mean?
  95. what are retained earnings? how does dividend policy influence stock price?
  96. what is sector rotation? how does it help?
  97. what is a benchmark? what is a peer-group? how do these factor into portfolio management?
  98. what is the difference between financial and managerial accounting?
  99. what is a recession? what would you do as Fed Chair if the USA was in a recession? what is the Fed dual mandate? how does the Fed's mandate differ from those of other central banks e.g. BoE and ECB?
  100. explain some ways in which firms can use accounting tricks to make their financial statements look better. say what you would do as an analyst to uncover these tricks and evaluate the true quality of a company.
  101. what is arbitrage pricing theory (APT)? what is the Fama-French 3 factor model?
  102. what is FCF? what is EV? what are some ratios you would consider with these quantities to evaluate an investment opportunity?
  103. what is value investing? how is it different from growth investing?
  104. what is scenario analysis? what is stress testing? how does it work? explain with an example.
  105. what is tracking error volatility or TEV?
  106. why does a company like Apple with so much cash take on debt?
  107. are savings deposits assets or liabilities from a bank's perspective? (careful!)
  108. explain how you would evaluate which projects a firm should take on and which it should reject. compare and contrast at least three different modeling techniques.

More questions:
  1. What is the Fed Funds rate? What is the Fed Discount rate? How is the latter different from LIBOR?
  2. What is the TED spread? The LIBOR-OIS spread? What is their significance?
  3. What is a currency war? How does currency devaluation impact trade?
  4. When does the central bank raise rates? cut rates? why?
  5. What does the changing shape of the yield curve tell you about the broader macroeconomy?
  6. What is a curve flattener? steepener? what is a bull/bear steepener/flattener? How is it different from a bear/bull steepener/flattener?
  7. What are curve trades? Give examples of how you could put them on.
  8. (*) What is the Nelson-Siegel-Svensson method? Where and how is it used?
  9. A stock has low correlation with its index but a high beta. Is this possible? Explain.
  10. Explain the concept of a macroeconomic cycle and give two examples of sectors or asset classes that tend to do well in particular conditions.
  11. What do you understand from the term 2y forward 3y rate? 
  12. Which is a riskier investment, a treasury bond or a junk bond, and why? (Careful)
  13. What is the likely impact on an equity portfolio if the Fed raises rates? How would you minimize the impact?
  14. What is the likely impact on an equity portfolio if the yield curve flattens? steepens? How would you minimize the impact in adverse scenarios?
  15. Pick any famous investor. Explain his or her investment methodology, discuss its performance.
  16. What is private equity? How does it work?
  17. What is venture capital? How does a start-up avail of and use this source of funding? How is it different from more general private equity?
  18. What is a stock split? What is a reverse stock split? Give an example of a situation where a stock split and a reverse stock split might be advantageous to a company.
  19. You are an equity portfolio manager. Describe how you would handle subscriptions (money flowing in) and redemptions (investors asking for their money back). 
  20. [continues previous question] Describe how you would measure the relative liquidity of your holdings. Explain how you would manage redemptions if some subset of holdings suddenly became more illiquid.
  21. What are exotic options? Illustrate with an example. 
  22. What are second order greeks? Explain. Where would these potentially have uses?
  23. True or false. The delta of an option gives the probability that an option finishes in the money. (If false, which options greek, if any, gives this probability?)
  24. What is the put-call parity theorem, and what applications does this have?
  25. Are the following equivalent? (a) selling a call, or (b) buying a put? Why or why not?
  26. Are the following equivalent? (a) selling a put or (b) buying a call? Why or why not?
  27. Explain what a covered call is, and how you might use it to generate potentially better returns.
  28. State the Black Scholes equation. Prove it. Next, working through each of the input parameters, state what happens as it takes values at the extreme ends of its ranges, all other parameters remaining the same.
  29. What is the volatility smile? How might it be useful? (see also question 49 above)
  30. What is a moving average? What is an exponential moving average? What is the difference between them? Which one moves faster relative to the data series over which the averaging operation is performed?
  31. What is a cross-currency swap? How is it useful?
  32. Which are the only US companies to have a AAA rating?
  33. What is Ito's lemma? Where is it used?
  34. (** expect these only if you are going for a quant interview, especially with an MSCF degree) What is the Riemann-Stieltjes integral? How is it different from a Lebesgue integral?
  35. (**) What is the Radon-Nikodym theorem?
  36. (**) What is Girsanov's theorem?
  37. (**) What is a P measure? Q measure? What is the difference between them?
  38. Describe in some detail the steps you would follow to go from conceiving of a trading or investment strategy all the way through implementing it.
  39. (*) what is the Kelly Criterion? what is the gambler's ruin problem?
  40. Can two managers have the same volatility of returns but different betas? Can they have the same betas in their performance but offer different volatilities of their returns?
  41. Are the rates on the yield curve annualized rates?

Wednesday, May 13, 2015

The "true value" of a dollar

What is the true value of a dollar?

Let's say you are moving from Noira, with currency N dollars (NOD), to another country, Moira. That country also has (local) M dollars (MOD), but the tax rate there is much lower. Let's say the tax rate in Noira is 40%, and in Moira 15%, but there is also a 7% or so VAT on every good you buy.

As you move, you are told to take a pay-cut of 10% i.e. your current emoluments in NOD are converted to MOD, then multiplied by 0.9, and that's the yearly MOD you are paid once you make the move. Question is, should you take the job?

This calculation is a completely non-trivial undertaking. For instance, there may be things about Moira that are more of value to you in intangible terms than what Noira offers. And since life never works this way, it is unlikely that taken as a set, all attributes of one or the other country totally dominate the attributes of the other under consideration in terms of your preference. [OK, I am considering roughly comparable economies. If you were to take "the best" (by whatever arbitrary metric you choose) country that is in the so-called "developed economies", and another that is the lowest of the countries in the so called "developing world", then attribute dominance may in fact exist.]

Total attribute dominance makes answering your question easier, but the easier the question is to answer in real terms, the higher the barrier to entry, or the moat to your promised land. Not to say such barriers are insurmountable, just that it takes more to get there.

So back to the question at hand...
From a very simple economic perspective, we can classify all consumables people use to be of one of two types - either a. goods, or b. services.

In more developed countries, manufactured goods tend to be cheaper, while services are more expensive. The converse is true in developing countries - people-power tends to be more plentiful and less expensive, while finished goods cost a great deal more. For the sake of argument, let's say Noira has more expensive goods than services, and Moira, more expensive services than goods. Let's also say that as a long-time resident of Noira, you spend 50% of your income on goods and 30% on services, though the change in life-style when you move might shift that mix to 30% on goods and 50% on services. And you are habituated to saving 20% of your income, month by month, each year.

The model becomes slightly more involved. If the NODMOD exchange rate is x, this means you receive x MOD for 1 NOD. If your monthly pay before the move is m NOD, then you are paid: mx*0.9 MOD after.

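Your take-home pay in Noira is m*0.6 NOD (40% tax rate), and you save m*0.6*0.2=0.12 m NOD every month. Along the same lines, your take-home pay in Moira would be mx*0.9*0.85=0.765 mx MOD (15% tax rate), and you would save mx*0.9*0.85*0.2=0.153 mx MOD. The 7% VAT does not touch what you save; it reduces what the part you spend can buy.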

Now, what if Noira gave you the ability to put some amount into a tax-deferred, 401k-like savings plan but Moira had none? What if Noira had an international tax regime for citizens and Moira didn't? What if you were a Noira citizen, and wanted to return to Noira from Moira after a few years? What if Moira had subsidized medicine, but Noira did not? When you return you want to convert your MOD back to NOD, so you want to make sure the exchange rate is in your favor (you want a very low NODMOD rate because you want as many NOD as you can get for the MOD you have saved while living in Moira). And of course, you also want to make certain that when your pay is first bench-marked, the exchange rate is in your favor as well (here you want a very high NODMOD rate - the highest you can manage, because you want the benchmark to give you as high a salary in MOD as you can get for your current NOD pay). All these factors play into your decision.

This is an example of a simple mathematical model. Of course, this can be made much more realistic by adding in more assumptions. And as we get more complicated, we need to start using spreadsheets. Excel is your friend as you go down that path.
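
Before reaching for a spreadsheet, the take-home and savings arithmetic above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: savings are taken as 20% of take-home pay in both countries (so Moira's 15% income tax is applied before saving), Noira is assumed to have no VAT, and the values m = 10,000 NOD and x = 1 are purely illustrative. The names `Country`, `monthly_savings` and `spending_power` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    income_tax: float  # fraction of gross pay taken as income tax
    vat: float         # fraction added to the price of every purchase


def take_home(gross_pay: float, country: Country) -> float:
    """Monthly pay left after income tax."""
    return gross_pay * (1.0 - country.income_tax)


def monthly_savings(gross_pay: float, country: Country,
                    savings_rate: float = 0.20) -> float:
    """Savings per month, assumed to be a fixed share of take-home pay."""
    return take_home(gross_pay, country) * savings_rate


def spending_power(gross_pay: float, country: Country,
                   savings_rate: float = 0.20) -> float:
    """Pre-VAT value of what the unsaved part of take-home pay can buy."""
    spendable = take_home(gross_pay, country) * (1.0 - savings_rate)
    return spendable / (1.0 + country.vat)


# Tax and VAT rates from the post; Noira's VAT of zero is an assumption.
noira = Country("Noira", income_tax=0.40, vat=0.0)
moira = Country("Moira", income_tax=0.15, vat=0.07)

m = 10_000.0    # hypothetical monthly gross pay in NOD
x = 1.0         # hypothetical NODMOD rate: MOD received per 1 NOD
pay_cut = 0.10  # the 10% pay-cut applied on moving

pay_moira = m * x * (1.0 - pay_cut)                 # gross pay in MOD
save_noira_nod = monthly_savings(m, noira)          # 0.12 * m
save_moira_mod = monthly_savings(pay_moira, moira)  # 0.153 * m * x

print(f"Noira: save {save_noira_nod:.2f} NOD, "
      f"spend {spending_power(m, noira):.2f} NOD of goods and services")
print(f"Moira: save {save_moira_mod:.2f} MOD, "
      f"spend {spending_power(pay_moira, moira):.2f} MOD of goods and services")
```

Changing `x` between the day your pay is bench-marked and the day you convert your savings back shows why you want a high NODMOD rate at the first point and a low one at the second.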

Saturday, January 31, 2015

ML applications: news aggregation, MBA admissions

In studying Machine Learning, we always of course want to learn what kinds of applications we can target with any particular method. In this post we look at a couple.


  1. News aggregators - sites like Google News need a means of grouping links to the same news story from different news sites. In previous posts we looked at how to measure the closeness of different news stories by constructing the reverse indices of various documents and then finding the distance between them in the N-dimensional space defined by the statistically improbable words in each document. In vector terms, each news article is represented by a vector, we compute the angle between any two such vectors, and we treat the articles with the smallest angle between them as belonging to the same story. We also looked at Robust Hyperlinks as a related method: intercept dead links to a topic of interest (maybe a news topic), use the statistically improbable terms in the search or from the URL to find other links to the same story, perhaps from other sites, and forward that content instead if the requested link is dead. A different way of achieving comparable results might be to cluster the documents in N-dimensional space, where the dimensions are the top 5 or 10 words by tf-idf score in each article. This has similar characteristics, uses unsupervised learning, and can bucket a large number of articles on each pass as news happens and new articles are published whose top tf-idf terms match those of other articles on the same story.
  2. MBA admissions and hiring in large corporations follow an interesting pattern. These are similar in many ways, but to keep things non-controversial let's just use the B-school example. Say for instance, you are targeting a particular B-school. Let's say you have a fantastic GMAT score, a great GPA, excellent extra-curriculars, terrific letters of recommendations, meet all the right criteria for age, academic pedigree for previous degrees, etc. How will the school decide whether or not to admit you? Well, the school of course wants to admit the best students, but what constitutes "the best" might be an illusion. Let's look at it this way. If a school says they want to have "as diverse a class as possible" and by that they factor in nationalities, competencies, pre-MBA careers, ages of students, GMAT scores, pre-MBA schools, earlier degrees, etc, then what they really mean is that rather than taking their cutoffs based on a global maximum and working their way down the hill till they give out seats to all the students that qualify before they run out, they probably cluster all students along various criteria in N-dimensional space and then pick "the best" students to admit from each cluster. This explains perfectly why if you are a male software engineer from Asia with a 780 GMAT, a background from one of the top schools in that nation, and stellar letters of recommendation, you may still lose out to a female lawyer from some other country in the developed world that is under-represented in the program/school, with a very respectable 720 GMAT, and stellar metrics in all other ways. The more there are comparables to you that apply, the harder it is for you to excel in that pool. 
Your class of applicant may destroy the GMAT percentile curve for all other classes of applicant, but the classes of applicant compete amongst themselves for seats available, competition across classes might not count for as much as competition within a class - there things like age, academic pedigree, past successes etc may count for more in whether or not you get offered a seat. So how does this help you, you ask? Well, if, from the list of all the factors that determine admission into a top MBA program, you are able to determine uniquely for your target school which features fall into the set they cluster on, and which features they use to differentiate candidates within the cluster, you can more effectively prepare your application to get an admission - of course, you cannot change who you are when you apply (so your cluster is likely pre-defined depending on the inflexible characteristics that make you, you), but you can present your story differently to better position yourself within your cluster. Also, it is more likely that the "qualitative" factors like your essays etc become relevant after cut-off has already been done on your cluster, you if you know the dimensionality of the clustering algorithm each school uses, and what features are actually used in that algorithm, this could help you.
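
To make the news aggregation idea in item 1 a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Google News or any real aggregator works: the function names, the sample headlines, the 0.2 similarity threshold, and the simple greedy grouping (a stand-in for a proper clustering algorithm such as k-means) are all my own choices for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(documents, top_k=10):
    """Return one sparse {term: weight} vector per document, keeping
    only the top_k terms by tf-idf score in each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})
            continue
        counts = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        vectors.append({term: w for term, w in top if w > 0})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_stories(documents, threshold=0.2, top_k=10):
    """Greedy grouping: put each article into the first group whose
    first member is similar enough, otherwise start a new group."""
    vectors = tfidf_vectors(documents, top_k)
    groups = []
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical headlines, made up for this example.
articles = [
    "Central bank raises interest rates to fight inflation",
    "Inflation fears push central bank to raise interest rates again",
    "Local team wins championship after dramatic penalty shootout",
    "Dramatic penalty shootout hands local team the championship",
]
groups = group_stories(articles)
print(groups)
```

A small angle between two article vectors corresponds to a cosine close to 1, so "smallest angle" and "highest cosine similarity" are the same test. On real articles you would likely want stemming (so that "raise" and "raises" match) and a threshold tuned on labeled examples.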

Friday, January 30, 2015

Learning Machine Learning

Lately I have spent a lot of time re-learning Machine Learning from scratch (both to reinforce what I once knew but forgot, and to build the more extensive data analytics and model-building muscle that is ever so useful at work these days). This ties in well with my interest in biologically inspired algorithms (genetic algorithms etc), variants of which are widely used in fields like Finance. For example, if you build a Stochastic Volatility model for Option Pricing like Heston or SABR, or the Bates model for Stochastic Volatility with Jump Diffusion, you will typically have to calibrate it with an algorithm like differential evolution, which is a variant (or maybe a special kind) of a genetic algorithm. You will also have to use techniques like partial function application, which is closely related to currying (also called Schönfinkelization).
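
Here is a rough sketch of how those two pieces fit together, in Python. Everything in it is an assumption made for illustration: the "model" is a toy two-parameter volatility smile (nowhere near Heston, SABR, or Bates), the data is synthetic, and the optimizer is a bare-bones version of the classic rand/1/bin differential evolution scheme rather than any particular library's implementation. The point is only to show `functools.partial` binding the market data into the loss function so that the optimizer sees a function of the parameters alone.

```python
import random
from functools import partial


def smile(strike_offset, level, curvature):
    """Toy volatility smile: a hypothetical stand-in for a real pricing model."""
    return level + curvature * strike_offset ** 2


def squared_error(params, offsets, observed):
    """Sum of squared differences between model and observed volatilities."""
    level, curvature = params
    return sum(
        (smile(k, level, curvature) - v) ** 2 for k, v in zip(offsets, observed)
    )


def differential_evolution(objective, bounds, pop_size=20, weight=0.8,
                           crossover=0.9, generations=300, seed=0):
    """Minimal rand/1/bin differential evolution (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [objective(ind) for ind in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct members other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate mutates
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + weight * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


# Synthetic "market" data generated from known parameters (0.20, 1.5).
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
observed = [smile(k, 0.20, 1.5) for k in offsets]

# Partial application: fix the data, leaving a function of params only.
loss = partial(squared_error, offsets=offsets, observed=observed)

best_params, best_loss = differential_evolution(
    loss, bounds=[(0.0, 1.0), (0.0, 5.0)])
print(best_params, best_loss)
```

Strictly speaking, `partial` fixes some arguments of an existing function, while currying turns a function of several arguments into a chain of one-argument functions. In practice both serve the same purpose here: handing the optimizer an objective that depends only on the parameters being searched.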

Anyway, here is what I have done so far. This is but one of many possible paths to learn this material. Each line here takes 30+ hours to get through, but if you spend the time you will improve both your understanding of the underlying mathematics and your ability to actually implement the ideas in a work setting. Yes, the material in some of the lectures below overlaps, but I learn best when I see the same material presented by different people who come at it from different angles.

In terms of prerequisites, you need a liking for mathematics and at least a minimal knowledge of multivariate algebra, multivariate calculus (basic vector calculus, including the notions of div and grad), and some multivariate statistics. Completing MIT OCW courses like 6.004 and 6.042J or, equivalently, the algorithms class offered by Tim Roughgarden (Stanford) or Sedgewick (Princeton) would be good. Those, and of course, the desire to work hard through the material when things get a little difficult.

A (*) marks what I felt were the best quality courses. Of course, your mileage may vary. Some of these are difficult and require serious work.
  1. (*) Trevor Hastie and Robert Tibshirani's Lectures at Stanford - very accessible even without too much of a math background. This is truly phenomenal. And what great guys, their textbooks are legally free to download. Hats off to them!
  2. (*) Yaser Abu-Mostafa's extremely well-done lectures at Caltech on Machine Learning also cover lots of theory with mathematics (learning theory sections are a bit challenging, but very necessary). Very clear explanations.
  3. (*) Professor Andrew Ng's lectures again at Stanford have more of a practical feel to them. Again extremely well done. I took the actual Stanford class (slightly more difficult, but totally worth it), not the somewhat diluted Coursera one.
  4. (*) Plan on re-taking Professor Koller's Stanford course on Probabilistic Graphical Models - demands a lot of work and paying close attention, very challenging at times, but definitely worth the effort. Re-taking to ensure I understand everything correctly.
  5. (*) Geoffrey Hinton's lectures from the University of Toronto on Neural Nets. These are on Coursera, the content is excellent and extremely clear.
  6. (*) Coursera lectures on Mining Massive Datasets by Anand Rajaraman and Jeffrey Ullman that follow along the lines of their free book published some time ago. This course is very good but requires quite a bit of work.
  7. (*) Taking MIT 6.006 (Introduction to Algorithms), 6.034 (Introduction to AI), and 6.042J (Mathematics for Computer Science). Love Srini Devadas, Tom Leighton, and the other instructors - very gifted teachers.
  8. A Practical Machine Learning course offered by Johns Hopkins. This was quite a bit easier after all of the above.
  9. Completed the University of Washington Machine Learning course on Coursera - the interesting difference here is that it is case-study based and application oriented. Also quite a bit easier than the starred courses above.
  10. I plan on working my way through MIT lectures on Advanced Probability Reasoning taught by Dr. Tsitsiklis (6.041), and the Harvard CS109 class on Statistics and Analytics also online.
I have already made significant use of the methods at work - building models utilizing techniques from Neural Networks, Support Vector Machines, and Logistic Regression, and overall the lectures were so clear I understood exactly what I did in each case, what decisions I made, and why.

I will update the post with more material as I learn more machine learning.

Thursday, January 29, 2015

ChiPrime quant prep video channel!

ChiPrime is the competitive exam quant prep portal that offers free, computer adaptive, high quality, targeted content to ace the math portions of exams like the GMAT and the GRE that I wrote about in earlier posts. It looks like they have recently made a couple of improvements to their website. (Full disclosure: I do support this website, and help them maintain their high quality standards.)

One, they now offer free lessons on the content of the exams, broken down into several easy to understand modules with logical flow and worked examples. The modules build upon each other quite well.

Two, they also have a video channel on YouTube here. Not too many videos on it just yet, but the focus there is to add more quality content slowly but surely.

If you use it and like it, please like it on Facebook to help spread the word!